What the breakdown point is
The breakdown point is a measure of robustness in statistics. It describes how strongly an estimator can be affected by outlying data points: it is the largest fraction of contaminated or corrupted observations an estimator can tolerate before it can be made arbitrarily wrong (for example, pushed to an arbitrarily large value). Beyond that fraction the estimator does not merely become less accurate; it breaks down completely.
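One standard way to formalize this is the finite-sample replacement breakdown point, written below. The notation is chosen for this sketch: T is the estimator, X is the original sample of n points, and the set Z_m(X) contains every sample obtained from X by replacing any m of its points with arbitrary values. The asymptotic breakdown point is the limit as n grows, when that limit exists.

```latex
\varepsilon_n^*(T; X) = \min\left\{ \frac{m}{n} \;:\; \sup_{Z \in \mathcal{Z}_m(X)} \bigl\lVert T(Z) - T(X) \bigr\rVert = \infty \right\},
\qquad
\varepsilon^*(T) = \lim_{n \to \infty} \varepsilon_n^*(T; X)
```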
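Two variants are common. The finite-sample breakdown point depends on the sample size n: a single outlier can drag the sample mean arbitrarily far, so the mean's finite-sample breakdown point is 1/n. The asymptotic breakdown point is the limit as n grows, which is why breakdown points are usually quoted as a single number that does not depend on the size of the dataset: 0 for the mean and 0.5 (50%) for the median.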
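The dependence on n, and the limit it approaches, can be seen from the closed-form finite-sample breakdown points of the mean and the median. The sketch below assumes the replacement definition (m of the n points are replaced by arbitrary values); the function names are hypothetical helpers for illustration.

```python
def mean_breakdown(n: int) -> float:
    """Finite-sample replacement breakdown point of the sample mean.

    Replacing a single observation with an arbitrarily large value
    makes the mean arbitrarily large, so one point out of n suffices.
    """
    return 1 / n


def median_breakdown(n: int) -> float:
    """Finite-sample replacement breakdown point of the sample median.

    The median can only be dragged arbitrarily far once floor((n + 1) / 2)
    of the n observations have been replaced.
    """
    return ((n + 1) // 2) / n


# As n grows, the mean's value tends to 0 and the median's tends to 0.5.
for n in (5, 10, 100, 1001):
    print(f"n={n:5d}  mean={mean_breakdown(n):.4f}  median={median_breakdown(n):.4f}")
```

For n = 5 the median needs 3 of 5 points replaced (0.6), while for n = 100 it needs 50 of 100 (0.5); the mean always needs only one.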
Steps for estimating the finite-sample breakdown point of an estimator empirically:
1. Select a dataset of size n.
2. Calculate the value of the estimator for this dataset.
3. Replace one data point with an outlier (an extreme value that can be made as large as you like) and recalculate the estimator.
4. Calculate the difference between the estimator values for the original dataset and the contaminated dataset.
5. Repeat steps 3 and 4, replacing one more data point each time, until the difference can be made arbitrarily large, that is, until the estimator has broken down.
6. Calculate the breakdown point by dividing the smallest number of outliers m that caused the breakdown by the total number of data points n, giving m/n.
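The procedure above can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not a standard routine: a fixed huge value (the hypothetical `outlier` parameter) stands in for "arbitrarily large", and the estimator counts as broken down once it moves more than the hypothetical `tolerance` away from its clean value. Because it tries only one contamination pattern (replacing points in order), the result is an upper bound on the true finite-sample breakdown point; for the mean and median this pattern happens to be a worst case.

```python
import statistics
from typing import Callable, Sequence


def empirical_breakdown(
    estimator: Callable[[Sequence[float]], float],
    data: Sequence[float],
    outlier: float = 1e12,   # hypothetical stand-in for an arbitrarily large value
    tolerance: float = 1e6,  # hypothetical threshold for "broken down"
) -> float:
    """Estimate the finite-sample breakdown point by replacing points one at a time."""
    n = len(data)
    baseline = estimator(data)                    # step 2: clean estimate
    contaminated = list(data)
    for m in range(1, n + 1):
        contaminated[m - 1] = outlier             # step 3: replace one more point
        difference = abs(estimator(contaminated) - baseline)  # step 4
        if difference > tolerance:                # step 5: breakdown reached
            return m / n                          # step 6: m outliers out of n
    return 1.0  # the estimator never broke down within n replacements


data = [float(x) for x in range(1, 101)]  # 100 evenly spaced observations
print(empirical_breakdown(statistics.mean, data))    # 0.01
print(empirical_breakdown(statistics.median, data))  # 0.5
```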
Examples
- Breakdown shows up when a small proportion of observations exerts a large influence on the overall result. For example, in a sample of 100 observations, just one or two gross outliers can move the mean arbitrarily far from the value it would have without them, because the mean's finite-sample breakdown point is only 1/n (asymptotically 0). The median of the same sample shifts by at most a couple of positions among the sorted observations, because about half of the observations would have to be corrupted before it could be pushed arbitrarily far.
- A low breakdown point also means that a statistic is not robust: its result can change drastically when a small proportion of the observations are removed or altered. For example, if the mean is used to measure the central tendency of a dataset that contains an extreme value, the result can be drastically different depending on whether the highest or lowest observation is included or excluded, whereas the median changes only slightly.
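Both examples can be reproduced with Python's standard `statistics` module. The sample below is hypothetical (the integers 1 to 100), and the outlier value of 1e6 is an arbitrary choice standing in for a gross error such as a data-entry mistake.

```python
import statistics

# Hypothetical sample: 100 observations spread evenly between 1 and 100.
clean = [float(x) for x in range(1, 101)]

# Replace the largest one, then the largest two, observations with a gross outlier.
one_outlier = clean[:-1] + [1e6]
two_outliers = clean[:-2] + [1e6, 1e6]

# Excluding the highest observation of the contaminated sample.
trimmed = sorted(one_outlier)[:-1]

samples = [
    ("clean", clean),
    ("1 outlier", one_outlier),
    ("2 outliers", two_outliers),
    ("max removed", trimmed),
]
for label, sample in samples:
    mean = statistics.mean(sample)
    median = statistics.median(sample)
    print(f"{label:>11}: mean={mean:12.2f}  median={median:6.2f}")
```

The mean jumps from 50.5 to about 10049.5 with a single outlier and back to 50 once that observation is excluded, while the median stays at 50.5 with one or two outliers and moves only to 50 when the highest observation is dropped.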