What ECDF is
The empirical cumulative distribution function (ECDF) is a nonparametric estimate of the cumulative distribution function of a random variable, computed directly from a sample. It is a step function whose value at any point is the proportion of observations that are less than or equal to that point. The ECDF is a useful tool for understanding the distribution of a dataset and can be used to compare two or more datasets.
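The definition above can be written compactly as a formula. The notation here is an assumption for illustration and does not come from the text: x_1, ..., x_n are the observations, n is the sample size, and the indicator equals 1 when its condition holds and 0 otherwise.

```latex
\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{x_i \le x\}
```

For the sample 1, 2, 2, 3, 5, for instance, three of the five observations are at most 2, so the ECDF at 2 is 3/5 = 0.6.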
Steps to calculate ECDF
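- Sort the data in ascending order.
- For each distinct value in the data, count the observations that are less than or equal to it and divide by the total number of observations. This proportion is the ECDF at that value.
- If a value occurs more than once, count all of its copies together: the ECDF jumps once at that value, by the number of copies divided by the total number of observations.
- Plot each proportion against its value.
- Join the points as a step function rather than with straight lines: the ECDF is 0 below the smallest value, stays flat between consecutive data values, jumps at each one, and reaches 1 at the largest value.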
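The steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation; the function name `ecdf` and the sample data are made up for the example.

```python
from collections import Counter


def ecdf(data):
    """Return the distinct sorted values and the ECDF evaluated at each one.

    Hypothetical helper for illustration: the result is a pair of lists,
    (values, proportions), where proportions[i] is the share of
    observations that are <= values[i].
    """
    n = len(data)
    if n == 0:
        raise ValueError("data must contain at least one observation")

    counts = Counter(data)      # how many times each value occurs
    values = sorted(counts)     # distinct values in ascending order

    proportions = []
    running = 0
    for v in values:
        running += counts[v]    # repeated values are counted together
        proportions.append(running / n)
    return values, proportions


values, proportions = ecdf([3, 1, 2, 2, 5])
print(values)       # [1, 2, 3, 5]
print(proportions)  # [0.2, 0.6, 0.8, 1.0]
```

The value 2 occurs twice, so the ECDF jumps by 2/5 there (from 0.2 to 0.6). To draw the result as a step function, one option is matplotlib's `plt.step(values, proportions, where="post")`, which holds each proportion flat until the next data value.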
Examples
- The ECDF can be used to compare the cumulative distribution functions of two or more different datasets.
- The ECDF can be used to determine if a dataset is skewed or symmetric.
- The ECDF can be used to identify outliers in a dataset.
- The ECDF can be used to assess the goodness-of-fit of a probability distribution to a dataset.
- The ECDF can be used to compare the distributions of two or more variables in a dataset.
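As a rough illustration of the comparison use case, the sketch below evaluates the ECDFs of two samples at every observed value and reports the largest vertical gap between them. The helper names `ecdf_at` and `max_ecdf_gap` and the sample data are assumptions made for this example. The largest gap is the quantity used as the statistic in the two-sample Kolmogorov-Smirnov test, although the sketch performs no significance test.

```python
from bisect import bisect_right


def ecdf_at(sorted_data, x):
    """Proportion of observations in sorted_data that are <= x."""
    # bisect_right returns the number of elements <= x in a sorted list.
    return bisect_right(sorted_data, x) / len(sorted_data)


def max_ecdf_gap(sample_a, sample_b):
    """Largest absolute difference between the ECDFs of two non-empty samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    # Both ECDFs only change at observed values, so checking those is enough.
    grid = sorted(set(a) | set(b))
    return max(abs(ecdf_at(a, x) - ecdf_at(b, x)) for x in grid)


gap = max_ecdf_gap([1, 2, 3, 4], [3, 4, 5, 6])
print(gap)  # 0.5
```

A gap near 0 means the two ECDFs nearly coincide, while a gap near 1 means the samples barely overlap.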