Entropy is a measure of the amount of uncertainty or randomness in a set of data.
Specifically, entropy measures the average amount of information required to represent the outcomes of a random variable.
How to calculate
In information theory, the entropy of a discrete random variable X with probability mass function p(X) can be calculated using the following equation:
\begin{equation} H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x) \end{equation}
where $\mathcal{X}$ is the set of possible outcomes of X, $p(x)$ is the probability of each outcome, and $\log_2$ is the base-2 logarithm.
The quantity $-\log_2 p(x)$ is called the information content (or surprisal) of outcome $x$; entropy is its expected value, weighted by $p(x)$. The negative sign ensures that entropy is non-negative, since $\log_2 p(x) \le 0$ whenever $0 < p(x) \le 1$.
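The formula above can be sketched directly in code. The function below is a minimal illustration, not from the original text; it assumes the distribution is given as a list of probabilities summing to 1, and follows the standard convention that terms with $p(x) = 0$ contribute zero.

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a discrete distribution.

    probs: iterable of probabilities p(x) that sum to 1.
    Terms with p == 0 are skipped, following the convention
    that lim_{p -> 0} p * log2(p) = 0.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin is maximally uncertain for two outcomes: 1 bit.
print(entropy([0.5, 0.5]))  # 1.0

# A biased coin is more predictable, so its entropy is lower.
print(entropy([0.9, 0.1]))  # ~0.469
```

Note that outcomes with zero probability must be excluded before taking the logarithm, otherwise `math.log2(0)` raises a `ValueError`.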
The interpretation of entropy in machine learning is that it measures the impurity or disorder of a set of labels.
For example, in a binary classification problem with labels 0 and 1, if all the data points have label 0, the entropy is 0, indicating perfect purity.
Conversely, if the labels are evenly split between 0 and 1, the entropy is 1, indicating maximum impurity.