Entropy

Entropy is a measure of the amount of uncertainty or randomness in a set of data.

Specifically, entropy is a function that measures the average amount of information required to represent an outcome of a random variable.

How to calculate

In information theory, the entropy of a discrete random variable $X$ with probability mass function $p(x)$ can be calculated using the following equation:

\begin{equation} H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x) \end{equation}

where $\mathcal{X}$ is the set of possible outcomes of X, $p(x)$ is the probability of each outcome, and $\log_2$ is the base-2 logarithm.
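
As a quick illustration, the formula can be evaluated directly from a list of outcome probabilities. This is a minimal Python sketch (the helper name entropy and the example distributions are illustrative, not taken from the text above); it skips zero-probability outcomes, since $0 \log_2 0$ is taken to be 0:

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a discrete distribution given as probabilities."""
    # Skip zero-probability outcomes: 0 * log2(0) is treated as 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # fair coin -> 1.0 bit
print(entropy([0.9, 0.1]))  # skewed coin -> about 0.47 bits
```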

  • The quantity $-\log_2 p(x)$ is the information content (or surprisal) of outcome x, so the entropy is the expected information content over all outcomes. Because $\log_2 p(x) \le 0$ for any probability $p(x)$, the negative sign ensures that the entropy is non-negative.

  • The interpretation of entropy in machine learning is that it measures the impurity or disorder of a set of labels.

    • For example, in a binary classification problem with labels 0 and 1, if all the data points have label 0, the entropy is 0, indicating perfect purity.

    • Conversely, if the labels are evenly split between 0 and 1, the entropy is 1 bit, indicating maximum impurity (see the sketch after this list).
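
The impurity calculation for a set of labels can be sketched the same way (a hypothetical helper label_entropy, assuming plain Python and the standard library only, not an established API):

```python
from collections import Counter
import math

def label_entropy(labels):
    """Entropy (in bits) of a collection of class labels, used as an impurity measure."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

print(label_entropy([0, 0, 0, 0]))  # all one class -> 0.0 (perfectly pure)
print(label_entropy([0, 0, 1, 1]))  # even 0/1 split -> 1.0 (maximum impurity)
```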

Related Topics