Softmax is a function commonly used in the final layer of a neural network to convert a vector of real-valued scores into a probability distribution.
It generalizes the logistic sigmoid function, which handles binary classification, to problems with more than two classes.
How to calculate
The softmax function takes as input a vector of real numbers, $z = [z_1, z_2, \dots, z_k]$, and returns a vector of the same size, $p = [p_1, p_2, \dots, p_k]$,
where each element $p_i$ represents the probability of the input belonging to the $i$th class.
The softmax function is defined as:
$$ p_i = \frac{e^{z_i}}{\sum_{j=1}^{k} e^{z_j}} $$
- In this equation, $p_i$ is the predicted probability for the $i$th class, $z$ is the vector of input values, and $k$ is the total number of classes.
- The denominator is the sum of the exponentials of all input values, which normalizes the output so that the probabilities sum to 1; since every exponential is positive, each $p_i$ also lies strictly between 0 and 1.
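As a concrete illustration, here is a minimal NumPy sketch of the formula above. The max-subtraction step is a standard numerical-stability trick rather than part of the definition: subtracting a constant from every input leaves the result unchanged, but keeps `np.exp` from overflowing on large scores.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Convert a vector of real-valued scores into a probability distribution."""
    # Shift by the max for numerical stability; softmax is invariant to
    # adding a constant to every input, so the result is unchanged.
    exp_z = np.exp(z - np.max(z))
    return exp_z / np.sum(exp_z)

# Example: scores for three classes.
z = np.array([2.0, 1.0, 0.1])
p = softmax(z)
print(p)        # ~[0.659, 0.242, 0.099]
print(p.sum())  # 1.0
```

Note how the largest score (2.0) receives the largest probability, and the exponential amplifies the gap between scores; the outputs are all positive and sum to 1, as the formula guarantees.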