The diagram shows a deep learning image classification model that assigns an image to one of three classes. We pass the input image through the model's layers, and the output layer produces one raw score, called a logit, for each class.
The logits are then passed to the softmax activation function, which maps each logit to the probability that the image belongs to the corresponding class. To do so, softmax takes the exponential of each logit and divides it by the sum of the exponentials of all the logits, so the resulting probabilities are positive and sum to 1.
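The calculation described above can be written as a single formula. The symbols are notation introduced here for illustration and do not appear in the diagram: z_i is the logit for class i, and K is the number of classes (three in this example).

```latex
\[
  \operatorname{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}},
  \qquad i = 1, \dots, K
\]
```

Because every term is divided by the same sum, the K outputs always add up to 1.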
The class with the highest probability is the model's final prediction. Here, the image is assigned to the second class with a probability score of 0.66.
Note:
The number of logits, and therefore the number of probabilities, equals the number of classes that we want to predict.
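The prediction step can be sketched in C++ as follows. This is a minimal illustration, not the model's actual code: the helper names softmax_probs and predicted_class are hypothetical, and subtracting the largest logit before exponentiating is a common numerical-stability trick that is assumed here rather than taken from the diagram.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: softmax over a vector of logits.
// Subtracting the largest logit first leaves the result unchanged
// but keeps std::exp from overflowing on large inputs.
std::vector<double> softmax_probs(const std::vector<double>& logits) {
    std::vector<double> probs(logits.size());
    if (logits.empty()) return probs;
    const double max_logit = *std::max_element(logits.begin(), logits.end());
    double sum = 0.0;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - max_logit);
        sum += probs[i];
    }
    for (double& p : probs) p /= sum;  // normalize so the values sum to 1
    return probs;
}

// Hypothetical helper: 0-based index of the largest probability,
// i.e. the predicted class. Expects a non-empty vector.
std::size_t predicted_class(const std::vector<double>& probs) {
    return static_cast<std::size_t>(std::distance(
        probs.begin(), std::max_element(probs.begin(), probs.end())));
}
```

For example, calling softmax_probs with the made-up logits {1.0, 2.0, 0.5} (not the values behind the diagram) gives roughly 0.23, 0.63, and 0.14, one probability per logit, and predicted_class returns index 1, the second class.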
Code example
To understand how the softmax activation function works internally, we can look at the C++ code below, where the input vector holds the logits from the output layer that are passed to the softmax() function.
The softmax() function returns a vector, which we store in the variable output, containing the probability that the function computed for each logit.