Theoretically, a loss function should return 0 when target = prediction, and some positive number when the target and prediction are different. In the simple example below, when the prediction equals the target, I get a loss value of -1, and when the prediction and target are different, I get a loss value of 0. How are these values calculated? What is the intuition behind them?
[screenshot: the example code and its output]
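The screenshot itself isn't reproduced here; a minimal sketch of the kind of call that produces these values (the exact tensors are assumed, not taken from the screenshot) is:

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-in for the screenshot's tensors: a "one-hot style"
# raw prediction for class index 2 and a matching target.
input = torch.tensor([[0.0, 0.0, 1.0]])
target = torch.tensor([2])

# F.nll_loss just picks out the entry at the target index and negates it,
# so this raw (non log-probability) input gives -1 when prediction == target ...
print(F.nll_loss(input, target))             # prints tensor(-1.)

# ... and 0 when the target points at one of the zero entries.
print(F.nll_loss(input, torch.tensor([0])))  # prints tensor(0.)
```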
I also tried passing the log_softmax of the input, but then I get a loss value of 2.1232. Again, intuitively, the loss in this case should be 0.
[screenshot: the log_softmax example and its output]
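The second screenshot is likewise missing; assuming (as a guess, not taken from the screenshot) a one-hot row over 21 classes with class index 2 set to 1, the quoted loss is reproduced almost exactly:

```python
import torch
import torch.nn.functional as F

# Hypothetical input: a one-hot row over 21 classes, class index 2 "hot".
input = torch.zeros(1, 21)
input[0, 2] = 1.0
target = torch.tensor([2])

# nll_loss on log-probabilities is -log p(target); here p(target) = e / (20 + e),
# so the loss is -log(e / (20 + e)) ≈ 2.123 rather than 0.
loss = F.nll_loss(F.log_softmax(input, dim=1), target)
print(loss)  # prints roughly tensor(2.1233)
```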
What is the intuition/reason behind these loss values?
F.log_softmax expects logits, which might have arbitrarily large and small numbers. If you just call F.softmax on your input tensor, you'll see that the probability is not exactly one for the desired class:
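The reply's code isn't included here; with the same hypothetical 21-class one-hot input as in the sketch above, it would look something like:

```python
import torch
import torch.nn.functional as F

# Same hypothetical 21-class one-hot input as in the earlier sketch.
input = torch.zeros(1, 21)
input[0, 2] = 1.0

probs = F.softmax(input, dim=1)
print(probs[0, 2])  # roughly tensor(0.1197): e / (20 + e), i.e. not exactly 1
```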
As you can see, class3 has a probability of 0.1197, so you’ll get a positive loss.
ptrblck:
F.log_softmax expects logits which might have arbitrarily large and small numbers.
Ahh, got it.
ptrblck:
As you can see, class3 has a probability of 0.1197, so you’ll get a positive loss.
After applying F.nll_loss to this softmax output, I get a loss value of -0.1197, which is close to 0 but not exactly zero. Shouldn’t loss functions return exactly zero when the prediction equals the target, as in this case?
You will get a loss close to zero (in this case you’ll see that the print outputs an exact zero, although the value might just underflow) if the logit for the target class is very high:
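A quick sketch of that effect (the logit of 100 is just an illustrative choice, not the value from the original reply):

```python
import torch
import torch.nn.functional as F

# If the logit of the target class dominates, its softmax probability is
# numerically 1, so the loss -log(1) prints as an exact zero.
logits = torch.tensor([[0.0, 0.0, 100.0]])
target = torch.tensor([2])
print(F.nll_loss(F.log_softmax(logits, dim=1), target))  # exact zero (underflowed)

# Feeding the softmax probabilities themselves (as in the question above)
# would just return -p(target), not a proper negative log-likelihood.
```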
Great, understood it! This also forces the network to output a larger logit value for the corresponding neuron in order to reduce the loss during backprop!
Thanks for the explanation