I am trying to compute the derivative of the softmax activation function. I found this: https://math.stackexchange.com/questions/945871/derivative-of-softmax-loss-function but nobody there seems to give the proper derivation of how we arrive at the answers for the i = j and i != j cases. Could someone please explain this? I am confused about taking derivatives when a summation is involved, as in the denominator of the softmax activation function.
For what it's worth, here is my derivation based on SirGuy's answer (feel free to point out errors if you find any).
The derivative of a sum is the sum of the derivatives, i.e.:

$$\frac{d}{dx}\sum_i f_i(x) = \sum_i \frac{d\,f_i(x)}{dx}$$
To derive the derivatives of `p_j` with respect to `o_i`, we start with:

$$p_j = \frac{e^{o_j}}{\sum_k e^{o_k}} = e^{o_j}\left(\sum_k e^{o_k}\right)^{-1}$$

I decided to use `d_i` for the derivative with respect to `o_i` to make this easier to read. Using the product rule we get:

$$d_i\, p_j = d_i\!\left(e^{o_j}\right)\left(\sum_k e^{o_k}\right)^{-1} + e^{o_j}\, d_i\!\left(\left(\sum_k e^{o_k}\right)^{-1}\right)$$

Looking at the first term, the derivative of the exponential in the numerator will be `0` if `i != j`; this can be represented with a delta function, which I will call `D_ij` (equal to 1 when `i = j` and 0 otherwise). This gives (for the first term):

$$D_{ij}\, e^{o_j}\left(\sum_k e^{o_k}\right)^{-1} = D_{ij}\, p_j$$

which is just our original function multiplied by `D_ij`.

For the second term, when we differentiate each element of the sum individually, the only non-zero term is the one where `i = k`. This gives us (not forgetting the power rule, because the sum is in the denominator):

$$e^{o_j}\left(-\left(\sum_k e^{o_k}\right)^{-2} e^{o_i}\right) = -\,\frac{e^{o_j}}{\sum_k e^{o_k}}\cdot\frac{e^{o_i}}{\sum_k e^{o_k}} = -\,p_j\, p_i$$

Putting the two together, we get the surprisingly simple formula:

$$d_i\, p_j = p_j\,(D_{ij} - p_i)$$

If you really want, we can split it into the `i = j` and `i != j` cases:

$$d_i\, p_j = \begin{cases} p_j\,(1 - p_j) & \text{if } i = j \\ -\,p_j\, p_i & \text{if } i \ne j \end{cases}$$

which is our answer.
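If you want to sanity-check the result numerically, here is a small NumPy sketch (my own illustration, not part of the derivation above) that builds the Jacobian `J[i, j] = d p_j / d o_i = p_j (D_ij - p_i)` and compares it against central finite differences:

```python
import numpy as np

def softmax(o):
    """Numerically stable softmax: p_j = exp(o_j) / sum_k exp(o_k)."""
    e = np.exp(o - np.max(o))
    return e / e.sum()

def softmax_jacobian(o):
    """Analytic Jacobian: J[i, j] = d p_j / d o_i = p_j * (D_ij - p_i)."""
    p = softmax(o)
    return np.diag(p) - np.outer(p, p)

# Compare against a central finite-difference approximation.
rng = np.random.default_rng(0)
o = rng.normal(size=5)
eps = 1e-6

numeric = np.zeros((5, 5))
for i in range(5):
    step = np.zeros(5)
    step[i] = eps
    numeric[i] = (softmax(o + step) - softmax(o - step)) / (2 * eps)

analytic = softmax_jacobian(o)
# The two should agree up to finite-difference error (roughly 1e-9 or better).
print(np.max(np.abs(analytic - numeric)))
```

Note that `np.diag(p) - np.outer(p, p)` is exactly the two-case formula in matrix form: the diagonal entries are `p_j (1 - p_j)` and the off-diagonal entries are `-p_j p_i`.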