Question:
I want to know what the TensorFlow function sparse_softmax_cross_entropy_with_logits is doing mathematically, but I can't find where it is implemented in the source code. Can you help me?
Answer 1:
sparse_softmax_cross_entropy_with_logits is equivalent to a numerically stable version of the following:
-1. * tf.gather(tf.log(tf.nn.softmax(logits)), target)
or, in more "readable" numpy code:
-1. * np.log(softmax(logits))[target]
where softmax(x) = np.exp(x) / np.sum(np.exp(x)).
That is, it computes the softmax of the provided logits, takes the log thereof to retrieve the log-probabilities, and slices the log-probabilities to retrieve the log-probability of the target.
However, it does so in a numerically stable way (a couple of things can go wrong with the naive formula, e.g. np.exp can overflow for large logits) by adding small values to some of the operations. This means that computing the verbose version above will only approximately match the values produced by nn.sparse_softmax_cross_entropy_with_logits (in some tests the difference was consistently smaller than 2e-6).
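To make the equivalence concrete, here is a minimal numpy sketch that computes both the verbose formula above and a log-sum-exp variant that sidesteps the overflow issue; the example values and names are made up for illustration:

import numpy as np

# Made-up example: one sample with 5 classes.
logits = np.array([2.0, 1.0, 0.1, -1.2, 3.3])
target = 2  # integer class index

def softmax(x):
    return np.exp(x) / np.sum(np.exp(x))

# Verbose version, exactly as in the formula above.
naive_loss = -1. * np.log(softmax(logits))[target]

# More stable version: subtract the max before exponentiating and
# use the identity -log softmax(x)[t] = logsumexp(x) - x[t].
shifted = logits - np.max(logits)
stable_loss = np.log(np.sum(np.exp(shifted))) - shifted[target]

print(naive_loss, stable_loss)  # agree to within floating-point error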
Answer 2:
The most important part of the implementation is here starting at line 132.
This functor is called by the kernel implementation.
It uses a not-very-well-documented feature of Eigen called generators, which allows writing fairly flexible code that compiles both for the CPU and, via nvcc, for the GPU.
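The functor itself is C++/Eigen, but the row-wise math it implements can be sketched in numpy. This is only a transliteration of the formula, not the actual kernel code, and the batched shapes and names below are my own; the op also emits a backprop tensor alongside the loss, which mathematically is softmax(logits) minus the one-hot encoding of the labels:

import numpy as np

def sparse_xent_sketch(logits, labels):
    # logits: [batch, num_classes] floats; labels: [batch] integer class ids.
    shifted = logits - logits.max(axis=1, keepdims=True)  # row-wise max for stability
    exp = np.exp(shifted)
    softmax = exp / exp.sum(axis=1, keepdims=True)
    log_sum_exp = np.log(exp.sum(axis=1))
    rows = np.arange(len(labels))
    loss = log_sum_exp - shifted[rows, labels]  # -log softmax(logits)[label], per row
    backprop = softmax.copy()
    backprop[rows, labels] -= 1.0               # softmax - one_hot(labels)
    return loss, backprop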
Answer 3:
In the HEAD version (as of this writing), you can find the function in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn_ops.py at line #424.
The comment says:
Measures the probability error in discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class). For example, each CIFAR-10 image is labeled with one and only one label: an image can be a dog or a truck, but not both.
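In other words, the "sparse" variant takes integer class indices rather than one-hot vectors. A minimal usage sketch, written against the current TF 2 API (the answers above predate it, e.g. tf.log is now tf.math.log), with made-up shapes:

import tensorflow as tf

# Made-up CIFAR-10-style batch: 4 images, 10 mutually exclusive classes.
logits = tf.random.normal([4, 10])   # unscaled scores from some model
labels = tf.constant([3, 0, 9, 1])   # one integer class id per image, not one-hot

loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)
print(loss.shape)  # (4,) -- one cross-entropy value per example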