I recently came across tf.nn.sparse_softmax_cross_entropy_with_logits and I cannot figure out what the difference is compared to tf.nn.softmax_cross_entropy_with_logits.
Is the only difference that training vectors y have to be one-hot encoded when using sparse_softmax_cross_entropy_with_logits?

Reading the API, I was unable to find any other difference compared to softmax_cross_entropy_with_logits. But why do we need the extra function then? Shouldn't softmax_cross_entropy_with_logits produce the same results as sparse_softmax_cross_entropy_with_logits, if it is supplied with one-hot encoded training data/vectors?
Having two different functions is a convenience, as they produce the same result.
The difference is simple:

For sparse_softmax_cross_entropy_with_logits, labels must have the shape [batch_size] and the dtype int32 or int64. Each label is an int in the range [0, num_classes - 1].

For softmax_cross_entropy_with_logits, labels must have the shape [batch_size, num_classes] and the dtype float32 or float64.

Labels used in softmax_cross_entropy_with_logits are the one-hot version of labels used in sparse_softmax_cross_entropy_with_logits.

Another tiny difference is that with sparse_softmax_cross_entropy_with_logits, you can give -1 as a label to have a loss of 0 on this label.
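A minimal sketch of the two label formats, assuming TensorFlow 2.x with eager execution (the logits and labels below are made up purely for illustration):

```python
import tensorflow as tf

# A toy batch: 3 examples, 4 classes.
logits = tf.constant([[ 2.0, 1.0, 0.1, -1.0],
                      [ 0.5, 2.5, 0.3,  0.0],
                      [-1.0, 0.2, 3.0,  0.5]])

# sparse_softmax_cross_entropy_with_logits: labels of shape [batch_size],
# dtype int32/int64, each entry a class index in [0, num_classes - 1].
sparse_labels = tf.constant([0, 1, 2], dtype=tf.int64)
sparse_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=sparse_labels, logits=logits)

# softmax_cross_entropy_with_logits: labels of shape [batch_size, num_classes],
# dtype float32/float64 -- here simply the one-hot version of the same labels.
dense_labels = tf.one_hot(sparse_labels, depth=4)
dense_loss = tf.nn.softmax_cross_entropy_with_logits(
    labels=dense_labels, logits=logits)

print(sparse_loss.numpy())  # per-example losses, shape [3]
print(dense_loss.numpy())   # same values, up to floating-point rounding
```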
Both functions compute the same result; sparse_softmax_cross_entropy_with_logits just computes the cross entropy directly on the sparse (integer) labels instead of requiring them to be converted to a one-hot encoding first.
You can verify this by running the following program:
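A minimal sketch of such a program, assuming TensorFlow 2.x with eager execution (dims and pos are as described below):

```python
import numpy as np
import tensorflow as tf

dims = 8                        # number of classes
pos = np.random.randint(dims)   # index of the "true" class

# Random logits vector of length `dims` and its one-hot label
# (1 at position `pos`, 0 everywhere else).
logits = tf.random.uniform([dims], maxval=3, dtype=tf.float32)
one_hot_label = tf.one_hot(pos, dims)

# Dense variant: takes the one-hot label.
dense_loss = tf.nn.softmax_cross_entropy_with_logits(
    labels=one_hot_label, logits=logits)

# Sparse variant: takes the bare class index.
sparse_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=tf.constant(pos, dtype=tf.int64), logits=logits)

print(dense_loss.numpy(), sparse_loss.numpy())
print(np.isclose(dense_loss.numpy(), sparse_loss.numpy()))  # expected: True
```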
Here I create a random logits vector of length dims and generate one-hot encoded labels (where the element at pos is 1 and all others are 0). After that I calculate both the softmax and the sparse softmax cross entropy and compare their outputs. Try rerunning it a few times to make sure that it always produces the same output.
I would just like to add two things to the accepted answer that you can also find in the TF documentation.

First: the documentation for softmax_cross_entropy_with_logits notes that the labels do not have to be strictly one-hot. Each row of labels only has to be a valid probability distribution over the classes (soft labels are allowed); if it is not, the computation of the gradient will be incorrect.

Second: the documentation for sparse_softmax_cross_entropy_with_logits notes that soft classes are not allowed: the labels must provide a single specific class index for each row of logits (each minibatch entry).
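To make the first point concrete, here is a small sketch, assuming TensorFlow 2.x with eager execution, in which each label row is a valid probability distribution but not one-hot; the sparse variant has no way to express such soft labels:

```python
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 2.5, 0.3]])

# "Soft" labels: every row sums to 1 but is not one-hot.
soft_labels = tf.constant([[0.7, 0.2, 0.1],
                           [0.1, 0.8, 0.1]])

# Accepted by softmax_cross_entropy_with_logits, since each row is a
# valid probability distribution.
loss = tf.nn.softmax_cross_entropy_with_logits(
    labels=soft_labels, logits=logits)
print(loss.numpy())

# sparse_softmax_cross_entropy_with_logits cannot express these labels:
# it only accepts a single integer class index per example, e.g. [0, 1].
```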