I am working on a deep learning model using Google's TensorFlow. The model should be used to segment and label scenes.
- I am using the SiftFlow dataset which has 33 semantic classes and images with 256x256 pixels.
- As a result, at my final layer (using convolution and deconvolution) I arrive at a tensor (array) of shape [256, 256, 33].
- Next I would like to apply Softmax and compare the results to a semantic label of size [256, 256].
Questions: Should I apply mean averaging or argmax to my final layer so its shape becomes [256, 256, 1], and then loop through each pixel and classify it as if I were classifying 256x256 individual instances? If yes, how? If not, what other options are there?
To apply softmax and use a cross entropy loss, you have to keep the final output of your network intact, with shape batch_size x 256 x 256 x 33. Therefore you cannot use mean averaging or argmax, because that would destroy the output probabilities of your network.
You have to loop through all batch_size x 256 x 256 pixels and apply a cross entropy loss to the prediction for each pixel. This is easy with the built-in function `tf.nn.sparse_softmax_cross_entropy_with_logits(logits, labels)`.

Some warnings from the documentation before applying the code below: the function expects unscaled logits (it applies softmax internally, so do not feed it the output of a softmax), and each label must be an integer index in [0, num_classes).
The trick is to use `batch_size * 256 * 256` as the batch size required by the function. We will reshape `logits` and `labels` to this format.
to this format. Here is the code I use:You can then apply your optimizer on that loss.
Update: v0.10
The documentation of `tf.nn.sparse_softmax_cross_entropy_with_logits` shows that it now accepts any shape for `logits`, so there is no need to reshape the tensors (thanks @chillinger):
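A sketch of the same loss without the reshape, reusing `logits` and `labels` from the snippet above:

```python
# logits: [batch_size, 256, 256, 33], labels: [batch_size, 256, 256]
# Since v0.10 the op accepts these shapes directly and returns a
# [batch_size, 256, 256] tensor of per-pixel cross entropy values.
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits, labels)
loss = tf.reduce_mean(cross_entropy)
```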