I have a network that produces a 4D output tensor where the value at each position in the spatial dimensions (i.e. per pixel) is to be interpreted as the class probabilities for that position. In other words, the output has shape (num_batches, height, width, num_classes). I have labels of the same size where the real class is coded as one-hot. I would like to calculate the categorical-crossentropy loss from this.
Problem #1: The K.softmax function expects a 2D tensor of shape (num_batches, num_classes).
Problem #2: I'm not sure how the losses from each position should be combined. Is it correct to reshape the tensor to (num_batches * height * width, num_classes) and then call K.categorical_crossentropy on that? Or rather, to call K.categorical_crossentropy on (num_batches, num_classes) slices height * width times and average the results?
Found this issue to confirm my intuition.

In short: the softmax will take 2D or 3D inputs. If they are 3D, Keras will assume a shape like (samples, time_dimension, num_classes) and apply the softmax on the last axis. For some weird reason, it doesn't do that for 4D tensors.

Solution: reshape your output to a sequence of pixels, then apply your softmax. Then either reshape your target tensors to 2D as well, or reshape that last layer back into (height, width, num_classes).

Otherwise, something I would try if I weren't on my phone right now is a TimeDistributed(Activation('softmax')) layer. But no idea if that would work... will try later.

I hope this helps :-)
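For concreteness, the reshape trick described above might look like this (this sketch is mine, not the answerer's; the shapes and layer choices are illustrative):

```python
import numpy as np
from tensorflow.keras import layers, models

num_classes, height, width = 5, 8, 8

# Reshape the 4D feature map into a sequence of pixels, apply a
# per-pixel softmax, then reshape back to the image layout.
inputs = layers.Input(shape=(height, width, num_classes))
x = layers.Reshape((height * width, num_classes))(inputs)  # (batch, H*W, classes)
x = layers.Activation('softmax')(x)                        # softmax over last axis
outputs = layers.Reshape((height, width, num_classes))(x)

model = models.Model(inputs, outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy')

# Each pixel's scores now sum to 1.
probs = model.predict(np.random.rand(2, height, width, num_classes), verbose=0)
```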
It seems that now you can simply put a softmax activation on the last Conv2D layer and then specify categorical_crossentropy loss and train on the image without any reshaping tricks or any new loss function. I've tried overfitting with a dummy dataset and it works well. Try it ~!

You can also compile using sparse_categorical_crossentropy and then train with labels of shape (samples, height, width), where each pixel in the labels corresponds to a class label: model.fit(tensor4d, tensor3d)
The idea is that softmax and categorical_crossentropy will be applied to the last axis (you can check the keras.backend.softmax and keras.backend.categorical_crossentropy docs).

PS. I use keras from tensorflow.keras (TensorFlow 2).

Update: I have trained on my real dataset and it is working as well.
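A minimal sketch of the sparse-label variant described above (the toy model and random data are mine, purely for illustration):

```python
import numpy as np
from tensorflow.keras import layers, models

num_classes, h, w = 4, 16, 16

# Softmax on the last Conv2D layer; no reshaping tricks needed.
model = models.Sequential([
    layers.Input(shape=(h, w, 3)),
    layers.Conv2D(8, 3, padding='same', activation='relu'),
    layers.Conv2D(num_classes, 1, activation='softmax'),  # per-pixel class probs
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

x = np.random.rand(2, h, w, 3).astype('float32')       # the "tensor4d" input
y = np.random.randint(0, num_classes, size=(2, h, w))  # the "tensor3d" labels
model.fit(x, y, epochs=1, verbose=0)
```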
You could also not reshape anything and instead define both the softmax and the loss on your own: a softmax applied over the last input dimension (as in the tf backend), and a loss that works on the 4D tensors directly (there is no need to reshape anything). No further reshapes are needed.
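The original code for these two functions isn't shown here; a plausible sketch of a last-axis softmax and a matching pixel-wise loss, written with plain TensorFlow ops (function names are my own), could be:

```python
import tensorflow as tf

def softmax_last_axis(x):
    """Numerically stable softmax over the last axis of an nD tensor."""
    e = tf.exp(x - tf.reduce_max(x, axis=-1, keepdims=True))
    return e / tf.reduce_sum(e, axis=-1, keepdims=True)

def pixelwise_crossentropy(y_true, y_pred):
    """Mean categorical crossentropy over all pixels; no reshaping needed."""
    eps = 1e-7
    y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
    return -tf.reduce_mean(tf.reduce_sum(y_true * tf.math.log(y_pred), axis=-1))
```

Both operate on the last axis, so the 4D prediction and one-hot target tensors can be passed in unchanged.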
Just flatten the output to a 2D tensor of size (num_batches, height * width * num_classes). You can do this with the Flatten layer. Ensure that your y is flattened the same way (normally, calling y = y.reshape((num_batches, height * width * num_classes)) is enough).

For your second question: using categorical crossentropy over all width * height predictions is essentially the same as averaging the categorical crossentropy of each of the width * height predictions (by the definition of categorical crossentropy).
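The equivalence claimed in that last paragraph is easy to check numerically; this NumPy snippet (mine, for illustration) compares the two ways of combining the per-pixel losses:

```python
import numpy as np

rng = np.random.default_rng(0)
h, w, c = 4, 4, 3

# One-hot targets and normalized per-pixel predictions.
y_true = np.eye(c)[rng.integers(0, c, size=(h, w))]    # (h, w, c)
scores = rng.random((h, w, c))
y_pred = scores / scores.sum(axis=-1, keepdims=True)

# Crossentropy computed per pixel, then averaged ...
per_pixel = -np.sum(y_true * np.log(y_pred), axis=-1)  # (h, w)
avg_loss = per_pixel.mean()

# ... equals crossentropy computed over all h*w pixels at once.
flat = -np.sum(y_true.reshape(-1, c) * np.log(y_pred.reshape(-1, c)), axis=-1)
flat_loss = flat.mean()
```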