For semantic segmentation, you generally end up with the last layer being something like
output = Conv2D(num_classes, (1, 1), activation='softmax')
My question is: how do I prepare the labels for this? Suppose I have 10 classes to identify, each marked with a different colour in the label image. For each label image, do I need to mask out one particular colour and turn it into a grayscale image so that it can be compared against one channel of the model output? Or is there a way to pass a full RGB picture in as the label?
The output of your network will be an image with 10 channels, where each pixel is a vector of probabilities that sums to one (due to the softmax). Example: [0.1, 0.1, 0.1, 0.05, 0.05, 0.1, 0.1, 0.1, 0.1, 0.2]. You want your label images to be in the same shape: an image with 10 channels, where each pixel is a one-hot binary vector with a 1 at the index of its class and 0 elsewhere. Your segmentation loss function is then the pixel-wise crossentropy.
For the implementation: the softmax activation in Keras has an axis parameter: https://keras.io/activations/#softmax
To build the one-hot labels from an integer class map, use
np_utils.to_categorical(labels, num_classes)
When labels has shape (row, col), the output shape will be (row, col, num_classes).
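A small sketch of that shape transformation, using a numpy one-hot expansion that mirrors what to_categorical does (so it runs without Keras installed; the 2x2 label values are made up for the demo):

```python
import numpy as np

# Integer class map: each pixel holds its class index, shape (row, col).
labels = np.array([[0, 1],
                   [2, 1]])
num_classes = 10

# np.eye row-indexing performs the same one-hot expansion as
# keras.utils.np_utils.to_categorical(labels, num_classes):
# a trailing axis of size num_classes is appended.
onehot = np.eye(num_classes, dtype=np.float32)[labels]
print(onehot.shape)  # (2, 2, 10)
```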
A full worked example:
https://github.com/naomifridman/Unet_Brain_tumor_segmentation