I am trying to implement a U-Net in Keras with a TensorFlow backend for an image segmentation task. The input images have size (128, 96), and the corresponding mask images have shape (12288, 6) because they are flattened (128 x 96 = 12288). I have 6 different classes (0-5), which gives the second dimension of the mask shape. The masks have been encoded to one-hot labels using the to_categorical() function. At the moment I use just one input image, and I use the same image as validation and test data.
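For concreteness, the mask preparation looks roughly like this (a sketch; the random mask is a placeholder for my real labels):

import numpy as np
from keras.utils import to_categorical

# A mask holds one class label in {0, ..., 5} per pixel.
mask = np.random.randint(0, 6, size=(128, 96))       # placeholder label image
y = to_categorical(mask.reshape(-1), num_classes=6)  # -> shape (12288, 6)
y = y[np.newaxis, ...]                               # add batch dim -> (1, 12288, 6)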
I would like the U-Net to perform image segmentation, where class 0 corresponds to the background. When I train my U-Net for only a few epochs (1-10), the predicted mask seems to assign random classes to each pixel. When I train the network longer (50+ epochs), all pixels are classified as background. Since I train and test on the same image, I find this very strange, as I was expecting the network to overfit. How can I fix this problem? Could there be something wrong with the way I feed the mask images and the real images to the network?
I have tried manually giving weights to the network to put less emphasis on the background class than on the others, and I have tried different combinations of losses, different ways of shaping the mask image, and many other things, but nothing gave good results.
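The weighting attempt was along these lines (a sketch, not my exact code; the 0.1 background weight is illustrative, and per-pixel weights on a 3D output need sample_weight_mode='temporal' in Keras):

import numpy as np

# model is the compiled U-Net shown below; y is the one-hot mask batch
# of shape (n, 12288, 6).
model.compile(optimizer='adam', loss='binary_crossentropy',
              sample_weight_mode='temporal')
# Give background pixels (class 0) a lower weight than the other classes.
pixel_weights = np.where(np.argmax(y, axis=-1) == 0, 0.1, 1.0)  # shape (n, 12288)
model.fit(x, y, sample_weight=pixel_weights, batch_size=1, epochs=50)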
Below is the code for my network. It is based on the U-Net taken from this repository. I managed to train it with good results for the two-class case, but I don't know how to extend it to more classes.
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, concatenate, Reshape
from keras.optimizers import Adam

def get_unet(self):
    inputs = Input((128, 96, 1))
    # Input shape = (?, 128, 96, 1)
    conv1 = Conv2D(64, (3, 3), activation='relu', padding='same',
                   kernel_initializer='he_normal')(inputs)
    # Conv1 shape = (?, 128, 96, 64)
    conv1 = Conv2D(64, (3, 3), activation='relu', padding='same',
                   kernel_initializer='he_normal')(conv1)
    # Conv1 shape = (?, 128, 96, 64)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    # Pool1 shape = (?, 64, 48, 64)
    conv2 = Conv2D(128, (3, 3), activation='relu', padding='same',
                   kernel_initializer='he_normal')(pool1)
    # Conv2 shape = (?, 64, 48, 128)
    conv2 = Conv2D(128, (3, 3), activation='relu', padding='same',
                   kernel_initializer='he_normal')(conv2)
    # Conv2 shape = (?, 64, 48, 128)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
    # Pool2 shape = (?, 32, 24, 128)
    conv5 = Conv2D(256, (3, 3), activation='relu', padding='same',
                   kernel_initializer='he_normal')(pool2)
    conv5 = Conv2D(256, (3, 3), activation='relu', padding='same',
                   kernel_initializer='he_normal')(conv5)
    # Decoder: upsample and concatenate with the matching encoder features
    up8 = Conv2D(128, (2, 2), activation='relu', padding='same',
                 kernel_initializer='he_normal')(UpSampling2D(size=(2, 2))(conv5))
    merge8 = concatenate([conv2, up8], axis=3)
    conv8 = Conv2D(128, (3, 3), activation='relu', padding='same',
                   kernel_initializer='he_normal')(merge8)
    conv8 = Conv2D(128, (3, 3), activation='relu', padding='same',
                   kernel_initializer='he_normal')(conv8)
    up9 = Conv2D(64, (2, 2), activation='relu', padding='same',
                 kernel_initializer='he_normal')(UpSampling2D(size=(2, 2))(conv8))
    merge9 = concatenate([conv1, up9], axis=3)
    conv9 = Conv2D(64, (3, 3), activation='relu', padding='same',
                   kernel_initializer='he_normal')(merge9)
    conv9 = Conv2D(64, (3, 3), activation='relu', padding='same',
                   kernel_initializer='he_normal')(conv9)
    conv9 = Conv2D(6, (3, 3), activation='relu', padding='same',
                   kernel_initializer='he_normal')(conv9)
    # One channel per class, then flatten to match the (12288, 6) masks
    conv10 = Conv2D(6, (1, 1), activation='sigmoid')(conv9)
    conv10 = Reshape((128 * 96, 6))(conv10)
    model = Model(inputs=inputs, outputs=conv10)
    model.compile(optimizer=Adam(lr=1e-5), loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model
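For reference, this is roughly how I train and evaluate on that single image (a sketch; x is the image batch of shape (1, 128, 96, 1) and y the mask of shape (1, 12288, 6)):

import numpy as np

model = get_unet()                 # built by the function above
model.fit(x, y, batch_size=1, epochs=50)
pred = model.predict(x)                                   # shape (1, 12288, 6)
pred_mask = np.argmax(pred[0], axis=-1).reshape(128, 96)  # per-pixel class ids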
Can anyone point out what is wrong with my model?
Thank you @Daniel, your suggestions helped me to get the U-Net to work in the end. With 500+ epochs I managed to get results that did not just classify the whole image as background. I used the 'sigmoid' activation function with loss='binary_crossentropy' and kept the 'relu' activation for all the hidden convolutional layers. I noticed that my network would sometimes get stuck in a local minimum where the loss no longer improved, so I had to restart training. Also, instead of kernel_initializer='he_normal', using kernel_initializer='zeros' or kernel_initializer=TruncatedNormal(mean=0.0, stddev=0.07) worked for me.
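Concretely, the swap is a drop-in change in every Conv2D call (a sketch; inputs refers to the model's input tensor):

from keras.initializers import TruncatedNormal
from keras.layers import Conv2D

init = TruncatedNormal(mean=0.0, stddev=0.07)  # or simply 'zeros'
conv1 = Conv2D(64, (3, 3), activation='relu', padding='same',
               kernel_initializer=init)(inputs)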
In my experience with U-nets for segmentation as well, it tends to do this:
I also use the "train just one image" method to find that convergence, then adding the other images is ok.
But I had to try a lot of times, and the only time it worked pretty fast was when I used:
But I wasn't using "relu" anywhere... perhaps that influences the convergence speed a little? Thinking about "relu": it outputs only zero or positive values, so a large region of the function has zero gradient. Maybe having lots of "relu" activations creates a lot of "flat" areas without gradients? (I must think about it more to confirm.)
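A tiny numeric check of that intuition (an illustration only, not part of the network):

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # Zero for all negative inputs: the "flat" region with no gradient.
    return (x > 0).astype(float)

x = np.linspace(-2, 2, 5)   # [-2, -1, 0, 1, 2]
print(relu(x))              # [0. 0. 0. 1. 2.]
print(relu_grad(x))         # [0. 0. 0. 1. 1.]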
Try a few times (and have the patience to wait through many, many epochs) with different weight initializations.
There is a chance that your learning rate is too high, too.
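For instance, the retries could be automated along these lines (a sketch; get_unet, x and y are assumed to be the model factory and the single training pair from the question):

# Rebuild the model a few times (each build re-initializes the weights)
# and keep the run whose final loss is lowest.
best_loss, best_model = float('inf'), None
for attempt in range(5):
    model = get_unet()                                  # fresh random weights
    history = model.fit(x, y, batch_size=1, epochs=200, verbose=0)
    final_loss = history.history['loss'][-1]
    if final_loss < best_loss:
        best_loss, best_model = final_loss, model
print('best final loss:', best_loss)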
About to_categorical(): have you tried to plot/print your masks? Do they really look like what you expect them to?
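A quick way to check (a sketch; y is assumed to be a single flattened one-hot mask of shape (12288, 6)):

import numpy as np
import matplotlib.pyplot as plt

labels = np.argmax(y, axis=-1).reshape(128, 96)  # back to a per-pixel class image
print(np.unique(labels))                         # expect a subset of {0, ..., 5}
plt.imshow(labels)
plt.colorbar()
plt.show()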
I don't see your prediction layer, which as far as I know must be a dense layer and not a convolutional layer. Maybe that's your problem.