Just getting started with Conv Nets and trying out an image segmentation problem. I got my hands on 24 images and their masks for the dstl satellite image feature detection competition. (https://www.kaggle.com/c/dstl-satellite-imagery-feature-detection/data)
I thought I’d try to follow the tips here https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html but I’m stuck.
I downloaded the pre-trained weights for ZF_UNET_224, the 2nd place winners’ approach to this problem. My image masks contain 5 objects so I popped the final layer and instead of having this:
activation_45 (Activation) (None, 224, 224, 32) 0 batch_normalization_44[0][0]
spatial_dropout2d_2 (SpatialDro (None, 224, 224, 32) 0 activation_45[0][0]
conv2d_46 (Conv2D) (None, 224, 224, 1) 33 spatial_dropout2d_2[0][0]
I have this now:
activation_45 (Activation) (None, 224, 224, 32) 0 batch_normalization_44[0][0]
spatial_dropout2d_2 (SpatialDro (None, 224, 224, 32) 0 activation_45[0][0]
predictions (Conv2D) (None, 224, 224, 5) 10 conv2d_46[0][0]
I’m trying to follow the exact steps from the Keras tutorial but when I do
my_model.fit_generator( train_generator, steps_per_epoch= 4, epochs=10, validation_data=validation_generator )
I get an error message saying
Output of generator should be a tuple (x, y, sample_weight) or (x, y). Found: [[[[1. 1. 1. ] [1. 1. 1. ] [1. 1. 1. ] … [1. 1. 1. ] [1. 1. 1. ] [1. 1. 1. ]]
I think what I want is probabilities for each of the pixels in my 224X224 image, so that I can use those to generate masks on the original image but I’m not sure how to go about getting that.
I have 24 8-band input images and their masks, which label 5 objects. I want to train this U-Net on these images, place masks on some test images, and evaluate them for IoU or weighted log loss. Any help?
Update:
I'm using the same generator as in the Keras tutorial:
batch_size = 4
# this is the augmentation configuration we will use for training
train_datagen = ImageDataGenerator(
rescale=1./255,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True)
# this is the augmentation configuration we will use for testing:
# only rescaling
test_datagen = ImageDataGenerator(rescale=1./255)
# this is a generator that will read pictures found in
# subfolders of 'data/train', and indefinitely generate
# batches of augmented image data
train_generator = train_datagen.flow_from_directory(
'data/train', # this is the target directory
target_size=(224, 224), # all images will be resized
batch_size=batch_size,
color_mode='rgb',
class_mode=None) # since we use binary_crossentropy loss, we need binary labels
# this is a similar generator, for validation data
validation_generator = test_datagen.flow_from_directory(
'data/valid',
target_size=(224, 224),
batch_size=batch_size,
color_mode = 'rgb',
class_mode=None)
One more thing: my training images have 8 bands but the architecture only accepts 3 bands. I think the generator only leaves 1 band at the end. Not sure how to solve this problem either.
On your error message:
With flow_from_directory(), your ImageDataGenerator infers the class labels from the structure of the directory that contains your images. As in the example, the images should be arranged in one subfolder per class. Because you pass class_mode=None, the generator yields only batches of images without any labels, which is why fit_generator() complains that it did not receive an (x, y) tuple.
For your image segmentation problem, the label structure is more complex than just one label per image. The labels are masks with a label per pixel. In general, you want to provide these labels as NumPy arrays to the model during training.
You will not be able to handle your case with flow_from_directory(). One solution would be to write your own custom generator that reads both the images and the masks from disk, and use that with fit_generator().
Suppose that you have a .csv file with two columns, one column with the image file names and one column with the paths to the corresponding masks.
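As a rough sketch of such a generator, under a few assumptions that are not from your setup: a hypothetical file pairs.csv with columns named image and mask that both hold loadable paths, and images and masks stored as .npy arrays (your 8-band satellite files will need their own loader). The class name SegmentationSequence is made up for illustration, and the fallback to a plain object exists only so that the snippet runs without TensorFlow installed.

```python
import math

import numpy as np
import pandas as pd

try:
    from tensorflow.keras.utils import Sequence
except ImportError:  # fallback so the sketch still runs without TensorFlow
    Sequence = object


class SegmentationSequence(Sequence):
    """Returns (images, masks) batches listed in a two-column .csv file.

    Assumed (hypothetical) .csv layout:
        image,mask
        data/train/img_0.npy,data/train/mask_0.npy
    """

    def __init__(self, csv_path, batch_size=4, shuffle=True):
        super().__init__()
        self.table = pd.read_csv(csv_path)
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.order = np.arange(len(self.table))
        self.on_epoch_end()

    def __len__(self):
        # Number of batches per epoch; the last batch may be smaller.
        return math.ceil(len(self.table) / self.batch_size)

    def __getitem__(self, idx):
        rows = self.order[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch = self.table.iloc[rows]
        # Assumption: every file is a .npy array of identical shape.
        x = np.stack([np.load(p) for p in batch["image"]]).astype("float32")
        y = np.stack([np.load(p) for p in batch["mask"]]).astype("float32")
        return x, y

    def on_epoch_end(self):
        # Reshuffle the sample order between epochs.
        if self.shuffle:
            np.random.shuffle(self.order)
```

An instance of this class could then be passed to my_model.fit_generator() in place of train_generator. For your model, each mask would need shape (224, 224, 5) so that it matches the output of the predictions layer.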
The generator can then read the .csv file (I'm using pandas for that) and return one batch of images and masks at a time. I'm making use of the Keras Sequence object to write the generator, because this allows for safe multiprocessing, which will speed up training. See the docs on this subject.
On your actual question about transfer learning:
You will not be able to use an architecture that was pretrained for 3-channel images on 8-channel images just like that. If you want to use the architecture, you could subsample channels, or perform a dimension reduction from 8 to 3 channels. See also this thread.
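To make the two options concrete, here is a minimal NumPy sketch, assuming channels-last images of shape (height, width, 8). The function names select_bands and pca_reduce are hypothetical, and the PCA is done with a plain SVD rather than a dedicated library. Which three bands to keep is your choice; the default below is only a placeholder.

```python
import numpy as np


def select_bands(image, bands=(0, 1, 2)):
    """Subsample channels: keep only the listed band indices."""
    return image[..., list(bands)]


def pca_reduce(image, n_components=3):
    """Project the pixels of one image onto their top principal components."""
    h, w, c = image.shape
    flat = image.reshape(-1, c).astype("float64")
    flat -= flat.mean(axis=0)  # center each band
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    reduced = flat @ vt[:n_components].T
    return reduced.reshape(h, w, n_components)


image = np.random.rand(224, 224, 8)
print(select_bands(image).shape)  # (224, 224, 3)
print(pca_reduce(image).shape)    # (224, 224, 3)
```

Fitting the projection separately on every image, as done here, is a simplification. It would probably be more consistent to estimate the principal axes once on pixels from the training set and reuse them for all images.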