There is an example of VGG16 fine-tuning on keras blog, but I can't reproduce it.
More precisely, here is code used to init VGG16 without top layer and to freeze all blocks except the topmost:
WEIGHTS_PATH_NO_TOP = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5'
weights_path = get_file('vgg16_weights.h5', WEIGHTS_PATH_NO_TOP)
model = Sequential()
model.add(InputLayer(input_shape=(150, 150, 3)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1'))
model.add(Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2'))
model.add(Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3'))
model.add(MaxPooling2D((2, 2), strides=(2, 2), name='block5_maxpool'))
model.load_weights(weights_path)
for layer in model.layers:
layer.trainable = False
for layer in model.layers[-4:]:
layer.trainable = True
print("Layer '%s' is trainable" % layer.name)
Next, creating a top model with single hidden layer:
top_model = Sequential()
top_model.add(Flatten(input_shape=model.output_shape[1:]))
top_model.add(Dense(256, activation='relu'))
top_model.add(Dropout(0.5))
top_model.add(Dense(1, activation='sigmoid'))
top_model.load_weights('top_model.h5')
Note that it was previously trained on bottleneck features like it is described in the blog post. Next, add this top model to the base model and compile:
model.add(top_model)
model.compile(loss='binary_crossentropy',
optimizer=SGD(lr=1e-4, momentum=0.9),
metrics=['accuracy'])
And eventually, fit on cats/dogs data:
batch_size = 16
train_datagen = ImageDataGenerator(rescale=1./255,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)
train_gen = train_datagen.flow_from_directory(
TRAIN_DIR,
target_size=(150, 150),
batch_size=batch_size,
class_mode='binary')
valid_gen = test_datagen.flow_from_directory(
VALID_DIR,
target_size=(150, 150),
batch_size=batch_size,
class_mode='binary')
model.fit_generator(
train_gen,
steps_per_epoch=nb_train_samples // batch_size,
epochs=nb_epoch,
validation_data=valid_gen,
validation_steps=nb_valid_samples // batch_size)
But here is an error I am getting when trying to fit:
ValueError: Error when checking model target: expected block5_maxpool to have 4 > dimensions, but got array with shape (16, 1)
Therefore, it seems that something is wrong with the last pooling layer in base model. Or probably I've done something wrong trying to connect base model with the top one.
Does anybody have similar issue? Or maybe there is a better way to build such "concatenated" models? I am using keras==2.0.0
with theano
backend.
Note: I was using examples from gist and
applications.VGG16
utility, but has issues trying to concatenate models, I am not too familiar withkeras
functional API. So this solution I provide here is the most "successful" one, i.e. it fails only on fitting stage.
Update #1
Ok, here is a small explanation about what I am trying to do. First of all, I am generating bottleneck features from VGG16 as follows:
def save_bottleneck_features():
datagen = ImageDataGenerator(rescale=1./255)
model = applications.VGG16(include_top=False, weights='imagenet')
generator = datagen.flow_from_directory(
TRAIN_DIR,
target_size=(150, 150),
batch_size=batch_size,
class_mode=None,
shuffle=False)
print("Predicting train samples..")
bottleneck_features_train = model.predict_generator(generator, nb_train_samples)
np.save(open('bottleneck_features_train.npy', 'w'), bottleneck_features_train)
generator = datagen.flow_from_directory(
VALID_DIR,
target_size=(150, 150),
batch_size=batch_size,
class_mode=None,
shuffle=False)
print("Predicting valid samples..")
bottleneck_features_valid = model.predict_generator(generator, nb_valid_samples)
np.save(open('bottleneck_features_valid.npy', 'w'), bottleneck_features_valid)
Then, I create a top model and train it on these features as follows:
def train_top_model():
train_data = np.load(open('bottleneck_features_train.npy'))
train_labels = np.array([0]*(nb_train_samples / 2) +
[1]*(nb_train_samples / 2))
valid_data = np.load(open('bottleneck_features_valid.npy'))
valid_labels = np.array([0]*(nb_valid_samples / 2) +
[1]*(nb_valid_samples / 2))
model = Sequential()
model.add(Flatten(input_shape=train_data.shape[1:]))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train_data, train_labels,
nb_epoch=nb_epoch,
batch_size=batch_size,
validation_data=(valid_data, valid_labels),
verbose=1)
model.save_weights('top_model.h5')
So basically, there are two trained models, base_model
with ImageNet weights and top_model
with weights generated from bottleneck features. And I wonder how to concatenate them? Is it possible or I am doing something wrong? Because as I can see, the response from @thomas-pinetz supposes that the top model is not trained separately and right away appended to the model. Not sure if I am clear, here is a quote from the blog:
In order to perform fine-tuning, all layers should start with properly trained weights: for instance you should not slap a randomly initialized fully-connected network on top of a pre-trained convolutional base. This is because the large gradient updates triggered by the randomly initialized weights would wreck the learned weights in the convolutional base. In our case this is why we first train the top-level classifier, and only then start fine-tuning convolutional weights alongside it.