Finetuning a VGG model with VGGFace weights

I am finetuning a VGG16 model, initialized with the pretrained 'VGGFace' weights, on the Labeled Faces in the Wild (LFW) dataset. The problem is that I get a very low accuracy (around 0.0037%) after training for an epoch, i.e., the model isn't learning at all.

I think it has something to do with my architecture, which looks like this:

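import keras
from keras.layers import Dense, Flatten
from keras.models import Model
from keras_vggface.vggface import VGGFace

# Convolutional base with pretrained VGGFace weights, without the original classifier
vgg_x = VGGFace(model='vgg16', weights='vggface', input_shape=(224, 224, 3), include_top=False)
last_layer = vgg_x.get_layer('pool5').output
x = Flatten(name='flatten')(last_layer)
x = Dense(4096, activation='relu', name='fc6')(x)

# New, randomly initialized output layer for the 311 classes
out = Dense(311, activation='softmax', name='fc8')(x)
custom_vgg_model = Model(vgg_x.input, out)

custom_vgg_model.compile(optimizer=keras.optimizers.Adam(),
                         loss=keras.losses.categorical_crossentropy,
                         metrics=['accuracy'])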

from sklearn.model_selection import KFold

# random_state only takes effect with shuffle=True
# (recent scikit-learn versions raise an error otherwise)
kfold = KFold(n_splits=15, shuffle=True, random_state=42)

for train_index, test_index in kfold.split(X_train):
    X_cross_train = X_train[train_index]
    X_cross_test = X_train[test_index]
    Y_cross_train = y_train[train_index]
    Y_cross_test = y_train[test_index]
    custom_vgg_model.fit(x=X_cross_train, y=Y_cross_train, batch_size=32, epochs=10, verbose=2,
                         validation_data=(X_cross_test, Y_cross_test))

I expect the model to at least learn something, even if it doesn't reach a great accuracy. What could be the problem? Is there something wrong with my architecture, or is it something else?

The preprocessing step shouldn't be wrong, but just in case:

image_set_x = keras_vggface.utils.preprocess_input(image_set_x, version=1)

Try training with a smaller learning rate than the default one (for instance, 1e-4; Adam's default in Keras is 1e-3). The randomly initialized weights in the classification layers can produce large gradients. These cause large weight updates in the lower layers and effectively destroy the pretrained weights in the convolutional base.
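As a minimal sketch of the smaller learning rate, the only change is the optimizer passed to compile. The helper name compile_for_finetuning and the tiny demo_model are hypothetical stand-ins so the snippet runs without the VGGFace weights; in the question you would pass custom_vgg_model instead.

```python
from tensorflow import keras


def compile_for_finetuning(model, learning_rate=1e-4):
    """Compile `model` with Adam at a reduced learning rate (Keras default: 1e-3)."""
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


# Hypothetical stand-in model, used only so the example is self-contained.
demo_model = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(4, activation="relu"),
    keras.layers.Dense(3, activation="softmax"),
])
compile_for_finetuning(demo_model)

# The configured learning rate can be read back from the optimizer config.
configured_lr = demo_model.optimizer.get_config()["learning_rate"]
```

The value 1e-4 is only a starting point; if the loss still diverges, an even smaller value may be worth trying.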

In addition, you can use the ReduceLROnPlateau callback to further decrease the learning rate when validation accuracy stops increasing.
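A possible configuration of that callback is sketched below. The factor, patience, and min_lr values are assumptions chosen for illustration, not tuned settings, and the monitored metric name depends on the Keras version.

```python
from tensorflow import keras

# Reduce the learning rate by 10x when validation accuracy has not improved
# for 3 consecutive epochs. All numeric values here are illustrative assumptions.
reduce_lr = keras.callbacks.ReduceLROnPlateau(
    monitor="val_accuracy",  # older Keras versions log this metric as "val_acc"
    mode="max",              # accuracy should increase, so watch for a plateau in the maximum
    factor=0.1,
    patience=3,
    min_lr=1e-6,
    verbose=1,
)

# The callback only has an effect when validation data is supplied to fit, e.g.:
# custom_vgg_model.fit(X_cross_train, Y_cross_train,
#                      validation_data=(X_cross_test, Y_cross_test),
#                      callbacks=[reduce_lr])
```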

Another strategy to avoid large disruptive gradient updates is to freeze the weights in the convolutional base first, pre-train the classification layers, then finetune the entire stack with a small learning rate. This approach is explained in detail in the Keras blog post on transfer learning: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
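The two-stage approach could look roughly like the sketch below. To keep it runnable without downloading weights, a small hypothetical conv_base stands in for the VGGFace convolutional base, and the layer sizes and learning rates are assumptions; the freeze, compile, unfreeze, recompile sequence is the part that carries over.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical stand-in for the pretrained convolutional base (vgg_x in the question).
inputs = keras.Input(shape=(32, 32, 3))
x = layers.Conv2D(8, 3, activation="relu", name="conv")(inputs)
x = layers.MaxPooling2D(name="pool")(x)
conv_base = keras.Model(inputs, x, name="conv_base")

# New classification layers on top, mirroring fc6/fc8 from the question.
x = layers.Flatten(name="flatten")(conv_base.output)
x = layers.Dense(16, activation="relu", name="fc6")(x)
outputs = layers.Dense(5, activation="softmax", name="fc8")(x)
model = keras.Model(conv_base.input, outputs)

# Stage 1: freeze the base so only the new layers are updated.
for layer in conv_base.layers:
    layer.trainable = False
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])
stage1_trainable = len(model.trainable_weights)  # kernels and biases of fc6 and fc8
# model.fit(...)  # train the classification layers here

# Stage 2: unfreeze the base and recompile with a much smaller learning rate.
# Recompiling is required for the changed trainable flags to take effect.
for layer in conv_base.layers:
    layer.trainable = True
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
stage2_trainable = len(model.trainable_weights)  # now also includes the conv layer
# model.fit(...)  # finetune the entire stack here
```

With the model from the question, the loops would run over vgg_x.layers instead of conv_base.layers.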