I have successfully trained a model on 100000 samples; it performs well on both the training set and the test set. I then tried to fine-tune it on one particular sample (one of the 100000 samples), using the trained weights as the initialization.
However, the result is a little strange, and I believe it is caused by the batch normalization layer. My code is as follows:
import tensorflow as tf

model = mymodel()
model.load_weights('./pre_trained.h5')  # start from the pre-trained weights
rate = model.evaluate(x, y)
print(rate)
# keep only the weights with the lowest training loss
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    './trained.h5', monitor='loss', verbose=0,
    save_best_only=True, mode='min', save_weights_only=True)
model.fit(x, y, validation_data=(x, y), epochs=5, verbose=2, callbacks=[checkpoint])
model.load_weights('./trained.h5')
rate = model.evaluate(x, y)
print(rate)
mymodel is a self-defined function that builds my model, which consists of Dense and BatchNormalization layers. x and y are the input and label of one particular sample. I want to further optimize the loss on this sample. However, the results are strange:
1/1 [==============================] - 0s 209ms/step
-6.087581634521484
Train on 1 samples, validate on 1 samples
Epoch 1/200
- 1s - loss: -2.7749e-01 - val_loss: -6.0876e+00
Epoch 2/200
- 0s - loss: -2.8791e-01 - val_loss: -6.0876e+00
Epoch 3/200
- 0s - loss: -3.0012e-01 - val_loss: -6.0876e+00
Epoch 4/200
- 0s - loss: -3.1325e-01 - val_loss: -6.0876e+00
As shown, model.evaluate works well at first: the loss (-6.087581634521484) is close to the performance of the loaded pre-trained model. But the loss on the training set (which is actually the same as the validation set in model.fit()) is strange. The val_loss is normal and matches the result of model.evaluate in the first line. So I am puzzled about why there is still a large difference between the training loss and the inference loss (the training loss is worse). Since the training sample and the validation sample are the same one, I think the results should be the same, or at least very close.
I suspect the problem is caused by the BN layer, because of the large difference between training and inference. However, I have already set trainable = False on the BN layer, after loading the pre-trained weights and before model.fit, but the problem is not solved.
out = tf.keras.layers.BatchNormalization(trainable=False)(out)
I still suspect the BN layer, and wonder whether setting trainable=False is enough to keep the BN parameters unchanged.
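To make the question concrete, it may help to recall that a batch normalization layer holds four arrays: a learned scale and offset (gamma, beta), plus a running mean and variance. The sketch below is framework-free Python, not Keras code; the class `TinyBatchNorm`, the `freeze_stats` flag, and the momentum value are hypothetical names and choices made only for illustration. It shows that the running statistics change through a moving average during training-mode forward passes, independently of gradient updates, so whether they stay fixed depends on how the library interprets trainable=False.

```python
class TinyBatchNorm:
    """Toy single-feature batch norm that only tracks its four parameters."""

    def __init__(self, momentum=0.99):
        self.gamma = 1.0        # learned scale, changed by gradient steps
        self.beta = 0.0         # learned offset, changed by gradient steps
        self.moving_mean = 0.0  # changed by a moving average, not by gradients
        self.moving_var = 1.0   # changed by a moving average, not by gradients
        self.momentum = momentum

    def update_stats(self, batch, freeze_stats=False):
        """Simulate the statistics update of one training-mode forward pass."""
        if freeze_stats:
            return  # hypothetical switch: leave the running statistics alone
        mean = sum(batch) / len(batch)
        var = sum((v - mean) ** 2 for v in batch) / len(batch)
        m = self.momentum
        self.moving_mean = m * self.moving_mean + (1 - m) * mean
        self.moving_var = m * self.moving_var + (1 - m) * var

    def snapshot(self):
        return (self.gamma, self.beta, self.moving_mean, self.moving_var)


bn = TinyBatchNorm()
before = bn.snapshot()
bn.update_stats([2.0, 6.0])
drifted = bn.snapshot() != before  # running statistics moved, gamma/beta did not

bn_frozen = TinyBatchNorm()
bn_frozen.update_stats([2.0, 6.0], freeze_stats=True)
unchanged = bn_frozen.snapshot() == before
```

In a real Keras model, a comparable check would be to compare the BN layer's get_weights() output before and after model.fit to see which of the four arrays actually changed.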
Can anyone give me some advice? Thanks a lot for your help in advance. Sorry for my English; I tried my best to explain my problem.
I had a similar finding in PyTorch that I would like to share. First of all, what is your Keras version? From 2.1.3 onward, setting trainable=False on a BN layer makes it behave exactly as in inference mode, meaning that it does not normalize the input to zero mean and unit variance using the current batch (as in training mode), but uses the running mean and variance instead. If you set the learning phase to 1, BN essentially becomes instance norm: it ignores the running mean and variance and just normalizes to zero mean and unit variance, which might be your desired behavior.
Keras release notes for reference: https://github.com/keras-team/keras/releases/tag/2.1.3
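The difference between the two modes can be sketched in a few lines of plain Python. This is an illustration of the arithmetic only, not the Keras implementation; the function names, the epsilon value, and the example running statistics are assumptions. It normalizes a single feature over the batch axis, as BN does after a Dense layer, and uses a batch that contains one sample, as in the question.

```python
import math

EPS = 1e-3  # small constant for numerical stability (assumed value)


def bn_training(batch, gamma=1.0, beta=0.0):
    """Training mode: normalize with the statistics of the current batch."""
    mean = sum(batch) / len(batch)
    var = sum((v - mean) ** 2 for v in batch) / len(batch)
    return [gamma * (v - mean) / math.sqrt(var + EPS) + beta for v in batch]


def bn_inference(batch, moving_mean, moving_var, gamma=1.0, beta=0.0):
    """Inference mode: normalize with the stored running statistics."""
    scale = math.sqrt(moving_var + EPS)
    return [gamma * (v - moving_mean) / scale + beta for v in batch]


single = [5.0]  # a batch holding one sample

# The batch mean equals the sample itself, so the normalized value collapses to beta.
train_out = bn_training(single)

# The running statistics (made-up values here) keep the sample's information.
infer_out = bn_inference(single, moving_mean=2.0, moving_var=4.0)
```

Under these assumptions the same input produces different activations in the two modes, which would be one plausible source of a gap between loss and val_loss when training and validating on the same single sample.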
A little awkwardly, I have found a strange way to solve the problem in another question, Keras: Accuracy Drops While Finetuning Inception.
Actually, I do not think it is a complete answer, but when I add tf.keras.backend.set_learning_phase(1) before model.compile(), the result becomes much more normal, although some problems remain.
It is amazing and it is what I want, but I am still puzzled about the problem. First, why does it work, and what does tf.keras.backend.set_learning_phase(1) do? In addition, I set layers.trainable=True; why does the BN layer work normally in this case? Then, why is there still a very small difference between the loss and the val_loss? Since the sample is the same, what causes this?
Finally, I find that whether I use tf.keras.backend.set_learning_phase(0) or tf.keras.backend.set_learning_phase(1), the result is similar and normal. The result with tf.keras.backend.set_learning_phase(0) is a little different from the one with tf.keras.backend.set_learning_phase(1), which also awaits a proper explanation.
I'm new to deep learning and Keras, and I have benefited a lot from Stack Overflow, both for my knowledge and my English.
Thanks for your help in advance.
I found a possible explanation here: https://github.com/keras-team/keras/pull/9965 and here: https://github.com/keras-team/keras/issues/9214