I am using an adapted LeNet model in Keras to make a binary classification. I have about 250,000 training samples with a 60/40 class ratio. My model is training very well: in the first epoch the accuracy reaches 97 percent with a loss of 0.07, and after 10 epochs the accuracy is over 99 percent with a loss of 0.01. I am using Keras's ModelCheckpoint callback to save my model whenever it improves.
Around the 11th epoch the accuracy drops to around 55 percent with a loss of around 6. How could this be possible? Is it because the model cannot get any more accurate, and in searching for better weights it completely fails to find them?
My model is an adaptation of the LeNet model:
from keras import models
from keras.layers import (Convolution2D, Activation, BatchNormalization,
                          MaxPooling2D, Flatten, Dense, Dropout)
from keras.optimizers import Adam

# filt_size, kern_size, maxpool_size, input_shape and n_classes
# are defined elsewhere in my script.
lenet_model = models.Sequential()

# Conv block 1
lenet_model.add(Convolution2D(filters=filt_size, kernel_size=(kern_size, kern_size),
                              padding='valid', input_shape=input_shape))
lenet_model.add(Activation('relu'))
lenet_model.add(BatchNormalization())
lenet_model.add(MaxPooling2D(pool_size=(maxpool_size, maxpool_size)))

# Conv block 2
lenet_model.add(Convolution2D(filters=64, kernel_size=(kern_size, kern_size), padding='valid'))
lenet_model.add(Activation('relu'))
lenet_model.add(BatchNormalization())
lenet_model.add(MaxPooling2D(pool_size=(maxpool_size, maxpool_size)))

# Conv block 3
lenet_model.add(Convolution2D(filters=128, kernel_size=(kern_size, kern_size), padding='valid'))
lenet_model.add(Activation('relu'))
lenet_model.add(BatchNormalization())
lenet_model.add(MaxPooling2D(pool_size=(maxpool_size, maxpool_size)))

# Fully connected head
lenet_model.add(Flatten())
lenet_model.add(Dense(1024, kernel_initializer='uniform'))
lenet_model.add(Activation('relu'))
lenet_model.add(Dense(512, kernel_initializer='uniform'))
lenet_model.add(Activation('relu'))
lenet_model.add(Dropout(0.2))
lenet_model.add(Dense(n_classes, kernel_initializer='uniform'))
lenet_model.add(Activation('softmax'))

lenet_model.compile(loss='binary_crossentropy', optimizer=Adam(), metrics=['accuracy'])
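For context, the checkpointing uses Keras's ModelCheckpoint callback, roughly like this (a minimal sketch; the file path, monitored metric, and training arguments are assumptions, not the exact settings from the question):

from keras.callbacks import ModelCheckpoint

# Save the model whenever the monitored quantity improves.
checkpointer = ModelCheckpoint(filepath='lenet_model.best.hdf5',
                               monitor='val_acc',
                               save_best_only=True,
                               verbose=1)

# X_train / y_train and the epoch count are placeholders.
lenet_model.fit(X_train, y_train,
                validation_split=0.2,
                epochs=20,
                callbacks=[checkpointer])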
The problem lay in applying a binary_crossentropy loss whereas in this case categorical_crossentropy should be applied. Another approach is to keep the binary_crossentropy loss but change the output to have dim=1 and the activation to sigmoid. The weird behaviour comes from the fact that with binary_crossentropy, a multiclass binary classification (with two classes) is actually being solved, whereas your task is a single-class binary classification.
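Concretely, the two fixes would look roughly like this (a sketch showing only the output layers and compile call, which replace the final Dense/Activation/compile lines of the model above; apply one option or the other, not both):

# Option 1: single sigmoid output with binary_crossentropy.
# Labels stay as a single 0/1 column per sample.
lenet_model.add(Dense(1, kernel_initializer='uniform'))
lenet_model.add(Activation('sigmoid'))
lenet_model.compile(loss='binary_crossentropy', optimizer=Adam(),
                    metrics=['accuracy'])

# Option 2: two-unit softmax output with categorical_crossentropy.
# Labels must then be one-hot encoded, e.g. via keras.utils.to_categorical.
lenet_model.add(Dense(2, kernel_initializer='uniform'))
lenet_model.add(Activation('softmax'))
lenet_model.compile(loss='categorical_crossentropy', optimizer=Adam(),
                    metrics=['accuracy'])

Option 1 is the smaller change for a two-class problem; Option 2 keeps the original layer structure but requires one-hot labels.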