Keras accuracy not increasing over 50% on binary C

Posted 2019-08-27 13:00

I am using Keras to process the following subset of my data:

5000 images of class A
5000 images of class B

1000 images from each class are held out for validation. The images are scaled to 96x96 with 3 channels and normalised to the range 0-1. I am using the following model:

model.add(Conv2D(32, (3, 3), activation="relu", input_shape=inputshape))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

And then training the model in the following way:

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss="binary_crossentropy", optimizer=sgd, metrics=["accuracy"])
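
For reference, a minimal sketch of how the `decay` argument changes the step size over a run. It assumes the time-based rule used by the legacy Keras 2 SGD optimizer, lr_t = lr / (1 + decay * iterations), and it assumes the Keras default batch size of 32, since the batch size is not shown in the post. `decayed_lr` is a made-up helper name.

```python
def decayed_lr(lr, decay, iterations):
    """Time-based decay: lr / (1 + decay * iterations)."""
    return lr / (1.0 + decay * iterations)

samples_per_epoch = 8000   # 10000 images minus 2000 held out for validation
batch_size = 32            # assumption: Keras default, not stated in the post
epochs = 100

# One iteration is one batch update: 250 per epoch, 25000 over the run
iterations = (samples_per_epoch // batch_size) * epochs
final_lr = decayed_lr(0.01, 1e-6, iterations)
print(iterations, final_lr)
```

Under these assumptions the learning rate drops by only about 2.5% over all 100 epochs, so the decay setting on its own is unlikely to explain a loss that never moves.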

However, the accuracy rarely rises above 50%, and only by chance when it does:

Epoch 1/100
8000/8000 [==============================] - 23s 3ms/step - loss: 0.6939 - acc: 0.5011 - val_loss: 0.6932 - val_acc: 0.5060
Epoch 2/100
8000/8000 [==============================] - 22s 3ms/step - loss: 0.6938 - acc: 0.4941 - val_loss: 0.6941 - val_acc: 0.4940
Epoch 3/100
8000/8000 [==============================] - 22s 3ms/step - loss: 0.6937 - acc: 0.4981 - val_loss: 0.6932 - val_acc: 0.4915
Epoch 4/100
8000/8000 [==============================] - 22s 3ms/step - loss: 0.6933 - acc: 0.5056 - val_loss: 0.6931 - val_acc: 0.5060
Epoch 5/100
8000/8000 [==============================] - 22s 3ms/step - loss: 0.6935 - acc: 0.4970 - val_loss: 0.6932 - val_acc: 0.4940
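
A loss stuck near 0.693 is worth recognising: it is ln 2, the binary cross-entropy of a model that outputs 0.5 for every sample. A small sketch in plain Python follows; the helper `binary_crossentropy` is written here for illustration and is not the Keras function.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over paired labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

labels = [0, 1] * 4000          # balanced classes, 8000 samples
constant_guess = [0.5] * 8000   # a network that has learned nothing

loss = binary_crossentropy(labels, constant_guess)
print(round(loss, 4), round(math.log(2), 4))  # both print as 0.6931
```

So the log is consistent with a network whose output stays at 0.5 regardless of the input, rather than one that is learning slowly.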

I don't think the problem is the data itself, as I have used an alternative machine learning method and got over 94% accuracy with the exact same images (except using just 5 training images for each class, but that's beside the point).

Any help would be greatly appreciated.

Oh! In case it matters: I'm using the CNTK backend.

Edit: Here is the code I use to read in the images, which also normalises the pixel values to the 0-1 range:

import os

import cv2
import numpy as np
from keras.preprocessing.image import img_to_array

healthy_files = sorted(os.listdir("../../uninfected/"))
healthy_imgs = [cv2.imread("../../uninfected/" + x) for x in healthy_files]
data = []
labels = []
for img in healthy_imgs[:5000]:
    resized = cv2.resize(img, (96, 96)).astype(np.float32) / 255.0  # normalise data to 0..1 range
    arr = img_to_array(resized)
    data += [arr]
    labels += [0]
# The for loop above is then repeated over the other half of the dataset, with the labels line using the label [1] instead
data = np.array(data, np.float32)
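
Before blaming the model it can help to confirm what actually reaches it. Below is a minimal sketch, assuming `data` and `labels` are built as in the loading code; `check_dataset` is a hypothetical helper, and the random array only stands in for the real images.

```python
import numpy as np

def check_dataset(data, labels):
    """Return basic statistics that should hold for a healthy binary image set."""
    data = np.asarray(data, dtype=np.float32)
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    return {
        "shape": data.shape,
        "min": float(data.min()),
        "max": float(data.max()),
        "has_nan": bool(np.isnan(data).any()),
        "class_counts": {int(c): int(n) for c, n in zip(classes, counts)},
    }

# Stand-in data: 10 fake 96x96x3 images, already scaled to 0..1
rng = np.random.RandomState(0)
fake_data = rng.rand(10, 96, 96, 3).astype(np.float32)
fake_labels = [0] * 5 + [1] * 5

report = check_dataset(fake_data, fake_labels)
print(report)
```

On the real arrays one would expect a minimum near 0, a maximum of at most 1, no NaNs, and equal class counts. Because the arrays are built class by class, they should also be shuffled together before any split that slices from the end, such as the `validation_split` argument of `fit`.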

Edit 2: Here is the output of model.summary():

Model built:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 94, 94, 32)        896
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 92, 92, 32)        9248
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 46, 46, 32)        0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 44, 44, 64)        18496
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 42, 42, 64)        36928
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 21, 21, 64)        0
_________________________________________________________________
flatten_1 (Flatten)          (None, 28224)             0
_________________________________________________________________
dense_1 (Dense)              (None, 256)               7225600
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 257
=================================================================
Total params: 7,291,425
Trainable params: 7,291,425
Non-trainable params: 0
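
The shapes and parameter counts in this summary can be checked by hand, which confirms the layers are wired as intended. A short sketch, assuming `inputshape` is (96, 96, 3) as described in the question and the Keras defaults of 'valid' padding and stride 1; the helper names are made up for illustration.

```python
def conv_out(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1

def conv_params(kernel, in_ch, filters):
    """Kernel weights plus one bias per filter."""
    return (kernel * kernel * in_ch + 1) * filters

def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return (n_in + 1) * n_out

size, ch = 96, 3
total = 0
for filters in (32, 32):          # first conv block
    total += conv_params(3, ch, filters)
    size, ch = conv_out(size), filters
size //= 2                        # 2x2 max pooling
for filters in (64, 64):          # second conv block
    total += conv_params(3, ch, filters)
    size, ch = conv_out(size), filters
size //= 2                        # 2x2 max pooling

flat = size * size * ch
total += dense_params(flat, 256) + dense_params(256, 1)
print(flat, total)  # 28224 7291425
```

Both numbers match the summary. Activations passed through the `activation=` argument add no parameters and get no row of their own, which is why none are listed.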

I noticed that there were no activation layers explicitly listed in this summary, so I changed the model to this:

model.add(Conv2D(32, (3, 3), input_shape=inputshape))
model.add(Activation("relu"))
model.add(Conv2D(32, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
#model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3)))
model.add(Activation("relu"))
model.add(Conv2D(64, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
#model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(64))
model.add(Activation("relu"))
#model.add(Dropout(0.5))
#model.add(Dense(10, activation="relu"))
model.add(Dense(1))
model.add(Activation("sigmoid"))

Which gave a summary output of this:

Model built:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 94, 94, 32)        896
_________________________________________________________________
activation_1 (Activation)    (None, 94, 94, 32)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 92, 92, 32)        9248
_________________________________________________________________
activation_2 (Activation)    (None, 92, 92, 32)        0
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 46, 46, 32)        0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 44, 44, 64)        18496
_________________________________________________________________
activation_3 (Activation)    (None, 44, 44, 64)        0
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 42, 42, 64)        36928
_________________________________________________________________
activation_4 (Activation)    (None, 42, 42, 64)        0
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 21, 21, 64)        0
_________________________________________________________________
flatten_1 (Flatten)          (None, 28224)             0
_________________________________________________________________
dense_1 (Dense)              (None, 64)                1806400
_________________________________________________________________
activation_5 (Activation)    (None, 64)                0
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 65
_________________________________________________________________
activation_6 (Activation)    (None, 1)                 0
=================================================================
Total params: 1,872,033
Trainable params: 1,872,033
Non-trainable params: 0

Needless to say, the results remain the same...

2 answers
对你真心纯属浪费
#2 · 2019-08-27 14:00

So, after trying all the suggestions the wonderful people had in the comments, I had no luck. I decided to go back to the drawing board, or in this case, try it on an alternative computer. My original code worked!

In the end I narrowed it down to the backend: I was using CNTK on the first computer and TensorFlow on the second. I tried CNTK on the second computer, and it worked perfectly, so I decided to reinstall CNTK on the first computer. This time, the code worked perfectly. I have no idea what was broken initially, but it had something to do with my install of CNTK.

I guess in the end this whole Q&A doesn't really help anyone, but if anyone experiences a similar issue, try the suggestions in the comments on the question; there is some really good advice there. And if that doesn't work, try changing your backend!
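
For anyone wanting to try the backend switch: in multi-backend Keras (2.x) the backend is taken from the `KERAS_BACKEND` environment variable or, if that is not set, from the `backend` field of `~/.keras/keras.json`. A rough sketch of both routes follows. The sample config text is illustrative, and `with_backend` is a made-up helper, not part of Keras.

```python
import json
import os

# Option 1: set the environment variable before keras is imported.
os.environ["KERAS_BACKEND"] = "tensorflow"
# import keras  # would now pick the TensorFlow backend, if it is installed

# Option 2: change the "backend" field of the JSON config.
def with_backend(config_text, backend):
    """Return keras.json content with the backend field replaced."""
    config = json.loads(config_text)
    config["backend"] = backend
    return json.dumps(config, indent=4)

# Illustrative config contents, not read from disk
sample = (
    '{"image_data_format": "channels_last", "epsilon": 1e-07, '
    '"floatx": "float32", "backend": "cntk"}'
)
updated = with_backend(sample, "tensorflow")
print(updated)
```

The environment variable is the less invasive of the two, since it affects only the current process and leaves the config file alone.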

Cheers

等我变得足够好
#3 · 2019-08-27 14:03

Also, it is usually a bad idea to use dropout in convolutional layers; use batch normalization instead.
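
As a rough picture of what batch normalization does to a layer's activations (in Keras it is the `BatchNormalization` layer), here is a plain-Python sketch of the training-time computation for a single feature, with the learnable scale `gamma` and shift `beta` left at their initial values of 1 and 0. The function name and the sample numbers are made up for illustration.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalise one feature across a batch to zero mean and unit variance,
    then apply the scale (gamma) and shift (beta)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

activations = [0.2, 1.5, 3.1, 0.9]   # one feature over a batch of 4 samples
normalised = batch_norm(activations)

out_mean = sum(normalised) / len(normalised)
out_var = sum((v - out_mean) ** 2 for v in normalised) / len(normalised)
print(out_mean, out_var)
```

The output has a mean of about 0 and a variance just under 1 (slightly below because of `eps`), whatever the scale of the inputs was.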
