Keras stuck during optimization

Posted 2019-09-29 18:53

After trying out the Keras CIFAR10 example, I decided to go for something bigger: a VGG-like net on the Tiny ImageNet dataset. This is a subset of ImageNet with 200 classes (instead of 1000) and 100K images, downscaled to 64x64.

I took the VGG-like model from the file vgg_like_convnet.py here. Unfortunately, things are going pretty much like here, except that this time changing the learning rate or swapping TH for TF does not help. Neither does changing the optimizer (see the code below).

Accuracy is basically stuck at 0.005 which, as it was pointed out, is exactly what you would expect from completely random answers with 200 classes (1/200 = 0.005). Worse, if by a fluke of the weight init it starts at, say, 0.007, it quickly converges to 0.005 and firmly stays there for any subsequent epoch.

The Keras code (TH version) is shown below:

from __future__ import print_function
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.regularizers import l2, activity_l2, l1, activity_l1
from keras.optimizers import SGD, Adam, Adagrad, Adadelta
from keras.utils import np_utils
import numpy as np
import cPickle as pickle

# seed = 7
# np.random.seed(seed)

batch_size = 64
nb_classes = 200
nb_epoch = 30

# input image dimensions
img_rows, img_cols = 64, 64
# the tiny image net images are RGB
img_channels = 3

# Load the train dataset for TH
print('Load training data')
X_train=pickle.load(open('xtrain_shu_th.p','rb')) # np.zeros((100000,3,64,64)).astype('uint8')
y_train=pickle.load(open('ytrain_shu_th.p','rb')) # np.zeros((100000,1)).astype('uint8')

# Load the test dataset for TH
print('Load validation data')
X_test=pickle.load(open('xtest_th.p','rb')) # np.zeros((10000,3,64,64)).astype('uint8')
y_test=pickle.load(open('ytest_th.p','rb')) # np.zeros((10000,1)).astype('uint8')

# the data, shuffled and split between train and test sets
# (X_train, y_train), (X_test, y_test) = cifar10.load_data()
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

model = Sequential()

model.add(ZeroPadding2D((1,1),input_shape=(3,64,64)))
model.add(Convolution2D(64, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(64, 3, 3, activation='relu',))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_6'].values()))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_8'].values()))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_11'].values()))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_13'].values()))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_15'].values()))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_18'].values()))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_20'].values()))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))#,weights=pretrained_weights['layer_22'].values()))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(Flatten())
model.add(Dense(4096))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(4096))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(200, activation='softmax'))

# let's train the model using SGD + momentum (how original).

opt = SGD(lr=0.0001, decay=1e-6, momentum=0.7, nesterov=True)
# opt= Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
# opt = Adadelta(lr=1.0, rho=0.95, epsilon=1e-08, decay=0.0)
# opt = Adagrad(lr=0.01, epsilon=1e-08, decay=0.0)
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

print('Optimization....')
model.fit(X_train, Y_train,
          batch_size=batch_size,
          nb_epoch=nb_epoch,
          validation_data=(X_test, Y_test),
          shuffle=True)

# Save the resulting model
model.save('model.h5')

I built this Tiny ImageNet dataset myself by converting the JPEG images to PPM with djpeg. I then created a large binary file containing, for each image, the class label (1 byte) followed by the image data (64x64x3 bytes).

Reading this file from Keras was desperately slow. So (I'm quite new to Python, so this may sound silly to you) I decided to initialize a 4D numpy array of shape (100000,3,64,64) (for TH; (100000,64,64,3) for TF) with the dataset and pickle it. Loading the dataset into the array now takes only ~40 s when I run the code above.
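For reference, here is a minimal sketch of how such a record file can be parsed into these arrays in one shot with numpy. The file name raw.bin is hypothetical, and it assumes the image bytes of each record are stored interleaved RGB, as in a PPM body:

import numpy as np
import cPickle as pickle

n = 100000
rec_len = 1 + 64*64*3                  # 1 label byte + image bytes per record

# Read all records at once, then slice labels and pixels apart
raw = np.fromfile('raw.bin', dtype=np.uint8).reshape(n, rec_len)   # hypothetical file name
y_train = raw[:, :1].copy()                        # shape (100000, 1)
X_train = raw[:, 1:].reshape(n, 64, 64, 3)         # TF layout (channels last)
X_train_th = X_train.transpose(0, 3, 1, 2).copy()  # TH layout (channels first)

pickle.dump(X_train_th, open('xtrain_th.p', 'wb'))
pickle.dump(y_train, open('ytrain_th.p', 'wb'))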

I even checked that the pickled array contains the data in the right order, with the following code:

import numpy as np
import cPickle as pickle

print("Reading data")
pix=pickle.load(open('xtrain_th.p','rb'))
print("Done")

img=67857

# Write the image back out as a binary (P6) PPM file
f=open('img'+str(img)+'.ppm','wb')
f.write('P6\n64 64\n255\n')

# The pickled array is channels-first (TH): pix[img][channel][y][x]
for y in range(0,64):
    for x in range(0,64):
        f.write(chr(pix[img][0][y][x]))   # R
        f.write(chr(pix[img][1][y][x]))   # G
        f.write(chr(pix[img][2][y][x]))   # B
f.close()

This extracts a PPM image back from the dataset.
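A similar spot check for the labels (hypothetical, assuming ytrain_th.p holds the matching (100000,1) label array) would be to print the class of the image just extracted and eyeball it against the PPM:

lab=pickle.load(open('ytrain_th.p','rb'))
print('image '+str(img)+' has class '+str(lab[img][0]))   # class index 0..199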

Finally, I noticed that the training dataset was sorted by class (i.e., the first 500 images all belong to class 0, the next 500 to class 1, and so on).

So I shuffled them with the following code:

# Dataset preparation for Theano backend
import cPickle as pickle
import numpy as np
import random as rnd

n=100000

print('Load training data')
X_train=pickle.load(open('xtrain_th.p','rb')) # np.zeros((100000,3,64,64)).astype('uint8')
y_train=pickle.load(open('ytrain_th.p','rb')) # np.zeros((100000,1)).astype('uint8')

tmpa=np.zeros((3,64,64)).astype('uint8')

# Shuffle the data by swapping random pairs of images (and their labels)
print('Shuffling training data')
for _ in range(0,n):
    i=rnd.randrange(n)
    j=rnd.randrange(n)
    # Copy into the buffer with [:] -- a plain tmpa=X_train[i] only binds
    # a view, which the next assignment would overwrite
    tmpa[:]=X_train[i]
    X_train[i]=X_train[j]
    X_train[j]=tmpa
    tmp=y_train[i][0]
    y_train[i][0]=y_train[j][0]
    y_train[j][0]=tmp

print('Pickle dump')
pickle.dump(X_train,open('xtrain_shu_th.p','wb'))
pickle.dump(y_train,open('ytrain_shu_th.p','wb'))
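Incidentally, the pairwise-swap loop above can be collapsed into a single numpy permutation that shuffles images and labels in unison; a minimal equivalent:

import numpy as np

# One random permutation applied to both arrays keeps every image
# paired with its label; fancy indexing returns shuffled copies
perm = np.random.permutation(n)
X_train = X_train[perm]
y_train = y_train[perm]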

Nothing helped. I was not expecting 99% accuracy on the first attempt, but at least some movement, and then a plateau.

I wanted to try TFLearn, but it had a pending bug when I looked a few days ago.

Any ideas? Thanks in advance.

Answer 1:

You can use the shuffling built into the Keras model fit API (https://keras.io/models/model/#fit). Just set the shuffle parameter to True. You can do both batch shuffling and global shuffling; the default is global shuffling.

One thing to note, though, is that the shuffling in fit only happens after the validation split has been made. So if you want your validation data shuffled as well, I would advise you to use sklearn.utils.shuffle (http://scikit-learn.org/stable/modules/generated/sklearn.utils.shuffle.html).
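A rough sketch of both suggestions together (the fit arguments follow the Keras 1.x API used in the question; validation_split=0.1 is just an illustrative value):

from sklearn.utils import shuffle

# Shuffle images and labels in unison *before* fit, so that the
# validation split (taken from the tail of the data) is mixed too
X_train, Y_train = shuffle(X_train, Y_train)

model.fit(X_train, Y_train,
          batch_size=batch_size,
          nb_epoch=nb_epoch,
          validation_split=0.1,   # illustrative value
          shuffle=True)           # or shuffle='batch' for batch-wise shuffling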

From the Keras source on GitHub:

if shuffle == 'batch':
    index_array = batch_shuffle(index_array, batch_size)              
elif shuffle:
    random.shuffle(index_array)


Source: keras stuck during optimization