I want to use image augmentation in Keras. My current code looks like this:
from keras.preprocessing.image import ImageDataGenerator

# define image augmentations
train_datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    zca_whitening=True)

# generate image batches from directory
train_generator = train_datagen.flow_from_directory(train_dir)
When I run a model with this, I get the following error:
"ImageDataGenerator specifies `featurewise_std_normalization`, but it hasn't been fit on any training data."
But I didn't find clear information about how to use train_datagen.fit() together with flow_from_directory.
Thank you for your help. Mario
You are right, the docs are not very enlightening on this ...
What you need is actually a 4-step process:
1. Instantiate ImageDataGenerator with the augmentations you want
2. Fit its statistics on your training data with train_datagen.fit(x_train)
3. Set up your batch generator using flow_from_directory()
4. Train your model with fit_generator()
Here is the necessary code for a hypothetical image classification case:
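Something along these lines (a rough sketch; model, x_train, train_data_dir, nb_train_samples, batch_size and epochs are placeholders you would define yourself):

from keras.preprocessing.image import ImageDataGenerator

# Step 1: instantiate the generator with the desired transformations
train_datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    zca_whitening=True)

# Step 2: fit the normalization/whitening statistics on in-memory training data
# (x_train is a NumPy array of images)
train_datagen.fit(x_train)

# Step 3: stream batches of augmented images from the directory
train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(150, 150),
    batch_size=batch_size,
    class_mode='binary')

# Step 4: train the (already compiled) model on the generated batches
model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs)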
Clearly, there are several parameters to be defined (train_data_dir, nb_train_samples etc.), but hopefully you get the idea. If you need to also use a validation_generator, it should be defined the same way as your train_generator.
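For instance, a hypothetical validation setup could look like this (validation_data_dir and nb_validation_samples are again placeholders):

# validation data: streamed from its own directory with the same datagen,
# so the statistics fitted above are applied consistently
validation_generator = train_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(150, 150),
    batch_size=batch_size,
    class_mode='binary')

# pass it to fit_generator alongside the training generator
model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size)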
UPDATE (after comment)
Step 2 needs some discussion; here, x_train are the actual data which, ideally, should fit into the main memory. Also (see the documentation), this step is required only when featurewise_center, featurewise_std_normalization, or zca_whitening is set to True.

However, there are many real-world cases where the requirement that all the training data fit into memory is clearly unrealistic. How you center/normalize/whiten data in such cases is a (huge) sub-field in itself, and arguably the main reason for the existence of big data processing frameworks such as Spark.
So, what to do in practice here? Well, the next logical action in such a case is to sample your data; indeed, this is exactly what the community advises - see, for instance, Keras creator Francois Chollet on Working with large datasets like Imagenet.
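In code terms, that sampling idea boils down to something like the following sketch (load_image_sample is a hypothetical helper that loads a random, representative subset of the images into a NumPy array):

# hypothetical helper: load a random, representative subset of the training
# images into memory as an array of shape (n_samples, height, width, channels)
x_sample = load_image_sample(train_data_dir, n_samples=1000)

# fit the normalization/whitening statistics on the sample only ...
train_datagen.fit(x_sample)

# ... and keep streaming the full dataset from disk as before
train_generator = train_datagen.flow_from_directory(train_data_dir)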
There is also an ongoing open discussion about extending ImageDataGenerator along these lines.

Hope this helps...