Keras ImageDataGenerator setting mean and std

Posted 2020-06-23 07:18

Question:

I have a pretrained Keras model that I want to use on a new dataset. I have the weights, mean, and std files from the pretrained model, and I want to load the new dataset with flow_from_directory from ImageDataGenerator. How can I explicitly set the mean and std used for normalization?

Thanks

Answer 1:

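I think the best way to achieve this is to write your own generator that processes the batches produced by flow_from_directory. It could be something like:

from keras.preprocessing.image import ImageDataGenerator

def custom_normalize_generator(directory, mean, std=1.0, **flow_kwargs):
    # flow_from_directory is an ImageDataGenerator method; with the default class_mode it yields (images, labels)
    for images, labels in ImageDataGenerator().flow_from_directory(directory, **flow_kwargs):
        yield (images - mean) / std, labels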


Answer 2:

I think you can just use the featurewise_center and featurewise_std_normalization options of ImageDataGenerator and assign the generator's mean and std attributes directly instead of calling fit(). Reference: https://keras.io/preprocessing/image/#imagedatagenerator-class

Say the mean [R, G, B] value of your pre-training dataset is [123.68, 116.779, 103.939], and the std is 64.0. You can then use the example code below (using Keras 2 with the TF backend, image_data_format='channels_last'):

import numpy as np
from keras.preprocessing import image

datagen = image.ImageDataGenerator(featurewise_center=True,
                                   featurewise_std_normalization=True)
datagen.mean = np.array([123.68, 116.779, 103.939], dtype=np.float32).reshape((1,1,3)) # ordering: [R, G, B]
datagen.std = 64.
batches = datagen.flow_from_directory(DATASET_PATH + '/train',
                                      target_size=(224,224),
                                      color_mode='rgb',
                                      class_mode='categorical',
                                      shuffle=True,
                                      batch_size=BATCH_SIZE)
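
As a sanity check, the arithmetic the generator applies with these two flags can be reproduced in plain NumPy. The sketch below assumes the standardization is (x - mean) / (std + 1e-6), which is how I understand the Keras preprocessing code to behave; the helper name featurewise_standardize and the sample pixel values are made up for illustration.

```python
import numpy as np

# Hypothetical helper (not part of Keras): mirrors what the generator is
# assumed to do when featurewise_center and featurewise_std_normalization
# are both enabled and mean/std have been set by hand.
def featurewise_standardize(batch, mean, std, epsilon=1e-6):
    batch = np.asarray(batch, dtype=np.float32)
    return (batch - mean) / (std + epsilon)

# Per-channel mean in [R, G, B] order, shaped to broadcast over
# (batch, height, width, channels) arrays in channels_last format.
mean = np.array([123.68, 116.779, 103.939], dtype=np.float32).reshape((1, 1, 3))
std = 64.0

# Made-up batch: two 4x4 RGB images where every pixel equals the mean plus 64.
batch = np.ones((2, 4, 4, 3), dtype=np.float32) * (mean + 64.0)
normalized = featurewise_standardize(batch, mean, std)
```

If the channel order of your images differs from the order of the stored mean (for example BGR instead of RGB), the subtraction still runs but silently uses the wrong values, so it is worth checking once on a known image.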


Answer 3:

I ended up using a custom functor to calculate, store and retrieve the statistics:

class Normaliser():
    def __init__(self, mean=None, std=None):
        self.mean = mean
        self.std = std

    def __call__(self, img):
        if self.mean is not None:
            img = self.center(img)
        if self.std is not None:
            img = self.scale(img)
        return img

    def center(self, img):
        return img - self.mean

    def scale(self, img):
        return img / self.std

    def normalise(self, img):
        return (img - self.mean) / self.std

    def set_mean(self, filepaths, input_size):
        print("Calculating mean with shape: ", input_size)
        self.mean = ...  # placeholder: calculate the mean from filepaths
        print("Mean is:", self.mean)
        return True

    def set_std(self, filepaths):
        self.std = ...  # placeholder: calculate the std from filepaths
        return True

To use it, given an input size of input_size, pass an instance of the Normaliser to the ImageDataGenerator:

from keras.preprocessing.image import ImageDataGenerator

normaliser = Normaliser()
img_gen = ImageDataGenerator(preprocessing_function=normaliser)

Get the filepaths using one of the Keras preprocessing utilities. I used flow_from_directory:

iterator = img_gen.flow_from_directory(directory, target_size=input_size)  # target_size is (height, width)

Then set the mean of the normaliser with:

normaliser.set_mean(iterator.filepaths, input_size)
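
The statistics calculation itself is left open in the class. One possible way to fill it in, as a sketch only, is to accumulate per-channel sums over the images so the whole dataset never has to sit in memory at once. The function name channel_mean_std is hypothetical, and it assumes the images have already been loaded and resized into channels_last arrays (loading them from iterator.filepaths is not shown).

```python
import numpy as np

def channel_mean_std(images):
    """Per-channel mean and std over an iterable of (H, W, C) arrays."""
    total = None       # running sum of pixel values per channel
    total_sq = None    # running sum of squared pixel values per channel
    count = 0          # number of pixels seen per channel
    for img in images:
        img = np.asarray(img, dtype=np.float64)
        pixels = img.reshape(-1, img.shape[-1])   # flatten to (H*W, C)
        if total is None:
            total = np.zeros(pixels.shape[1])
            total_sq = np.zeros(pixels.shape[1])
        total += pixels.sum(axis=0)
        total_sq += (pixels ** 2).sum(axis=0)
        count += pixels.shape[0]
    if count == 0:
        raise ValueError("no images given")
    mean = total / count
    # Var[x] = E[x^2] - E[x]^2; clip tiny negatives caused by rounding.
    std = np.sqrt(np.maximum(total_sq / count - mean ** 2, 0.0))
    return mean, std

# Made-up data: two 2x2 RGB images with constant values 0 and 2.
images = [np.zeros((2, 2, 3)), np.full((2, 2, 3), 2.0)]
mean, std = channel_mean_std(images)
```

The returned arrays have shape (C,), which broadcasts against each (H, W, C) image that the preprocessing function receives.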

You can then save your statistics the way you want. In the prediction phase, you recreate the Normaliser with the saved mean using the constructor.
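
One simple option for saving and restoring, sketched under the assumption that the mean and std are NumPy arrays, is to write them to an .npz file with np.savez and read them back with np.load. The file name and the sample values here are placeholders.

```python
import os
import tempfile
import numpy as np

# Placeholder statistics; in practice these come from the fitted Normaliser.
mean = np.array([123.68, 116.779, 103.939], dtype=np.float32)
std = np.array([64.0, 64.0, 64.0], dtype=np.float32)

# Save both arrays in a single file (the path is a made-up example).
stats_path = os.path.join(tempfile.mkdtemp(), "normaliser_stats.npz")
np.savez(stats_path, mean=mean, std=std)

# At prediction time: load the arrays and hand them to the constructor,
# e.g. Normaliser(mean=loaded_mean, std=loaded_std).
with np.load(stats_path) as stats:
    loaded_mean = stats["mean"]
    loaded_std = stats["std"]
```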

Compared to using the Keras generator directly, this gave me more flexibility in how I treat my data and more clarity about what modifications I am making to it, which is not a trivial thing. samplewise_center and featurewise_center are not the only ways to standardise images, as this discussion describes.
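
To make the difference between those two options concrete, here is a small NumPy sketch: sample-wise centering subtracts each image's own mean, while feature-wise centering subtracts one dataset-wide mean from every image. The array values are made up, and the code only imitates the arithmetic rather than calling Keras.

```python
import numpy as np

# Made-up batch of two 2x2 single-channel images (channels_last).
batch = np.array([
    np.full((2, 2, 1), 10.0),   # dark image
    np.full((2, 2, 1), 30.0),   # bright image
])

# Sample-wise centering: each image is shifted by its own mean.
samplewise = batch - batch.mean(axis=(1, 2, 3), keepdims=True)

# Feature-wise centering: every image is shifted by the same
# per-channel mean computed over the whole dataset.
dataset_mean = batch.mean(axis=(0, 1, 2), keepdims=True)
featurewise = batch - dataset_mean
```

With sample-wise centering both images end up identical, so overall brightness differences between images are discarded; with feature-wise centering they are kept.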