I have a pretrained Keras model and I want to use it on a new dataset. I have the weights, mean and std files from the pretrained model, and I want to use flow_from_directory from ImageDataGenerator to load the new dataset. The problem is: how can I set the mean and std explicitly for normalization?
Thanks
I think the best way to achieve this is to write your own method to process the samples generated by flow_from_directory. It could be something like:
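    from keras.preprocessing.image import ImageDataGenerator

    def custom_normalize_generator(directory, mean):
        # flow_from_directory is an ImageDataGenerator method that yields (images, labels) batches
        for imgs, labels in ImageDataGenerator().flow_from_directory(directory):
            yield imgs - mean, labels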
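A slightly fuller sketch of the same idea, assuming the wrapped iterator yields (images, labels) batches, as flow_from_directory does when a class_mode is set. The names normalized_batches and fake_batches are hypothetical, the mean and std are illustrative values, and the random arrays stand in for a real directory iterator so the snippet runs without any image files.

```python
import numpy as np

def normalized_batches(batches, mean, std=1.0):
    """Yield (images, labels) with images standardized by a fixed mean and std."""
    for images, labels in batches:
        # Broadcasting applies the per-channel mean to every pixel.
        yield (images - mean) / std, labels

def fake_batches(n_batches=2, batch_size=4, size=8):
    """Hypothetical stand-in for a directory iterator (channels_last, RGB)."""
    rng = np.random.default_rng(0)
    for _ in range(n_batches):
        images = rng.uniform(0, 255, size=(batch_size, size, size, 3)).astype(np.float32)
        labels = np.zeros((batch_size, 2), dtype=np.float32)
        yield images, labels

mean = np.array([123.68, 116.779, 103.939], dtype=np.float32)  # illustrative values
std = 64.0
out = list(normalized_batches(fake_batches(), mean, std))
```

A real Keras directory iterator loops indefinitely, so when training from a wrapper like this you would normally also pass steps_per_epoch.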
I think you can just use the 'featurewise_center' and 'featurewise_std_normalization' of ImageDataGenerator to handle that. Reference: https://keras.io/preprocessing/image/#imagedatagenerator-class
Say the mean [R, G, B] values of the dataset used for pre-training are [123.68, 116.779, 103.939], and the std is 64.0. You can then use the example code below (using Keras 2 with the TF backend, image_data_format='channels_last'):
    import numpy as np
    from keras.preprocessing import image

    datagen = image.ImageDataGenerator(featurewise_center=True,
                                       featurewise_std_normalization=True)
    # Set the statistics directly instead of calling datagen.fit(); ordering: [R, G, B]
    datagen.mean = np.array([123.68, 116.779, 103.939], dtype=np.float32).reshape((1, 1, 3))
    datagen.std = 64.
    # DATASET_PATH and BATCH_SIZE are placeholders to define for your own setup
    batches = datagen.flow_from_directory(DATASET_PATH + '/train',
                                          target_size=(224, 224),
                                          color_mode='rgb',
                                          class_mode='categorical',
                                          shuffle=True,
                                          batch_size=BATCH_SIZE)
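To see what those two attributes do to a batch, here is a NumPy-only sketch of the arithmetic, assuming the generator subtracts mean and then divides by std (Keras also adds a small epsilon to the divisor, omitted here). The batch is synthetic; the reshape to (1, 1, 3) is what lets the per-channel mean broadcast over channels_last images.

```python
import numpy as np

mean = np.array([123.68, 116.779, 103.939], dtype=np.float32).reshape((1, 1, 3))
std = 64.0

# Synthetic batch of two 4x4 RGB images, channels_last: shape (2, 4, 4, 3).
batch = np.full((2, 4, 4, 3), 128.0, dtype=np.float32)

# (1, 1, 3) broadcasts against (2, 4, 4, 3) along the channel axis.
normalized = (batch - mean) / std

# Each channel is shifted by its own mean, then scaled by the shared std.
per_channel = normalized[0, 0, 0]
```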
I ended up using a custom functor to calculate, store and retrieve the statistics:
    class Normaliser():
        def __init__(self, mean=None, std=None):
            self.mean = mean
            self.std = std

        def __call__(self, img):
            # Called on each image when passed as preprocessing_function
            if self.mean is not None:
                img = self.center(img)
            if self.std is not None:
                img = self.scale(img)
            return img

        def center(self, img):
            return img - self.mean

        def scale(self, img):
            return img / self.std

        def normalise(self, img):
            return (img - self.mean) / self.std

        def set_mean(self, filepaths, input_size):
            print("Calculating mean with shape: ", input_size)
            self.mean = ...  # placeholder: calculate the mean here
            print("Mean is:", self.mean)
            return True

        def set_std(self, filepaths):
            self.std = ...  # placeholder: calculate the std here
            return True
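The two calculations are left open in the class above. One possible way to fill them in, not necessarily what the author did, is to accumulate per-channel sums over the images. In this sketch channel_stats is a hypothetical helper that takes already-loaded arrays; reading each file path into an array is assumed to happen elsewhere.

```python
import numpy as np

def channel_stats(images):
    """Return per-channel (mean, std) over an iterable of HxWxC arrays."""
    total = None      # running sum of pixel values per channel
    total_sq = None   # running sum of squared pixel values per channel
    count = 0         # number of pixels seen so far
    for img in images:
        arr = np.asarray(img, dtype=np.float64)
        pixels = arr.reshape(-1, arr.shape[-1])
        if total is None:
            total = np.zeros(pixels.shape[1])
            total_sq = np.zeros(pixels.shape[1])
        total += pixels.sum(axis=0)
        total_sq += (pixels ** 2).sum(axis=0)
        count += pixels.shape[0]
    mean = total / count
    # Var[x] = E[x^2] - E[x]^2; clip tiny negatives caused by rounding.
    std = np.sqrt(np.maximum(total_sq / count - mean ** 2, 0.0))
    return mean, std

# Two tiny synthetic "images" so the snippet runs without files.
imgs = [np.zeros((2, 2, 3)), np.full((2, 2, 3), 2.0)]
mean, std = channel_stats(imgs)
```

Accumulating sums keeps only one image in memory at a time, which matters when the dataset does not fit in RAM.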
To use the Normaliser, given an input size of input_size, pass an instance of it to the ImageDataGenerator:

    normaliser = Normaliser()
    img_gen = ImageDataGenerator(preprocessing_function=normaliser)
Get the file paths using one of the Keras preprocessing utilities. I used flow_from_directory (note that its size argument is called target_size and expects (height, width)):

    iterator = img_gen.flow_from_directory(directory, target_size=input_size)
Then set the mean of the normaliser with:

    normaliser.set_mean(iterator.filepaths, input_size)
You can then save your statistics the way you want.
In the prediction phase, you recreate the Normaliser with the saved mean using the constructor.
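As one way to persist and restore the statistics, the sketch below stores them with NumPy and rebuilds the normaliser from the file. The file name stats.npz, the illustrative mean and std values, and the trimmed-down Normaliser are assumptions, included only so the snippet is self-contained.

```python
import os
import tempfile
import numpy as np

class Normaliser:
    """Trimmed-down version of the Normaliser from this answer, for illustration."""
    def __init__(self, mean=None, std=None):
        self.mean = mean
        self.std = std

    def __call__(self, img):
        if self.mean is not None:
            img = img - self.mean
        if self.std is not None:
            img = img / self.std
        return img

# Training phase: statistics computed elsewhere (illustrative values).
mean = np.array([123.68, 116.779, 103.939], dtype=np.float32)
std = np.float32(64.0)

path = os.path.join(tempfile.mkdtemp(), "stats.npz")  # hypothetical location
np.savez(path, mean=mean, std=std)

# Prediction phase: reload the statistics and pass them to the constructor.
with np.load(path) as stats:
    restored = Normaliser(mean=stats["mean"], std=stats["std"])

img = np.full((4, 4, 3), 128.0, dtype=np.float32)
result = restored(img)
```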
Compared to using the Keras generator directly, this gave me more flexibility in how I treat my data and more clarity about which modifications I am applying to it, which is not a trivial thing.
samplewise_center and featurewise_center are not the only ways to standardise images, as this discussion describes.
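For reference, the difference between the two options named here comes down to where the mean is taken from. The NumPy sketch below illustrates that idea rather than reproducing Keras's exact implementation: sample-wise centering uses each image's own mean, while feature-wise centering uses one mean computed over the whole dataset (per channel here, which is an assumption of this sketch).

```python
import numpy as np

rng = np.random.default_rng(0)
dataset = rng.uniform(0, 255, size=(10, 8, 8, 3))  # synthetic channels_last images

# Sample-wise: every image is centered with its own mean.
samplewise = dataset - dataset.mean(axis=(1, 2, 3), keepdims=True)

# Feature-wise: a single per-channel mean for the whole dataset.
dataset_mean = dataset.mean(axis=(0, 1, 2))
featurewise = dataset - dataset_mean
```

Only the feature-wise variant needs statistics from the original training data, which is why it is the one that has to be set explicitly when reusing a pretrained model.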