Problem: I am training a model for multilabel image recognition, so each of my images is associated with multiple y labels. This conflicts with the convenient Keras method "flow_from_directory" of the ImageDataGenerator, which expects each image to sit in the folder of its one corresponding label (https://keras.io/preprocessing/image/).
Workaround: Currently, I read all images into a numpy array and use the "flow" function from there. But this results in a heavy memory load and a slow read-in process.
Question: Is there a way to use the "flow_from_directory" method and supply the (multiple) class labels manually?
Update: I ended up extending the DirectoryIterator class for the multilabel case. You can now set the attribute "class_mode" to the value "multilabel" and provide a dictionary "multlabel_classes" which maps filenames to their labels. Code: https://github.com/tholor/keras/commit/29ceafca3c4792cb480829c5768510e4bdb489c5
You could simply use `flow_from_directory` and extend it to the multiclass case by wrapping the generator it returns: iterate over its `(x, y)` batches and yield `x` together with the labels that `multiclasses_getter` produces for that batch. Here `multiclasses_getter` assigns a multiclass vector (or whatever multiclass representation you use) to your images. Note that `x` and `y` are not single examples but batches of examples, so your `multiclasses_getter` design has to account for this.

You could write a custom generator class that reads the files from the directory and applies the labeling. That custom generator could also take an ImageDataGenerator instance, which would produce the batches using flow().
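I am imagining a class that holds the file list, the label mapping, and the ImageDataGenerator instance, and that exposes a method returning the training batches.
Then you simply create one instance and feed its batches to the model during training.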
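A minimal sketch of such a class, in case it helps. This is an illustration under assumptions, not the code from the original answer: the class name `MultilabelGenerator`, the `labels_by_filename` dictionary, and the `load_image` callable (which must return arrays of one fixed shape) are all hypothetical. The only Keras API it relies on is `ImageDataGenerator.flow(x, y, batch_size=..., shuffle=...)`, and it loads one chunk of files at a time instead of the whole dataset.

```python
import os

import numpy as np


class MultilabelGenerator:
    """Hypothetical helper: reads images lazily and attaches multi-hot labels."""

    def __init__(self, directory, labels_by_filename, img_data_gen,
                 load_image, batch_size=32):
        self.directory = directory
        self.labels_by_filename = labels_by_filename  # e.g. {"a.jpg": [1, 0, 1]}
        self.img_data_gen = img_data_gen  # object with a Keras-style flow() method
        self.load_image = load_image      # assumed: path -> (height, width, channels) array
        self.batch_size = batch_size
        # Keep only the files for which labels are known.
        self.filenames = sorted(
            name for name in os.listdir(directory) if name in labels_by_filename
        )
        if not self.filenames:
            raise ValueError("no labelled files found in %r" % directory)
        # Ceiling division: number of batches needed to see every file once.
        self.steps_per_epoch = -(-len(self.filenames) // batch_size)

    def apply_labels(self, names):
        """Look up the label vectors for one chunk of filenames."""
        return np.array([self.labels_by_filename[n] for n in names],
                        dtype="float32")

    def batches(self):
        """Yield (x, y) batches forever, loading one chunk of files at a time."""
        while True:
            for start in range(0, len(self.filenames), self.batch_size):
                names = self.filenames[start:start + self.batch_size]
                x = np.stack([
                    self.load_image(os.path.join(self.directory, n))
                    for n in names
                ])
                y = self.apply_labels(names)
                # flow() applies the augmentation; take the single batch it yields.
                yield next(self.img_data_gen.flow(
                    x, y, batch_size=len(names), shuffle=False))
```

With Keras installed, usage could then look roughly like `gen = MultilabelGenerator(train_dir, labels, ImageDataGenerator(rescale=1. / 255), load_image)` followed by `model.fit_generator(gen.batches(), steps_per_epoch=gen.steps_per_epoch)`, where `train_dir`, `labels` and `load_image` are placeholders you would supply yourself. The sketch reads files in a fixed order; shuffling `self.filenames` at the start of each pass would be a natural extension.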
*Disclaimer: I haven't actually tested this, but it should work in theory.
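Returning to the first suggestion (wrapping the generator returned by `flow_from_directory`), a minimal sketch might look like the following. The names `multilabel_flow`, `FOLDER_TO_LABELS` and `folder_index_getter` are made up for illustration, and the getter assumes one particular setup: images are stored in one folder per label combination and read with `class_mode="sparse"`, so that `y` holds folder indices. Any other way of deriving labels from a batch would work just as well.

```python
import numpy as np


def multilabel_flow(batch_iterator, multiclasses_getter):
    """Wrap an iterator of (x, y) batches and replace y with multilabel targets."""
    for x, y in batch_iterator:
        yield x, multiclasses_getter(x, y)


# Assumed layout: folders "cat", "cat_dog", "dog" (indices 0, 1, 2).
# Row i is the multi-hot vector (cat, dog) for folder index i; the row order
# has to match the iterator's class_indices mapping.
FOLDER_TO_LABELS = np.array([
    [1, 0],  # cat
    [1, 1],  # cat_dog
    [0, 1],  # dog
], dtype="float32")


def folder_index_getter(x, y):
    """Map a whole batch of folder indices to a batch of multi-hot vectors."""
    return FOLDER_TO_LABELS[np.asarray(y, dtype=int)]
```

The wrapped generator can be passed to training in place of the original one, for example `multilabel_flow(datagen.flow_from_directory(path, class_mode="sparse"), folder_index_getter)`, with `datagen` and `path` being your own objects. The folder-per-combination layout only scales to a modest number of label combinations; a getter that needs filenames instead would require a different hook than the `(x, y)` batch alone.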