How can I train dlib shape predictor using a very

Posted 2019-09-02 08:06

Question:

I'm trying to use the Python dlib.train_shape_predictor function to train on a very large set of images (~50,000).

I've created an XML file containing the necessary data, but train_shape_predictor appears to load all the referenced images into RAM before it starts training. The process gets terminated because it uses over 100 GB of RAM. Even a trimmed-down data set uses over 20 GB, and the machine has only 16 GB of physical memory.
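As a rough sanity check on those numbers, the memory needed to hold every decoded image at once can be estimated from the image count and dimensions. The sketch below is only back-of-the-envelope arithmetic: the 1920x1080 size and the one byte per pixel (grayscale) storage are assumptions for illustration, not facts about this data set or about how dlib stores images internally.

```python
def estimate_dataset_ram_gb(num_images, width, height, bytes_per_pixel=1):
    """Rough RAM (in GB, 10**9 bytes) to hold every decoded image at once.

    bytes_per_pixel=1 assumes 8-bit grayscale storage; use 3 for RGB.
    """
    return num_images * width * height * bytes_per_pixel / 1e9


# Hypothetical image sizes, chosen only to illustrate the scaling.
full = estimate_dataset_ram_gb(50_000, 1920, 1080)
half = estimate_dataset_ram_gb(50_000, 960, 540)
print(f"{full:.2f} GB at 1920x1080, {half:.2f} GB at 960x540")
```

Because the estimate grows with width times height, halving both dimensions cuts the footprint to about a quarter.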

Is there some way to get train_shape_predictor to load images on demand, instead of all at once?

I'm using Python 3.7.2 and dlib 19.16.0 installed via pip on macOS.

Answer 1:

I posted this as an issue on the dlib GitHub repository and got this response from the author:

It's not reasonable to change the code to cycle back and forth between disk and ram like that. It will make training very slow. You should instead buy more RAM, or use smaller images.

As designed, large training sets need tons of RAM.
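If you go the "smaller images" route, the box and landmark coordinates in the training XML have to be scaled by the same factor as the images. Here is a minimal sketch of that step, assuming the usual imglab-style layout (box elements with top/left/width/height attributes and nested part elements with x/y). The scale_annotations helper and the sample file name are hypothetical, and resizing the image files themselves (with whatever image library you prefer) is not shown.

```python
import xml.etree.ElementTree as ET


def scale_annotations(xml_text, factor):
    """Return the XML with every box and part coordinate multiplied by factor.

    Hypothetical helper: assumes each box has integer top/left/width/height
    attributes and each part has integer x/y attributes.
    """
    root = ET.fromstring(xml_text)
    for box in root.iter("box"):
        for key in ("top", "left", "width", "height"):
            box.set(key, str(round(int(box.get(key)) * factor)))
        for part in box.iter("part"):
            for key in ("x", "y"):
                part.set(key, str(round(int(part.get(key)) * factor)))
    return ET.tostring(root, encoding="unicode")


# Made-up sample annotation, for illustration only.
sample = """<dataset><images>
<image file='face_0001.jpg'>
<box top='100' left='200' width='400' height='400'>
<part name='00' x='250' y='300'/>
</box>
</image>
</images></dataset>"""

scaled = scale_annotations(sample, 0.5)
print(scaled)
```

The file attribute is left untouched here, so either overwrite the images in place with their resized versions or update the paths to point at the resized copies.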