In Keras documentation - steps_per_epoch: Total number of steps (batches of samples) to yield from generator before declaring one epoch finished and starting the next epoch. It should typically be equal to the number of unique samples of your dataset divided by the batch size.
I have 3000 samples. If I set steps_per_epoch=3000, training is slow. If I set steps_per_epoch=300, it is faster, and I thought that batching was working!
But then I compared how much video memory is allocated in the first and second cases and did not notice a big difference. If I use a plain fit() call, the difference is large. So is it a real speed-up, or am I just processing 300 examples instead of 3000?
What is this parameter for? And how can I speed up the training? My generator code:
def samples_generator(self, path_source, path_mask):
    while 1:
        file_paths_x = self.get_files(path_source)
        file_paths_y = self.get_files(path_mask)
        for path_x, path_y in zip(file_paths_x, file_paths_y):
            x = self.load_pixels(path_x, 3, cv2.INTER_CUBIC)
            y = self.load_pixels(path_y, 0, cv2.INTER_NEAREST)
            yield (x, y)

The steps_per_epoch parameter is the number of batches of samples it takes to complete one full epoch. It depends on your batch size, which is set where you initialize your training data. For example, if you're doing this with ImageDataGenerator.flow() or ImageDataGenerator.flow_from_directory(), the batch size is specified with the batch_size parameter in each of these.

You said you have 3000 samples. If your batch size was 100, steps_per_epoch would be 30. If your batch size was 10, steps_per_epoch would be 300. If your batch size was 1, steps_per_epoch would be 3000.

This is because steps_per_epoch should be equivalent to the total number of samples divided by the batch size. The process of implementing this in Keras is available in the two videos below.

The reason why you have to set steps_per_epoch is that the generator is designed to run indefinitely (see the docs). You implemented this by setting while 1. Since fit_generator() is supposed to run epochs=x times, the method must know when the next epoch begins within this indefinite loop (and, hence, when the data has to be drawn from the beginning again).
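
To make the arithmetic concrete, here is a minimal sketch in plain Python. fit_generator() counts each value the generator yields as one step, so if samples_generator yields a single image/mask pair per step (as it appears to), steps_per_epoch=300 would cover only 300 of the 3000 samples rather than running faster batches. The helper names steps_per_epoch, batched and endless_samples below are hypothetical, not part of Keras, and the placeholder data stands in for the real images.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


def batched(sample_generator, batch_size):
    """Group an endless stream of (x, y) samples into batches.

    Yields (xs, ys) as lists; with NumPy arrays you would typically
    stack each list (e.g. with np.stack) before handing it to Keras.
    """
    xs, ys = [], []
    for x, y in sample_generator:
        xs.append(x)
        ys.append(y)
        if len(xs) == batch_size:
            yield xs, ys
            xs, ys = [], []


def endless_samples(num_samples):
    """Hypothetical stand-in for samples_generator: loops over the data forever."""
    while True:
        for i in range(num_samples):
            yield i, i * 2  # placeholder (x, y) pair


if __name__ == "__main__":
    # Reproduces the numbers from the answer above for 3000 samples.
    for batch_size in (100, 10, 1):
        print(batch_size, steps_per_epoch(3000, batch_size))
```

With something like this, the real generator would be wrapped as batched(self.samples_generator(path_source, path_mask), 10) and trained with steps_per_epoch=300. Note that in this sketch, if the sample count is not a multiple of the batch size, a batch can straddle the boundary between two passes over the data.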