When I run model.fit using both the batch_size and steps_per_epoch parameters, I receive the following error:
ValueError: If steps_per_epoch is set, the `batch_size` must be None.
So, from this error and from the following piece of documentation for the Keras Model (functional API):
batch_size: Integer or None. Number of samples per gradient update. If unspecified, batch_size will default to 32.
steps_per_epoch: Integer or None. Total number of steps (batches of samples) before declaring one epoch finished and starting the next epoch. When training with input tensors such as TensorFlow data tensors, the default None is equal to the number of samples in your dataset divided by the batch size, or 1 if that cannot be determined.
I understand that both parameters are somehow equivalent. But on my laptop (with a GeForce 940M graphics card with 2 GB of VRAM, training on the cifar10 dataset), when I run model.fit with the epochs parameter set to 256, the script runs fine and the feedback given by Keras looks like this:
4608/50000 [=>............................] - ETA: 1:59 - loss: 0.8167 - acc: 0.7398
with the first number always increasing by 256. However, when I pass steps_per_epoch as number_train//batch_size, I run out of memory and cannot run my script unless I pass batch_size as 1.
So, how does model.fit work with these parameters? What is the difference when I use just one of them instead of the other?
That's a good question. What I observe from the source code ([1] and [2]) is that:
- When batch_size is set, the training data is sliced into batches of this size (see L184).
- When steps_per_epoch is set, if the training inputs are not framework-native tensors (this is the most common case), the whole training set is fed into the network in a single batch (see L152), and that's why you get the memory error.

Therefore, based on the implementation, I would advise using the argument steps_per_epoch only when feeding through framework-native tensors (i.e. TensorFlow tensors with the first dimension being the batch size), and that is indeed a requirement. In order to do this, the arguments x and y of model.fit need to be set to None.
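To make the batch arithmetic concrete, here is a minimal sketch in plain Python of the slicing described above. It is not the Keras implementation: batch_slices is a hypothetical helper written only for illustration, and the numbers (50000 training samples, batches of 256) are taken from the progress bar in the question.

```python
import math


def batch_slices(num_samples, batch_size):
    """Return (start, stop) index pairs that cover num_samples in chunks of batch_size.

    Hypothetical helper for illustration only; it mimics the idea of slicing
    the training data into batches, not the actual Keras code.
    """
    return [
        (start, min(start + batch_size, num_samples))
        for start in range(0, num_samples, batch_size)
    ]


num_train = 50000  # CIFAR-10 training set size, as shown in the progress bar
batch_size = 256   # assumed batch size, matching the increments of 256

slices = batch_slices(num_train, batch_size)

# Number of batches needed to see every sample once (last batch is partial).
steps_with_partial_batch = math.ceil(num_train / batch_size)

# Floor division, as in number_train//batch_size, drops the partial batch.
steps_full_batches_only = num_train // batch_size

print(len(slices))               # 196
print(slices[0], slices[-1])     # (0, 256) (49920, 50000)
print(steps_with_partial_batch)  # 196
print(steps_full_batches_only)   # 195
```

With in-memory arrays, the practical consequence of the answer is to pass only batch_size (for example batch_size=256) and leave steps_per_epoch as None, so that each step only ever holds one of these slices in memory.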