Correct way to split data into batches for a Keras stateful RNN

Posted 2020-06-20 13:30

Question:

As the documentation states

the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch

does it mean that to split the data into batches I need to do it the following way? E.g., let's assume that I am training a stateful RNN to predict the next integer in range(0, 5) given the previous one:

# batch_size = 3
# 0, 1, 2 etc in x are samples (timesteps and features omitted for brevity of the example)
x = [0, 1, 2, 3, 4]
y = [1, 2, 3, 4, 5]

batches_x = [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
batches_y = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]

Then the state after learning on batches_x[0][0] will be the initial state for batches_x[1][0], and the state after batches_x[0][1] the initial state for batches_x[1][1] (0 for 1, 1 for 2, etc.)?

Is it the right way to do it?

Answer 1:

This is based on this answer, on which I ran some tests.

Stateful=False:

Normally (stateful=False), you have one batch with many sequences:

batch_x = [
            [[0],[1],[2],[3],[4],[5]],
            [[1],[2],[3],[4],[5],[6]],
            [[2],[3],[4],[5],[6],[7]],
            [[3],[4],[5],[6],[7],[8]]
          ]

The shape is (4,6,1). This means that you have:

  • 1 batch
  • 4 individual sequences = this is the batch size, and it can vary
  • 6 steps per sequence
  • 1 feature per step

Every time you train, whether you repeat this batch or pass a new one, the layer sees individual sequences. Every sequence is a unique entry.
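As a quick check, the stateless batch above can be built and inspected with numpy (a minimal sketch of the shapes only; no model involved):

```python
import numpy as np

# Build the stateless batch above: 4 independent windows of 6 steps, 1 feature.
batch_x = np.array([[[i + t] for t in range(6)] for i in range(4)])

print(batch_x.shape)       # (4, 6, 1): batch size 4, 6 steps, 1 feature
print(batch_x[0].ravel())  # first sequence: [0 1 2 3 4 5]
```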

Stateful=True:

When you go to a stateful layer, you are no longer going to pass individual sequences. You are going to pass very long sequences divided into small batches. You will need more batches:

batch_x1 = [
             [[0],[1],[2]],
             [[1],[2],[3]],
             [[2],[3],[4]],
             [[3],[4],[5]]
           ]
batch_x2 = [
             [[3],[4],[5]], #continuation of batch_x1[0]
             [[4],[5],[6]], #continuation of batch_x1[1]
             [[5],[6],[7]], #continuation of batch_x1[2]
             [[6],[7],[8]]  #continuation of batch_x1[3]
           ]
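The two chunks above can be derived from the full (4, 6, 1) stateless batch by slicing along the time axis (a minimal numpy sketch; variable names are illustrative):

```python
import numpy as np

# The same four long sequences as the stateless case, shape (4, 6, 1).
long_seqs = np.array([[[i + t] for t in range(6)] for i in range(4)])

# Slice along the time axis into 3-step chunks; each chunk keeps the same
# sequence at the same batch index, which is what stateful=True relies on.
steps = 3
batch_x1, batch_x2 = (long_seqs[:, s:s + steps, :]
                      for s in range(0, long_seqs.shape[1], steps))

print(batch_x1.shape)       # (4, 3, 1)
print(batch_x2[0].ravel())  # [3 4 5], the continuation of batch_x1[0]
```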

Both shapes are (4,3,1). This means that you have:

  • 2 batches
  • 4 individual sequences = this is the batch size, and it must be constant
  • 6 steps per sequence (3 steps in each batch)
  • 1 feature per step

The stateful layers are meant for huge sequences, long enough to exceed your memory or the time you have available for some task. You then slice your sequences and process them in parts. There is no difference in the results; the layer is not smarter and has no additional capabilities. It just doesn't consider the sequences to have ended after it processes one batch. It expects the continuation of those sequences.
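This equivalence can be illustrated with a tiny hand-rolled recurrent cell (an illustrative toy, not Keras): processing one 6-step sequence at once, or as two 3-step chunks with the hidden state carried over, gives the same final state:

```python
import numpy as np

rng = np.random.default_rng(0)
# A minimal recurrent cell (illustrative only): h_t = tanh(x_t @ W_x + h_{t-1} @ W_h)
W_x = rng.standard_normal((1, 2))
W_h = rng.standard_normal((2, 2))

def run(x, h):
    """Process a (steps, 1) sequence, returning the final hidden state."""
    for x_t in x:
        h = np.tanh(x_t @ W_x + h @ W_h)
    return h

full = np.array([[0.], [1.], [2.], [3.], [4.], [5.]])  # one long sequence
part1, part2 = full[:3], full[3:]                      # two stateful "batches"

h0 = np.zeros(2)
h_full = run(full, h0)                # all 6 steps at once
h_split = run(part2, run(part1, h0))  # 3 steps, then 3 more with carried state

assert np.allclose(h_full, h_split)   # identical final states
```

Resetting the state to zeros between chunks, by contrast, would break this equivalence — which is exactly why the batch order and per-index alignment matter.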

In this case, you decide yourself when the sequences have ended and call model.reset_states() manually.
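A minimal training loop following this pattern might look like the sketch below (layer size, optimizer, and targets are placeholder choices; assumes the tf.keras API as of TF 2.x):

```python
import numpy as np
import tensorflow as tf

# Stateful model: the batch size (4) is fixed in batch_input_shape, and every
# batch you feed must contain exactly that many sequences.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(8, stateful=True, batch_input_shape=(4, 3, 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Same data as above: x2 continues each sequence of x1 at the same index.
x1 = np.arange(4)[:, None, None] + np.arange(3, dtype=float)[None, :, None]
x2 = x1 + 3
y1, y2 = x1[:, -1] + 1, x2[:, -1] + 1  # next-integer targets, shape (4, 1)

for epoch in range(2):
    model.train_on_batch(x1, y1)  # state carries over to the next call
    model.train_on_batch(x2, y2)
    model.reset_states()          # the long sequences end here
```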