Plot loss evolution during a single epoch in Keras

Posted 2019-05-25 08:28

Question:

Does Keras have a built-in method to output (and later plot) the loss evolution during the training of a single epoch?

The usual approach of using the keras.callbacks.History() callback can output the loss for each epoch. However, in my case the training set is fairly large, so I am training the NN for a single epoch only. Since I would like to plot the evolution of the training (and dev) loss during that training, is there a way to do this?

I am currently solving this by dividing the training set into several chunks, training on each of them sequentially for a single epoch, and saving the model each time. But maybe there is a built-in way to do this?
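For reference, here is a rough sketch of my current workaround (simplified; the chunk count and file names are just illustrative):

import numpy as np

n_chunks = 10  # split the large training set into manageable chunks
for i, (x_chunk, y_chunk) in enumerate(zip(np.array_split(x_train, n_chunks),
                                           np.array_split(y_train, n_chunks))):
    model.fit(x_chunk, y_chunk, batch_size=batch_size, epochs=1, verbose=1)
    model.save('model_after_chunk_{}.h5'.format(i))  # checkpoint after each chunk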

I am using TensorFlow backend.

Answer 1:

You can use a callback for this purpose.

Using the Keras MNIST CNN example (not copying the whole code here), with the following changes/additions:

from keras.callbacks import Callback

class TestCallback(Callback):
    def __init__(self, test_data):
        self.test_data = test_data

    def on_batch_end(self, batch, logs={}):
        # evaluate on the held-out data after every single batch
        x, y = self.test_data
        loss, acc = self.model.evaluate(x, y, verbose=0)
        print('\nTesting loss: {}, acc: {}\n'.format(loss, acc))

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=1,
          verbose=1,
          validation_data=(x_test, y_test),
          callbacks=[TestCallback((x_test, y_test))]
         )

Evaluating the test/validation set at the end of each batch, we get this:

Train on 60000 samples, validate on 10000 samples
Epoch 1/1

Testing loss: 0.0672039743446745, acc: 0.9781

  128/60000 [..............................] - ETA: 7484s - loss: 0.1450 - acc: 0.9531

/var/venv/DSTL/lib/python3.4/site-packages/keras/callbacks.py:120: UserWarning: Method on_batch_end() is slow compared to the batch update (15.416976). Check your callbacks.
  % delta_t_median)


Testing loss: 0.06644540682602673, acc: 0.9781

  256/60000 [..............................] - ETA: 7476s - loss: 0.1187 - acc: 0.9570

/var/venv/DSTL/lib/python3.4/site-packages/keras/callbacks.py:120: UserWarning: Method on_batch_end() is slow compared to the batch update (15.450395). Check your callbacks.
  % delta_t_median)


Testing loss: 0.06575664376271889, acc: 0.9782

However, as you will probably see for yourself, this has the severe drawback of slowing down the code significantly (and duly producing some relevant warnings). As a compromise, if you are OK with getting only the training performance at the end of each batch, you could use a slightly different callback:

class TestCallback2(Callback):
    def __init__(self, test_data):
        self.test_data = test_data  # kept for symmetry with TestCallback; not actually used

    def on_batch_end(self, batch, logs={}):
        print()  # just a dummy print, to force the progress line to be written per batch

The results now (replacing callbacks=[TestCallback2((x_test, y_test))] in model.fit()) are much faster, but give only the training metrics at the end of each batch:

Train on 60000 samples, validate on 10000 samples
Epoch 1/1

  128/60000 [..............................] - ETA: 346s - loss: 0.8503 - acc: 0.7188
  256/60000 [..............................] - ETA: 355s - loss: 0.8496 - acc: 0.7109
  384/60000 [..............................] - ETA: 339s - loss: 0.7718 - acc: 0.7396
  [...]
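
If you still want the test-set metrics but cannot afford a full evaluation on every batch, a middle-ground option is to evaluate only every N batches (a sketch; the class name and every_n parameter are just illustrative):

from keras.callbacks import Callback

class TestCallbackEveryN(Callback):
    def __init__(self, test_data, every_n=50):
        self.test_data = test_data
        self.every_n = every_n  # evaluate only once per this many batches

    def on_batch_end(self, batch, logs={}):
        if batch % self.every_n == 0:
            x, y = self.test_data
            loss, acc = self.model.evaluate(x, y, verbose=0)
            print('\nTesting loss: {}, acc: {}\n'.format(loss, acc))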

UPDATE

All the above may be fine, but the resulting losses & accuracies are not stored anywhere, and hence they cannot be plotted; so, here is another callback solution that actually stores the metrics on the training set:

from keras.callbacks import Callback

class Histories(Callback):

    def on_train_begin(self, logs={}):
        # initialize the containers at the start of training
        self.losses = []
        self.accuracies = []

    def on_batch_end(self, batch, logs={}):
        # logs holds the training metrics of the batch that just finished
        self.losses.append(logs.get('loss'))
        self.accuracies.append(logs.get('acc'))


histories = Histories()

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=1,
          verbose=1,
          validation_data=(x_test, y_test),
          callbacks=[histories]
         )

which results in the metrics at the end of each batch being stored in histories.losses and histories.accuracies, respectively; here are the first 5 entries of each:

histories.losses[:5]
# [2.3115866, 2.3008101, 2.2479887, 2.1895032, 2.1491694]

histories.accuracies[:5]
# [0.0703125, 0.1484375, 0.1875, 0.296875, 0.359375]
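
Since the stored values are plain Python lists, plotting them afterwards is straightforward; a minimal matplotlib sketch:

import matplotlib.pyplot as plt

plt.plot(histories.losses, label='training loss')
plt.xlabel('Batch')
plt.ylabel('Loss')
plt.legend()
plt.show()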