I'm writing some code to optimize a neural net architecture and so have a Python function create_nn(parms)
that creates and initializes a Keras model.
However, the problem I'm having is that after a few iterations the models take a lot longer to train than usual (initially one epoch takes 10sec, and then after roughly the 14th model (each model trains for 20 epochs) it takes 60sec/epoch).
I know that this is not because of the evolving architecture, because if I restart the script and start where it ended, training is back to normal speed.
I'm currently running
from keras import backend as K
and then calling
K.clear_session()
after training each new model.
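For context, the outer loop looks roughly like this (simplified; parameter_sets, X_train, and y_train are stand-ins for my actual search space and data):

from keras import backend as K

for parms in parameter_sets:                # stand-in for my actual search loop
    model = create_nn(*parms)               # create_nn as shown below
    model.fit(X_train, y_train, epochs=20)  # X_train/y_train: stand-ins for my dataset
    K.clear_session()                       # clearing after training, as described above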
Some additional details:
For the first 12 models, training time per epoch remains roughly constant at 10sec/epoch. Then at the 13th model training time per epoch climbs steadily to 60sec. Then training time per epoch hovers at around 60sec/epoch.
I'm running keras with Tensorflow as the backend
I'm using an Amazon EC2 t2.xlarge instance
There is plenty of free RAM (7GB free, with a 5GB dataset)
I've removed a bunch of layers and parameters, but essentially create_nn
looks like:
from keras.layers import (Input, GaussianNoise, Convolution1D, Activation,
                          Flatten, Dense, BatchNormalization, Dropout)
from keras.models import Model

def create_nn(features, timesteps, number_of_filters):
    inputs = Input(shape=(timesteps, features))
    x = GaussianNoise(stddev=0.005)(inputs)
    # Layer 1.1
    x = Convolution1D(number_of_filters, 3, padding='valid')(x)
    x = Activation('relu')(x)
    x = Flatten()(x)
    x = Dense(10)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Dropout(0.5)(x)
    # Output layer
    outputs = Dense(1, activation='sigmoid')(x)
    model = Model(inputs=inputs, outputs=outputs)
    # Compile and return
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    print('CNN model built successfully.')
    return model
Note that while a Sequential
model would've worked in this dummy example, the functional API is required for the actual use case.
How can I fix this problem?
Why is my training time increasing after every run?
Short answer: you need to call tf.keras.backend.clear_session()
before every new model that you create.
This problem only seems to happen when eager execution is turned off.
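For example, a minimal, self-contained version of that pattern (the data and model here are dummies, just to show where clear_session goes):

import numpy as np
import tensorflow as tf

x = np.random.rand(64, 8).astype("float32")   # dummy data for illustration
y = np.random.randint(0, 2, size=(64, 1))

print(tf.executing_eagerly())                 # the slowdown shows up when this is False

for _ in range(3):                            # stands in for your model-search loop
    tf.keras.backend.clear_session()          # clear *before* building the next model
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation="sigmoid")])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    model.fit(x, y, epochs=1, verbose=0)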
Okay, so let's run an experiment with and without clear_session. The code for make_model
is at the end of this answer.
First, let's look at the training time when clear_session is used. We'll run the experiment 10 times and print the results.
Use tf.keras.backend.clear_session()
non_seq_time = [ make_model(clear_session=True) for _ in range(10)]
Output (clear_session=True):
Elapse = 1.06039
Elapse = 1.20795
Elapse = 1.04357
Elapse = 1.03374
Elapse = 1.02445
Elapse = 1.00673
Elapse = 1.01712
Elapse = 1.021
Elapse = 1.17026
Elapse = 1.04961
As you can see, the training time stays roughly constant from run to run.
Now let's re-run the experiment without clear_session and look at the training time.
Don't use tf.keras.backend.clear_session()
non_seq_time = [ make_model(clear_session=False) for _ in range(10)]
Output (clear_session=False):
Elapse = 1.10954
Elapse = 1.13042
Elapse = 1.12863
Elapse = 1.1772
Elapse = 1.2013
Elapse = 1.31054
Elapse = 1.27734
Elapse = 1.32465
Elapse = 1.32387
Elapse = 1.33252
As you can see, without clear_session the training time climbs steadily from run to run.
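The underlying cause: with eager execution disabled, every call to make_model adds new nodes to the same default graph, so each subsequent fit runs over an ever-larger graph. One way to see this directly (a sketch that reuses make_model from the full example below and assumes eager execution is disabled, as in the experiment):

# Assumes make_model (below) is already defined and eager execution is disabled
for i in range(3):
    make_model(clear_session=False)
    n_ops = len(tf.compat.v1.get_default_graph().get_operations())
    print(f"run {i}: {n_ops} ops in the default graph")
# The op count climbs every run; clear_session() discards the old graph,
# which is why the timings stay flat when it is used.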
Full Code Example
# Training time increases - and how to fix it
# Setup and imports
# %tensorflow_version 2.x

import tensorflow as tf
import tensorflow.keras.layers as layers
import tensorflow.keras.models as models
from time import time

# if you comment this out, the problem doesn't happen
# it only happens when eager execution is disabled !!
tf.compat.v1.disable_eager_execution()

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# Flatten the 28x28 images to length-784 vectors to match the Input layer below
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

# Let's build that network
def make_model(activation="relu", hidden=2, units=100, clear_session=False):
    # -----------------------------------
    # HERE WE CAN TOGGLE CLEAR SESSION
    # -----------------------------------
    if clear_session:
        tf.keras.backend.clear_session()

    start = time()
    inputs = layers.Input(shape=[784])
    x = inputs
    for _ in range(hidden):
        x = layers.Dense(units=units, activation=activation)(x)
    outputs = layers.Dense(units=10, activation="softmax")(x)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    results = model.fit(x_train, y_train, validation_data=(x_test, y_test), batch_size=200, verbose=0)
    elapse = time() - start
    print(f"Elapse = {elapse:8.6}")
    return elapse

# Let's try it out and time it
# prime it first
make_model()

print("Use clear session")
non_seq_time = [make_model(clear_session=True) for _ in range(10)]

print("Don't use clear session")
non_seq_time = [make_model(clear_session=False) for _ in range(10)]
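Applied back to the question's setup (standalone Keras on the TensorFlow backend, where K.clear_session() is the equivalent call), the fix is to clear the session before building each model rather than after training it. A hypothetical version of the search loop (parameter_sets, X_train, and y_train are stand-ins):

from keras import backend as K

for parms in parameter_sets:                # stand-in for the search space
    K.clear_session()                       # clear *before* create_nn builds the next model
    model = create_nn(*parms)               # create_nn from the question
    model.fit(X_train, y_train, epochs=20)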