I am using tf.estimator.Estimator to develop my model. I wrote a model_fn and trained it for 50,000 iterations. Now I want to make a small change to my model_fn, for example adding a new layer.
I don't want to start training from scratch. I want to restore all the old variables from the 50,000-iteration checkpoint and continue training from that point. When I try to do so, I get a NotFoundError.
How can this be done with tf.estimator.Estimator?
TL;DR: The easiest way to load variables from a previous checkpoint is to use the function tf.train.init_from_checkpoint(). A single call to this function inside the model_fn of your Estimator will override the initializers of the corresponding variables.
First model with two hidden layers
In more detail, suppose you have trained a first model with two hidden layers on MNIST, named model_fn_1. The weights are saved in the directory mnist_1.
import tensorflow as tf  # TensorFlow 1.x API (tf.layers, tf.losses, tf.train)

def model_fn_1(features, labels, mode):
    images = features['image']
    # Two hidden layers; the names determine the variable scopes in the checkpoint
    h1 = tf.layers.dense(images, 100, activation=tf.nn.relu, name="h1")
    h2 = tf.layers.dense(h1, 100, activation=tf.nn.relu, name="h2")
    logits = tf.layers.dense(h2, 10, name="logits")
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    optimizer = tf.train.GradientDescentOptimizer(0.01)
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)
# Estimator 1: two hidden layers
estimator_1 = tf.estimator.Estimator(model_fn_1, model_dir='mnist_1')
estimator_1.train(input_fn=train_input_fn, steps=1000)
Second model with three hidden layers
Now we want to train a new model, model_fn_2, with three hidden layers. We want to load the weights for the first two hidden layers, h1 and h2. We use tf.train.init_from_checkpoint() to do this:
def model_fn_2(features, labels, mode, params):
    images = features['image']
    h1 = tf.layers.dense(images, 100, activation=tf.nn.relu, name="h1")
    h2 = tf.layers.dense(h1, 100, activation=tf.nn.relu, name="h2")
    h3 = tf.layers.dense(h2, 100, activation=tf.nn.relu, name="h3")
    # Keys are scopes in the checkpoint, values are scopes in the current graph
    assignment_map = {
        'h1/': 'h1/',
        'h2/': 'h2/'
    }
    tf.train.init_from_checkpoint('mnist_1', assignment_map)
    logits = tf.layers.dense(h3, 10, name="logits")
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    optimizer = tf.train.GradientDescentOptimizer(0.01)
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)
# Estimator 2: three hidden layers
estimator_2 = tf.estimator.Estimator(model_fn_2, model_dir='mnist_2')
estimator_2.train(input_fn=train_input_fn, steps=1000)
The assignment_map will load every variable from scope h1/ in the checkpoint into the new scope h1/, and likewise for h2/. Don't forget the / at the end, which tells TensorFlow that the entry is a variable scope.
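To make the scope matching concrete, here is a minimal pure-Python sketch of how a scope-to-scope assignment map pairs variable names. It is only an illustration, not TensorFlow's implementation: the helper resolve_assignment_map is hypothetical, and the variable names (h1/kernel, h1/bias, and so on) are assumed from the way tf.layers.dense usually names its variables.

```python
# Simplified illustration of scope-to-scope matching.
# NOT TensorFlow's implementation; resolve_assignment_map is a hypothetical helper.

def resolve_assignment_map(checkpoint_vars, model_vars, assignment_map):
    """Return {model variable name: checkpoint variable name} for scope entries."""
    resolved = {}
    for ckpt_scope, model_scope in assignment_map.items():
        # In this sketch, every entry must be a scope, i.e. end with '/'
        if not (ckpt_scope.endswith('/') and model_scope.endswith('/')):
            raise ValueError(
                "scope entries must end with '/': %r -> %r" % (ckpt_scope, model_scope))
        for name in model_vars:
            if name.startswith(model_scope):
                # Swap the model scope prefix for the checkpoint scope prefix
                ckpt_name = ckpt_scope + name[len(model_scope):]
                if ckpt_name not in checkpoint_vars:
                    raise ValueError("%r not found in checkpoint" % ckpt_name)
                resolved[name] = ckpt_name
    return resolved


# Assumed variable names stored by the first model (two hidden layers)
checkpoint_vars = {
    'h1/kernel', 'h1/bias', 'h2/kernel', 'h2/bias',
    'logits/kernel', 'logits/bias', 'global_step',
}
# Assumed variable names created by the second model (three hidden layers)
model_vars = [
    'h1/kernel', 'h1/bias', 'h2/kernel', 'h2/bias',
    'h3/kernel', 'h3/bias', 'logits/kernel', 'logits/bias',
]

restored = resolve_assignment_map(
    checkpoint_vars, model_vars, {'h1/': 'h1/', 'h2/': 'h2/'})
# Variables not covered by the map keep their regular initializers
fresh = [name for name in model_vars if name not in restored]

print(sorted(restored))
print(fresh)
```

In this sketch, only the h1 and h2 variables are paired with checkpoint entries, while h3 and logits stay freshly initialized. The trailing / also keeps a prefix such as h1 from accidentally matching an unrelated scope like h10/.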
I couldn't find a way to make this work with pre-made estimators, since you can't change their model_fn.