While training my graph, I realized that I had forgotten to add dropout. But I have already trained for a long time and have some checkpoints. So is it possible to load the checkpoints, add a dropout layer, and then continue training? My code currently looks like this:
import os
import tensorflow as tf
import fcn8_vgg_ours

# create a graph
vgg_fcn = fcn8_vgg_ours.FCN8VGG()
with tf.name_scope("content_vgg"):
    vgg_fcn.build(batch_images, train=True, debug=True)
labels = tf.placeholder("int32", [None, HEIGHT, WIDTH])
# do something
...
#####
init_glb = tf.global_variables_initializer()
init_loc = tf.local_variables_initializer()
sess.run(init_glb)
sess.run(init_loc)
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
ckpt_dir = "./checkpoints"
if not os.path.exists(ckpt_dir):
    os.makedirs(ckpt_dir)
ckpt = tf.train.get_checkpoint_state(ckpt_dir)
start = 0
if ckpt and ckpt.model_checkpoint_path:
    start = int(ckpt.model_checkpoint_path.split("-")[1])
    print("start by epoch: %d" % start)
    saver = tf.train.Saver()
    saver.restore(sess, ckpt.model_checkpoint_path)
    last_save_epoch = start
# continue training
So if I have changed the structure of FCN8VGG (added some dropout layers), will restoring from the checkpoint use the meta file and replace the graph I have just created? If so, how can I change the structure and continue training without starting from scratch?
Here's a simple example of initializing a new model using variables from another model's checkpoint. Note that things are much simpler if you can just pass a `variable_scope` to `init_from_checkpoint`, but here I'm assuming that the original model was not designed with restoring in mind.

First define a simple model with some variables, and do some training:
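A minimal sketch of such a `first_model`: two small fully connected layers trained on constant fake data. The layer sizes, the input/target values, the learning rate, the loop length, and the `./first_model_checkpoint` path are illustrative placeholders, not a definitive implementation.

```python
import tensorflow as tf

def first_model():
    """Build and train a small two-layer network, then save a checkpoint."""
    with tf.Graph().as_default():
        fake_input = tf.constant([[1., 2., 3., 4.],
                                  [5., 6., 7., 8.]])
        layer_one = tf.contrib.layers.fully_connected(
            inputs=fake_input, num_outputs=2, activation_fn=None)
        layer_two = tf.contrib.layers.fully_connected(
            inputs=layer_one, num_outputs=1, activation_fn=None)
        target = tf.constant([[10.], [-3.]])
        loss = tf.reduce_sum((layer_two - target) ** 2)
        train_op = tf.train.AdamOptimizer(0.01).minimize(loss)
        init_op = tf.global_variables_initializer()
        saver = tf.train.Saver()
        with tf.Session() as session:
            session.run(init_op)
            for i in range(1000):
                _, current_loss = session.run([train_op, loss])
                if i % 100 == 0:
                    print(i, current_loss)
            # Writes the first_model_checkpoint files to disk.
            saver.save(session, './first_model_checkpoint')
```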
Running `first_model()`, training looks fine and a `first_model_checkpoint` is written to disk.

Next, we can define a completely new model in a different graph, and initialize the variables that it shares with `first_model` from that checkpoint:
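A sketch of `second_model` along the same lines, reusing the `previous_variables` and `restore_map` names referred to below. It lists the checkpoint's variable names with `tf.contrib.framework.list_variables` and sets their initializers with `tf.contrib.framework.init_from_checkpoint`, while the new `batch_norm` variables keep their ordinary initializers; the architecture details are again only illustrative.

```python
def second_model():
    """Rebuild the network with an extra batch_norm layer, restoring shared variables."""
    # Names of all variables stored in the old checkpoint.
    previous_variables = [
        var_name for var_name, _
        in tf.contrib.framework.list_variables('./first_model_checkpoint')]
    with tf.Graph().as_default():
        fake_input = tf.constant([[1., 2., 3., 4.],
                                  [5., 6., 7., 8.]])
        layer_one = tf.contrib.layers.fully_connected(
            inputs=fake_input, num_outputs=2, activation_fn=None)
        # New layer that did not exist in first_model; its variables are not
        # in the checkpoint and will be freshly initialized.
        normalized = tf.contrib.layers.batch_norm(layer_one)
        layer_two = tf.contrib.layers.fully_connected(
            inputs=normalized, num_outputs=1, activation_fn=None)
        target = tf.constant([[10.], [-3.]])
        loss = tf.reduce_sum((layer_two - target) ** 2)
        train_op = tf.train.AdamOptimizer(0.01).minimize(loss)
        # Map each current variable whose name also exists in the checkpoint
        # back to itself, and make the checkpointed value its initializer.
        restore_map = {v.op.name: v for v in tf.global_variables()
                       if v.op.name in previous_variables}
        tf.contrib.framework.init_from_checkpoint('./first_model_checkpoint',
                                                  restore_map)
        # Variables in restore_map are initialized from the checkpoint; the
        # remaining (new) variables use their normal initializers.
        init_op = tf.global_variables_initializer()
        saver = tf.train.Saver()
        with tf.Session() as session:
            session.run(init_op)
            for i in range(10):
                _, current_loss = session.run([train_op, loss])
                print(i, current_loss)
            saver.save(session, './second_model_checkpoint')
```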
In this case, `previous_variables` is simply the list of variable names stored in `first_model_checkpoint`. Note that since we haven't used any variable scopes, the naming depends on the order in which the layers are defined; if the names change, you need to construct the `restore_map` manually.

If we run `second_model`, the loss jumps up initially because the `batch_norm` layer hasn't been trained. However, replacing `batch_norm` with `tf.identity` verifies that the previously trained variables have been restored.
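For that check, the only change needed in the sketch above is swapping the new layer for an identity op:

```python
# Pass the activations through unchanged instead of batch-normalizing them;
# with every variable restored from the checkpoint, the loss should pick up
# right where first_model() left off.
normalized = tf.identity(layer_one)
```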