I want to save the final state of my LSTM so that it is included when I restore the model and can be used for prediction. As explained below, the Saver only has knowledge of the final state when I use tf.assign. However, this throws an error (also explained below).
During training I always feed the final LSTM state back into the network, as explained in this post. Here are the important parts of the code:
When building the graph:
self.init_state = tf.placeholder(tf.float32, [
    self.n_layers, 2, self.batch_size, self.n_hidden
])
state_per_layer_list = tf.unstack(self.init_state, axis=0)
rnn_tuple_state = tuple([
    tf.contrib.rnn.LSTMStateTuple(state_per_layer_list[idx][0],
                                  state_per_layer_list[idx][1])
    for idx in range(self.n_layers)
])
outputs, self.final_state = tf.nn.dynamic_rnn(
    cell, inputs=self.inputs, initial_state=rnn_tuple_state)
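(For reference: the unstack-and-tuple step above just splits one stacked array into per-layer (c, h) pairs. A numpy sketch of the same indexing, with placeholder dimensions, shows the layout:)

```python
import numpy as np

n_layers, batch_size, n_hidden = 2, 4, 8

# Same layout as self.init_state: (n_layers, 2, batch_size, n_hidden)
stacked = np.zeros((n_layers, 2, batch_size, n_hidden))

# Mirror of tf.unstack(..., axis=0) followed by LSTMStateTuple(c, h) per layer
per_layer = [(stacked[idx][0], stacked[idx][1]) for idx in range(n_layers)]

assert len(per_layer) == n_layers
assert per_layer[0][0].shape == (batch_size, n_hidden)  # c state
assert per_layer[0][1].shape == (batch_size, n_hidden)  # h state
```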
And during training:
_current_state = np.zeros((self.n_layers, 2, self.batch_size,
                           self.n_hidden))
_train_step, _current_state, summary = self.sess.run(
    [self.train_step, self.final_state, self.merged],
    feed_dict={self.inputs: _inputs,
               self.labels: _labels,
               self.init_state: _current_state})
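(The state-carrying pattern in that loop can be sketched independently of TensorFlow: the zero array is fed in once, and each step's returned final state becomes the next step's initial state. The run_step function below is a stand-in for the sess.run call:)

```python
import numpy as np

n_layers, batch_size, n_hidden = 2, 4, 8

def run_step(state):
    # Stand-in for sess.run([..., final_state, ...],
    # feed_dict={..., init_state: state}); returns this batch's final state.
    return state + 1.0

# Zero initial state, same shape as the init_state placeholder
current_state = np.zeros((n_layers, 2, batch_size, n_hidden))
for _ in range(3):
    current_state = run_step(current_state)  # carry state across batches

assert current_state.shape == (n_layers, 2, batch_size, n_hidden)
```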
When I later restore my model from a checkpoint, the final state is not restored as well. As outlined in this post, the problem is that the Saver has no knowledge of the new state. The post also suggests a solution based on tf.assign. Unfortunately, I cannot use the suggested
assign_op = tf.assign(self.init_state, _current_state)
self.sess.run(assign_op)
because self.init_state is not a Variable but a placeholder. I get the error
AttributeError: 'Tensor' object has no attribute 'assign'
I have tried to solve this problem for several hours now but I can't get it to work.
Any help is appreciated!
EDIT:
I have changed self.init_state to
self.init_state = tf.get_variable(
    'saved_state',
    shape=[self.n_layers, 2, self.batch_size, self.n_hidden])
state_per_layer_list = tf.unstack(self.init_state, axis=0)
rnn_tuple_state = tuple([
    tf.contrib.rnn.LSTMStateTuple(state_per_layer_list[idx][0],
                                  state_per_layer_list[idx][1])
    for idx in range(self.n_layers)
])
outputs, self.final_state = tf.nn.dynamic_rnn(
    cell, inputs=self.inputs, initial_state=rnn_tuple_state)
And during training I don't feed a value for self.init_state:
_train_step, _current_state, summary = self.sess.run(
    [self.train_step, self.final_state, self.merged],
    feed_dict={self.inputs: _inputs,
               self.labels: _labels})
However, I still can't run the assignment op. Now I get
TypeError: Expected float32 passed to parameter 'value' of op 'Assign', got (LSTMStateTuple(c=array([[ 0.07291573, -0.06366599, -0.23425588, ..., 0.05307654,
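(For context on the shape mismatch: sess.run(self.final_state) returns a tuple of LSTMStateTuples, one (c, h) pair per layer, while the saved_state Variable expects a single stacked float32 array. A numpy sketch, with placeholder dimensions, of restacking the tuple into the Variable's layout:)

```python
import numpy as np

n_layers, batch_size, n_hidden = 2, 4, 8

# What sess.run(final_state) returns: per layer, a (c, h) pair of arrays
final_state = [
    (np.ones((batch_size, n_hidden)), np.zeros((batch_size, n_hidden)))
    for _ in range(n_layers)
]

# Restack into the (n_layers, 2, batch_size, n_hidden) layout of the Variable
stacked = np.stack([np.stack([c, h]) for (c, h) in final_state])

assert stacked.shape == (n_layers, 2, batch_size, n_hidden)
```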