I am using several LSTM layers to form a deep recurrent neural network, and I would like to monitor the weights of each LSTM layer during training. However, I couldn't figure out how to add TensorBoard summaries for the LSTM layer weights.
Any suggestions on how this can be done?
The code is as follows:
```python
import tensorflow as tf  # TensorFlow 1.x; tf.contrib does not exist in 2.x

cells = []
with tf.name_scope("cell_1"):
    cell1 = tf.contrib.rnn.LSTMCell(self.embd_size, state_is_tuple=True, initializer=self.initializer)
    cell1 = tf.contrib.rnn.DropoutWrapper(cell1,
                                          input_keep_prob=self.input_dropout,
                                          output_keep_prob=self.output_dropout,
                                          state_keep_prob=self.recurrent_dropout)
    cells.append(cell1)
with tf.name_scope("cell_2"):
    cell2 = tf.contrib.rnn.LSTMCell(self.n_hidden, state_is_tuple=True, initializer=self.initializer)
    cell2 = tf.contrib.rnn.DropoutWrapper(cell2,
                                          output_keep_prob=self.output_dropout,
                                          state_keep_prob=self.recurrent_dropout)
    cells.append(cell2)
with tf.name_scope("cell_3"):
    cell3 = tf.contrib.rnn.LSTMCell(self.embd_size, state_is_tuple=True, initializer=self.initializer)
    # cell has no input dropout since previous cell already has output dropout
    cell3 = tf.contrib.rnn.DropoutWrapper(cell3,
                                          output_keep_prob=self.output_dropout,
                                          state_keep_prob=self.recurrent_dropout)
    cells.append(cell3)
cell = tf.contrib.rnn.MultiRNNCell(
    cells, state_is_tuple=True)
output, self.final_state = tf.nn.dynamic_rnn(
    cell,
    inputs=self.inputs,
    initial_state=self.init_state)
```
`tf.contrib.rnn.LSTMCell` objects have a property called `variables` that works for this. There's just one trick: the property returns an empty list until your cell goes through `tf.nn.dynamic_rnn`, because the cell only creates its variables the first time it is called. At least this is the case when using a single `LSTMCell`; I can't speak for `MultiRNNCell`. So I expect this would work:
```python
output, self.final_state = tf.nn.dynamic_rnn(...)

# `cells` must hold the raw LSTMCell objects, not their DropoutWrappers (see below).
for one_lstm_cell in cells:
    one_kernel, one_bias = one_lstm_cell.variables
    # Reusing a summary name is fine: TensorFlow de-duplicates it (Kernel, Kernel_1, ...).
    tf.summary.histogram("Kernel", one_kernel)
    tf.summary.histogram("Bias", one_bias)
```
From there you probably know how to proceed, but for completeness:

```python
summary_op = tf.summary.merge_all()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    train_writer = tf.summary.FileWriter(
        "my/preferred/logdir/train", graph=tf.get_default_graph())
    for step in range(1, training_steps + 1):
        ...
        _, step_summary = sess.run([train_op, summary_op])
        # Pass the step so TensorBoard plots each histogram against training steps.
        train_writer.add_summary(step_summary, global_step=step)
```
Looking at the TensorFlow documentation I linked above, there's also a `weights` property. I didn't know the difference at first; as it turns out further down, `variables` simply returns `weights`. Also, the order of the returned variables isn't documented. I figured it out by printing the resulting list and looking at the variable names.
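Rather than relying on the undocumented order, you could pick the kernel and bias out by name. This is a minimal sketch, assuming names whose last path component is `kernel` or `bias` (for example `rnn/lstm_cell/kernel:0`, which is what I'd expect from a TF 1.x `LSTMCell`; print the names on your version to check). `split_kernel_bias` and `FakeVariable` are hypothetical helpers; `FakeVariable` stands in for `tf.Variable` so the snippet runs without TensorFlow, and only the `.name` attribute is used.

```python
from collections import namedtuple

# Hypothetical stand-in for tf.Variable; only the .name attribute is used.
FakeVariable = namedtuple("FakeVariable", ["name"])


def split_kernel_bias(variables):
    """Return (kernel, bias) from one LSTM cell's variables, matched by name.

    Assumes names whose last path component is 'kernel' or 'bias',
    e.g. 'rnn/lstm_cell/kernel:0', instead of relying on list order.
    """
    by_base = {}
    for var in variables:
        # 'rnn/lstm_cell/kernel:0' -> 'rnn/lstm_cell/kernel' -> 'kernel'
        base = var.name.split(":")[0].rsplit("/", 1)[-1]
        by_base.setdefault(base, []).append(var)
    kernels = by_base.get("kernel", [])
    biases = by_base.get("bias", [])
    if len(kernels) != 1 or len(biases) != 1:
        # e.g. a cell with a projection layer has a second 'kernel'
        raise ValueError("expected exactly one kernel and one bias, got %r"
                         % [v.name for v in variables])
    return kernels[0], biases[0]


# The list order is deliberately reversed to show that order does not matter.
variables = [FakeVariable("rnn/lstm_cell/bias:0"),
             FakeVariable("rnn/lstm_cell/kernel:0")]
kernel, bias = split_kernel_bias(variables)
print(kernel.name)  # rnn/lstm_cell/kernel:0
print(bias.name)    # rnn/lstm_cell/bias:0
```

In the loop above you would then write `one_kernel, one_bias = split_kernel_bias(one_lstm_cell.variables)`. Cells with peepholes or a projection have extra variables; the sketch raises rather than guessing.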
Now, according to its docs, `MultiRNNCell` has the same `variables` property, and the docs say it returns all layer variables. I honestly don't know how `MultiRNNCell` works, so I cannot tell you whether these variables belong exclusively to `MultiRNNCell` or whether they include variables from the cells that go into it. Either way, knowing the property exists should be a nice tip! Hope this helps.
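If you go the `MultiRNNCell` route, one way to tell the layers apart is by variable name. This sketch assumes that TF 1.x `MultiRNNCell` creates each layer's variables under a scope named `cell_<i>` (for example `rnn/multi_rnn_cell/cell_0/lstm_cell/kernel:0`); confirm this by printing `[v.name for v in tf.trainable_variables()]` in your own graph. `group_by_layer` is a hypothetical helper that works on plain name strings, so it runs without TensorFlow.

```python
import re
from collections import defaultdict

# Assumption: TF 1.x MultiRNNCell puts layer i's variables under a scope
# named "cell_<i>", e.g. "rnn/multi_rnn_cell/cell_0/lstm_cell/kernel:0".
_LAYER_SCOPE = re.compile(r"(?:^|/)cell_(\d+)/")


def group_by_layer(variable_names):
    """Map layer index -> variable names found under that layer's scope."""
    groups = defaultdict(list)
    for name in variable_names:
        match = _LAYER_SCOPE.search(name)
        if match is None:
            continue  # not under a "cell_<i>" scope (e.g. an output layer)
        groups[int(match.group(1))].append(name)
    return dict(groups)


# In a real graph these would come from [v.name for v in tf.trainable_variables()].
names = [
    "rnn/multi_rnn_cell/cell_0/lstm_cell/kernel:0",
    "rnn/multi_rnn_cell/cell_0/lstm_cell/bias:0",
    "rnn/multi_rnn_cell/cell_1/lstm_cell/kernel:0",
    "rnn/multi_rnn_cell/cell_1/lstm_cell/bias:0",
    "dense/kernel:0",
]
for layer, layer_names in sorted(group_by_layer(names).items()):
    print(layer, layer_names)
```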
Although `variables` is documented for most (all?) RNN classes, it breaks for `DropoutWrapper`. The property has been documented since r1.2, but accessing it raises an exception in 1.2 and 1.4 (and probably 1.3, though I haven't tested that). Specifically, this code
```python
from tensorflow.contrib import rnn
...
lstm_cell = rnn.BasicLSTMCell(num_hidden, forget_bias=1.0)
wrapped_cell = rnn.DropoutWrapper(lstm_cell)
outputs, states = rnn.static_rnn(wrapped_cell, x, dtype=tf.float32)
print("LSTM vars!", lstm_cell.variables)
print("Wrapped vars!", wrapped_cell.variables)
```
throws `AttributeError: 'DropoutWrapper' object has no attribute 'trainable'`. From the traceback (or a long stare at the `DropoutWrapper` source), I noticed that `variables` is implemented two levels up: `DropoutWrapper` inherits from `RNNCell`, which in turn inherits from `Layer`. Dizzy yet? Indeed, `Layer` is where we find the documented `variables` property. It returns the (documented) `weights` property, which returns `self.trainable_weights + self.non_trainable_weights`, the concatenation of two more (documented) properties. And finally, the root of the problem:
```python
@property
def trainable_weights(self):
    return self._trainable_weights if self.trainable else []

@property
def non_trainable_weights(self):
    if self.trainable:
        return self._non_trainable_weights
    else:
        return self._trainable_weights + self._non_trainable_weights
```
That is, `variables` does not work for a `DropoutWrapper` instance. Neither do `trainable_weights` or `non_trainable_weights`, since `self.trainable` is not defined.
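The failure can be reproduced without TensorFlow. The classes below are toy stand-ins, not TensorFlow's: `Layer` copies the property chain described above, and the hypothetical `BrokenWrapper` mimics a subclass whose constructor never calls `Layer.__init__`, so `self.trainable` is never set.

```python
class Layer(object):
    """Toy stand-in for the TF 1.x Layer base class (NOT the real one)."""

    def __init__(self):
        self.trainable = True  # only set if a subclass calls this __init__
        self._trainable_weights = []
        self._non_trainable_weights = []

    @property
    def trainable_weights(self):
        return self._trainable_weights if self.trainable else []

    @property
    def non_trainable_weights(self):
        if self.trainable:
            return self._non_trainable_weights
        return self._trainable_weights + self._non_trainable_weights

    @property
    def weights(self):
        return self.trainable_weights + self.non_trainable_weights

    @property
    def variables(self):
        return self.weights


class BrokenWrapper(Layer):
    """Hypothetical wrapper whose __init__ skips Layer.__init__."""

    def __init__(self, cell):
        self._cell = cell  # self.trainable is never assigned


try:
    BrokenWrapper(cell=None).variables
except AttributeError as err:
    # variables -> weights -> trainable_weights reads the missing self.trainable
    print("AttributeError:", err)
```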
One step deeper, `Layer.__init__` defaults `self.trainable` to `True`, but `DropoutWrapper` never calls it. To quote a TensorFlow contributor on GitHub:
> `DropoutWrapper` does not have variables because it does not itself store any. It wraps a cell that may have variables; but it's not clear what the semantics should be if you access the `DropoutWrapper.variables`. For example, all keras layers only report back the variables that they own; and so only one layer ever owns any variable. That said, this should probably return `[]`, and the reason it doesn't is that DropoutWrapper never calls `super().__init__` in its constructor. That should be an easy fix; PRs welcome.
So, to access the LSTM variables in the example above, `lstm_cell.variables` suffices.
Edit: To the best of my knowledge, Mike Khan's PR has been merged into 1.5, so the `variables` property of a `DropoutWrapper` now returns an empty list.
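With the same kind of toy classes (again hypothetical stand-ins, not TensorFlow's), the suggested fix looks like this: once the wrapper's constructor calls `Layer.__init__`, `variables` returns an empty list because the wrapper owns nothing, while the wrapped cell still reports its own variables. `ToyLSTMCell` and `FixedWrapper` are illustrative names, and plain strings stand in for variables.

```python
class Layer(object):
    """Toy stand-in for the TF 1.x Layer base class (NOT the real one)."""

    def __init__(self):
        self.trainable = True
        self._trainable_weights = []
        self._non_trainable_weights = []

    @property
    def variables(self):
        # Same logic as weights = trainable_weights + non_trainable_weights.
        trainable = self._trainable_weights if self.trainable else []
        non_trainable = (self._non_trainable_weights if self.trainable
                         else self._trainable_weights + self._non_trainable_weights)
        return trainable + non_trainable


class ToyLSTMCell(Layer):
    """Hypothetical cell that owns two variables."""

    def __init__(self):
        super(ToyLSTMCell, self).__init__()
        self._trainable_weights = ["lstm_cell/kernel:0", "lstm_cell/bias:0"]


class FixedWrapper(Layer):
    """Hypothetical wrapper that calls Layer.__init__, as suggested."""

    def __init__(self, cell):
        super(FixedWrapper, self).__init__()  # sets self.trainable = True
        self._cell = cell  # the wrapper itself owns no variables


lstm_cell = ToyLSTMCell()
wrapped_cell = FixedWrapper(lstm_cell)
print(wrapped_cell.variables)  # []
print(lstm_cell.variables)     # ['lstm_cell/kernel:0', 'lstm_cell/bias:0']
```

This mirrors the Keras-style rule from the quote: each variable is reported only by the layer that owns it, so you keep a reference to the inner cell when you need its weights.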