The following code from TensorFlow's GRUCell unit shows the typical operations performed to get an updated hidden state, when the previous hidden state is provided along with the current input in the sequence.
def __call__(self, inputs, state, scope=None):
  """Gated recurrent unit (GRU) with nunits cells."""
  with vs.variable_scope(scope or type(self).__name__):  # "GRUCell"
    with vs.variable_scope("Gates"):  # Reset gate and update gate.
      # We start with bias of 1.0 to not reset and not update.
      r, u = array_ops.split(1, 2, _linear([inputs, state],
                                           2 * self._num_units, True, 1.0))
      r, u = sigmoid(r), sigmoid(u)
    with vs.variable_scope("Candidate"):
      c = self._activation(_linear([inputs, r * state],
                                   self._num_units, True))
    new_h = u * state + (1 - u) * c
  return new_h, new_h
But I don't see any weights and biases here. For example, my understanding was that getting r and u would require weights and biases to be multiplied with the current input and/or the hidden state to get an updated hidden state.

I have written a GRU unit as follows:
def gru_unit(previous_hidden_state, x):
    # Reset and update gates, computed from both the current input
    # and the previous hidden state (as in the GRUCell code above).
    r = tf.sigmoid(tf.matmul(x, Wr) + tf.matmul(previous_hidden_state, Ur) + br)
    z = tf.sigmoid(tf.matmul(x, Wz) + tf.matmul(previous_hidden_state, Uz) + bz)
    # Candidate state: the reset gate scales the previous hidden state
    # before it enters the candidate computation, matching GRUCell.
    h_ = tf.tanh(tf.matmul(x, Wx)
                 + tf.matmul(r * previous_hidden_state, Wh) + bh)
    # Interpolate between the previous state and the candidate.
    current_hidden_state = tf.mul(1 - z, h_) + tf.mul(previous_hidden_state, z)
    return current_hidden_state
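For completeness, here is a minimal sketch of how the weights and biases referenced above might be declared; input_size and hidden_size are assumed dimensions here, not part of the original setup:

import tensorflow as tf

# Assumed dimensions, purely for illustration.
input_size, hidden_size = 10, 20

# Gate parameters: one input-to-hidden matrix, one hidden-to-hidden
# matrix, and one bias per gate.
Wr = tf.Variable(tf.truncated_normal([input_size, hidden_size], stddev=0.1))
Ur = tf.Variable(tf.truncated_normal([hidden_size, hidden_size], stddev=0.1))
br = tf.Variable(tf.zeros([hidden_size]))

Wz = tf.Variable(tf.truncated_normal([input_size, hidden_size], stddev=0.1))
Uz = tf.Variable(tf.truncated_normal([hidden_size, hidden_size], stddev=0.1))
bz = tf.Variable(tf.zeros([hidden_size]))

# Candidate-state parameters.
Wx = tf.Variable(tf.truncated_normal([input_size, hidden_size], stddev=0.1))
Wh = tf.Variable(tf.truncated_normal([hidden_size, hidden_size], stddev=0.1))
bh = tf.Variable(tf.zeros([hidden_size]))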
Here I explicitly make use of the weights Wr, Ur, Wz, Uz, Wx, Wh and the biases br, bz, bh to get an updated hidden state. These weights and biases are what get learned/tuned during training.
How can I make use of TensorFlow's built-in GRUCell to achieve the same result as above?
They are there; you just don't see them in that code because the _linear function creates and applies the weights and biases for you.
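Roughly, _linear concatenates its arguments along the feature dimension and applies a single learned weight matrix plus an optional bias, both created with tf.get_variable inside the enclosing variable scope. A simplified sketch of the idea (not the exact TensorFlow source):

def _linear(args, output_size, bias, bias_start=0.0):
    # Total feature size of the concatenated inputs, e.g. inputs + state.
    total_input_size = sum(a.get_shape()[1].value for a in args)
    # Created (or reused) in the current variable scope,
    # e.g. "GRUCell/Gates" or "GRUCell/Candidate".
    matrix = tf.get_variable("Matrix", [total_input_size, output_size])
    res = tf.matmul(tf.concat(1, args), matrix)
    if bias:
        bias_term = tf.get_variable(
            "Bias", [output_size],
            initializer=tf.constant_initializer(bias_start))
        res = res + bias_term
    return res

So the "Gates" scope holds a single Matrix/Bias pair that corresponds to your Wr, Ur, br, Wz, Uz, bz stacked into one matrix (the split call then carves the result into the r and u halves), and the "Candidate" scope holds the pair corresponding to Wx, Wh, bh. After the cell has been called once, you can list these variables with tf.trainable_variables().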
...