Assuming that I want to update a pre-trained word-embedding matrix during training, is there a way to update only a subset of the word embedding matrix?
I have looked into the TensorFlow API page and found this:
# Create an optimizer.
opt = GradientDescentOptimizer(learning_rate=0.1)
# Compute the gradients for a list of variables.
grads_and_vars = opt.compute_gradients(loss, <list of variables>)
# grads_and_vars is a list of tuples (gradient, variable). Do whatever you
# need to the 'gradient' part, for example cap them, etc.
capped_grads_and_vars = [(MyCapper(gv[0]), gv[1]) for gv in grads_and_vars]
# Ask the optimizer to apply the capped gradients.
opt.apply_gradients(capped_grads_and_vars)
However, how do I apply that to the word-embedding matrix? Suppose I do:
word_emb = tf.Variable(0.2 * tf.random_uniform([syn0.shape[0], s['es']],
                                               minval=-1.0, maxval=1.0,
                                               dtype=tf.float32),
                       name='word_emb', trainable=False)
gather_emb = tf.gather(word_emb, indices)  # indices are passed as a placeholder through feed_dict
opt = tf.train.AdamOptimizer(1e-4)
grad = opt.compute_gradients(loss, gather_emb)
How do I then use opt.apply_gradients and tf.scatter_update to update the original embedding matrix? (Also, TensorFlow throws an error if the second argument of compute_gradients is not a tf.Variable.)
TL;DR: With the default implementation of opt.minimize(loss), TensorFlow will generate a sparse update for word_emb that modifies only the rows of word_emb that participated in the forward pass.

The gradient of the tf.gather(word_emb, indices) op with respect to word_emb is a tf.IndexedSlices object (see the implementation for more details). This object represents a sparse tensor that is zero everywhere except for the rows selected by indices. A call to opt.minimize(loss) calls AdamOptimizer._apply_sparse(word_emb_grad, word_emb), which makes a call to tf.scatter_sub(word_emb, ...)* that updates only the rows of word_emb that were selected by indices.
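A quick way to check this claim (a minimal sketch with made-up shapes):

import tensorflow as tf

word_emb = tf.Variable(tf.random_uniform([100, 8], -1.0, 1.0))
indices = tf.constant([3, 7])
loss = tf.reduce_sum(tf.gather(word_emb, indices))

grad, = tf.gradients(loss, [word_emb])
print(type(grad))    # tf.IndexedSlices, not a dense tf.Tensor
print(grad.indices)  # the rows the optimizer will touch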
If, on the other hand, you want to modify the tf.IndexedSlices that is returned by opt.compute_gradients(loss, word_emb), you can perform arbitrary TensorFlow operations on its indices and values properties, and create a new tf.IndexedSlices that can be passed to opt.apply_gradients([(..., word_emb)]). For example, you could cap the gradients using MyCapper() (as in the example) with the following calls:
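(A sketch of those calls; MyCapper() stands in for whatever element-wise transformation you need, as in the API excerpt above.)

grads_and_vars = opt.compute_gradients(loss, [word_emb])
grad, var = grads_and_vars[0]  # grad is a tf.IndexedSlices

# Transform only the non-zero rows, keeping the same sparsity pattern.
capped_grad = tf.IndexedSlices(MyCapper(grad.values), grad.indices)
train_op = opt.apply_gradients([(capped_grad, var)])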
Similarly, you could change the set of indices that will be modified by creating a new tf.IndexedSlices with different indices.

* In general, if you want to update only part of a variable in TensorFlow, you can use the tf.scatter_update(), tf.scatter_add(), or tf.scatter_sub() operators, which respectively set, add to (+=), or subtract from (-=) the value previously stored in the variable.
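For instance, a minimal sketch of tf.scatter_update() (the variable and values here are illustrative):

import tensorflow as tf

var = tf.Variable(tf.zeros([4, 3]))
# Overwrite rows 0 and 2; rows 1 and 3 keep their previous values.
update_op = tf.scatter_update(var, [0, 2], tf.ones([2, 3]))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(update_op)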
Since you just want to select the elements to be updated (and not to change the gradients), you can do it as follows.

Let indices_to_update be a boolean tensor that indicates the indices you wish to update, and let entry_stop_gradients be the helper defined in the linked answer. Then:
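(A sketch of that call. The helper below follows the usual stop-gradient masking trick; the exact definition lives in the linked answer, and the per-row boolean mask is an assumption.)

import tensorflow as tf

def entry_stop_gradients(target, mask):
    # mask broadcasts against target: 1.0 where gradients may flow,
    # 0.0 where they should be blocked in the backward pass.
    mask_h = 1.0 - mask
    return tf.stop_gradient(mask_h * target) + mask * target

# indices_to_update is boolean per row; cast it and add a trailing axis
# so it broadcasts over the embedding dimension.
mask = tf.expand_dims(tf.cast(indices_to_update, tf.float32), -1)
gather_emb = entry_stop_gradients(gather_emb, mask)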