How to update model parameters with accumulated gradients?

I'm new to TensorFlow and am using it to build a deep learning model.

For various reasons my model is limited to a small batch size, and this small batch size leads to high variance during training.

So I want to use a trick to make the effective batch size larger. My idea is to store the gradients of each mini-batch (for example, 64 mini-batches), sum them, and then use the mean gradient over those 64 mini-batches to update the model's parameters.

This means that the parameters are not updated for the first 63 mini-batches; only after the 64th mini-batch are they updated, once.
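
As a minimal sketch of the idea in plain Python (no TensorFlow; the toy model y = w * x and the names grad_mse and accumulated_grad are made up for illustration), the mean of the gradients of equal-sized mini-batches matches the gradient of one large batch, and the parameter is updated only once at the end:

```python
# Toy model: predict y = w * x, loss = mean squared error.
# All names here are illustrative assumptions, not part of any library.

def grad_mse(w, batch):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def accumulated_grad(w, minibatches):
    """Mean of the per-mini-batch gradients, computed without changing w."""
    total = 0.0
    for batch in minibatches:
        total += grad_mse(w, batch)  # accumulate only, no parameter update
    return total / len(minibatches)

data = [(float(i), 3.0 * i + 1.0) for i in range(8)]
minibatches = [data[i:i + 2] for i in range(0, len(data), 2)]  # 4 batches of 2

w = 0.5
g_accum = accumulated_grad(w, minibatches)
g_full = grad_mse(w, data)

# With equal-sized mini-batches the averaged gradient matches the full-batch one.
assert abs(g_accum - g_full) < 1e-9

lr = 0.01
w_new = w - lr * g_accum  # a single update after all mini-batches
```

The equality relies on all mini-batches having the same size; with unequal sizes a weighted mean would be needed.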

But since TensorFlow is graph-based, does anyone know how to implement this?

I found a solution here:

opt = tf.train.AdamOptimizer()
tvs = tf.trainable_variables()
accum_vars = [tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False) for tv in tvs]                                        
zero_ops = [tv.assign(tf.zeros_like(tv)) for tv in accum_vars]
gvs = opt.compute_gradients(rmse, tvs)
accum_ops = [accum_vars[i].assign_add(gv[0]) for i, gv in enumerate(gvs)]
train_step = opt.apply_gradients([(accum_vars[i], gv[1]) for i, gv in enumerate(gvs)])

In the training loop:

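while True:
    sess.run(zero_ops)  # reset the accumulators; sess is assumed to be an open tf.Session
    for i in range(n_minibatches):
        sess.run(accum_ops, feed_dict={X: Xs[i], y: ys[i]})  # accumulate gradients only
    sess.run(train_step)  # apply the accumulated gradients once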

But this code doesn't seem very clean. Does anyone know how to improve it?


I had the same problem and just figured it out.

First get symbolic gradients, then define accumulated gradients as tf.Variables. (It seems that tf.global_variables_initializer() has to be run before defining grads_accum. I got errors otherwise, not sure why.)

tvars = tf.trainable_variables()
optimizer = tf.train.GradientDescentOptimizer(lr)
grads = tf.gradients(cost, tvars)

# initialize (sess is assumed to be an open tf.Session)
sess.run(tf.global_variables_initializer())

grads_accum = [tf.Variable(tf.zeros_like(v)) for v in grads] 
update_op = optimizer.apply_gradients(zip(grads_accum, tvars)) 

In training, you can accumulate the gradient values at each batch (saved in gradients_accum), and update the model after running the 64th batch by feeding them into grads_accum:

feed_dict = dict()
for i, _grads in enumerate(gradients_accum):
    feed_dict[grads_accum[i]] = _grads
sess.run([update_op], feed_dict=feed_dict)

You can refer to tensorflow/tensorflow/python/training/ for example usage, particularly this function: testGradientsAsVariables().
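
The snippet above does not show how gradients_accum is built. One possible way, as a sketch only: fetch the gradient values for each batch (for example with sess.run(grads, ...), which is assumed here) and sum them on the Python side. The helper add_grads and the numbers below are hypothetical stand-ins, using nested lists so the example runs without TensorFlow:

```python
def add_grads(running, new):
    """Element-wise add one batch's per-variable gradients into the running sums."""
    if running is None:
        return [list(g) for g in new]
    return [[a + b for a, b in zip(r, g)] for r, g in zip(running, new)]

# Stand-ins for fetched gradient values on three batches (made-up numbers):
# two variables, with 2 elements and 1 element respectively.
fetched = [
    [[1.0, 2.0], [0.5]],
    [[3.0, 4.0], [1.5]],
    [[5.0, 6.0], [2.5]],
]

gradients_accum = None
for batch_grads in fetched:
    gradients_accum = add_grads(gradients_accum, batch_grads)

# gradients_accum now holds one summed gradient per variable.
```

With NumPy arrays, the inner list comprehension would reduce to r + g. Dividing each sum by the number of batches before feeding it gives the mean gradient rather than the sum.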

The previous solutions do not compute the average of the accumulated gradients, which may lead to instability in training. I've modified the above code, which should solve this problem.

# Fetch a list of our network's trainable parameters.
trainable_vars = tf.trainable_variables()

# Create variables to store accumulated gradients
accumulators = [
    tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False)
    for tv in trainable_vars
]

# Create a variable for counting the number of accumulations
accumulation_counter = tf.Variable(0.0, trainable=False)

# Compute gradients; grad_pairs contains (gradient, variable) pairs
grad_pairs = optimizer.compute_gradients(loss, trainable_vars)

# Create operations which add a variable's gradient to its accumulator.
accumulate_ops = [
    accumulator.assign_add(grad)
    for (accumulator, (grad, var)) in zip(accumulators, grad_pairs)
]

# The final accumulation operation is to increment the counter
accumulate_ops.append(accumulation_counter.assign_add(1.0))

# Update trainable variables by applying the accumulated gradients
# divided by the counter. Note: apply_gradients takes in a list of 
# (grad, var) pairs
train_step = optimizer.apply_gradients(
    [(accumulator / accumulation_counter, var)
     for (accumulator, (grad, var)) in zip(accumulators, grad_pairs)]
)

# Accumulators must be zeroed once the accumulated gradient is applied.
zero_ops = [
    accumulator.assign(tf.zeros_like(tv))
    for (accumulator, tv) in zip(accumulators, trainable_vars)
]

# Add one last op for zeroing the counter
zero_ops.append(accumulation_counter.assign(0.0))

This code is used in the same manner as that provided by @weixsong.
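
To illustrate why the averaging matters, here is a rough plain-Python analogue of the accumulators and the counter (the class GradAccumulator and the gradient values are invented for this sketch, and plain gradient descent is assumed). Applying the summed gradient takes a step N times larger than applying the mean:

```python
class GradAccumulator:
    """Plain-Python stand-in for the accumulators plus accumulation_counter."""

    def __init__(self, n_params):
        self.sums = [0.0] * n_params
        self.count = 0

    def accumulate(self, grads):
        # Analogous to running accumulate_ops once.
        self.sums = [s + g for s, g in zip(self.sums, grads)]
        self.count += 1

    def mean(self):
        # Analogous to accumulator / accumulation_counter.
        return [s / self.count for s in self.sums]

    def zero(self):
        # Analogous to running zero_ops.
        self.sums = [0.0] * len(self.sums)
        self.count = 0


acc = GradAccumulator(n_params=2)
for grads in ([1.0, -2.0], [3.0, -4.0], [5.0, -6.0], [7.0, -8.0]):
    acc.accumulate(grads)

summed, averaged = list(acc.sums), acc.mean()

lr = 0.1
params = [1.0, 1.0]
step_sum = [p - lr * g for p, g in zip(params, summed)]     # sum applied directly
step_mean = [p - lr * g for p, g in zip(params, averaged)]  # mean applied

acc.zero()  # reset before the next accumulation cycle
```

Here the summed update moves each parameter four times as far as the averaged one, so without the division the learning rate would effectively scale with the number of accumulated mini-batches.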