I have a standard TensorFlow Estimator with some model and want to run it on multiple GPUs instead of just one. How can this be done using data parallelism?
I searched the TensorFlow docs but did not find an example, only sentences saying that it would be easy with Estimator.
Does anybody have a good example using tf.learn.Estimator? Or a link to a tutorial?
I think tf.contrib.estimator.replicate_model_fn is a cleaner solution. The usage pattern below follows the tf.contrib.estimator.replicate_model_fn documentation.
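A minimal sketch of that pattern (the model body, optimizer choice, and learning rate here are illustrative, not the verbatim docs example):

```python
import tensorflow as tf

def model_fn(features, labels, mode):
    # `features` is assumed to be a single float tensor here.
    logits = tf.layers.dense(features, units=10)

    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(
            mode=mode, predictions={"logits": logits})

    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
        # The wrapper aggregates gradients across all tower replicas.
        optimizer = tf.contrib.estimator.TowerOptimizer(optimizer)
        return tf.estimator.EstimatorSpec(
            mode=mode, loss=loss,
            train_op=optimizer.minimize(
                loss, global_step=tf.train.get_global_step()))

    return tf.estimator.EstimatorSpec(mode=mode, loss=loss)

# Replicate the model_fn across the GPUs visible to the process.
classifier = tf.estimator.Estimator(
    model_fn=tf.contrib.estimator.replicate_model_fn(model_fn))
```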
What you need to do is wrap the optimizer with tf.contrib.estimator.TowerOptimizer and model_fn() with tf.contrib.estimator.replicate_model_fn().
I followed the description and made a TPU SqueezeNet model work on a machine with 4 GPUs. My modifications are here.

The standard example is: https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/contrib/learn/python/learn/estimators/estimator.py
One way to run it data-parallel would be to loop over available GPU devices, and send chunks of your batch to copied versions of your model (all done within your model_fn), then merge the results.
You can use scope and device for that:
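For instance, a minimal sketch of such a loop inside a model_fn (NUM_GPUS and build_tower_loss() are illustrative placeholders, not taken from the linked example):

```python
import tensorflow as tf

NUM_GPUS = 2  # assumed number of available GPUs

def model_fn(features, labels, mode):
    # Split the input batch into one shard per GPU.
    feature_shards = tf.split(features, NUM_GPUS)
    label_shards = tf.split(labels, NUM_GPUS)

    tower_losses = []
    with tf.variable_scope(tf.get_variable_scope()):
        for i in range(NUM_GPUS):
            with tf.device('/gpu:%d' % i):
                with tf.name_scope('tower_%d' % i):
                    # build_tower_loss() is a hypothetical helper that
                    # builds your network and returns its scalar loss.
                    tower_losses.append(
                        build_tower_loss(feature_shards[i], label_shards[i]))
                    # Share variables between towers instead of copying them.
                    tf.get_variable_scope().reuse_variables()

    # Merge the per-tower results. The CIFAR-10 example averages the
    # per-tower *gradients* instead, which is the more common pattern.
    loss = tf.reduce_mean(tf.stack(tower_losses))
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
    train_op = optimizer.minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)
```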
Full example here: https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10_multi_gpu_train.py