I have a standard TensorFlow Estimator with some model and want to run it on multiple GPUs instead of just one. How can this be done using data parallelism?
I searched the TensorFlow docs but did not find an example, only sentences saying that it would be easy with Estimator.
Does anybody have a good example using tf.learn.Estimator? Or a link to a tutorial?
I think tf.contrib.estimator.replicate_model_fn is a cleaner solution. The usage pattern below follows the tf.contrib.estimator.replicate_model_fn documentation.
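A minimal sketch of that pattern (the model body, optimizer choice, and learning rate here are illustrative, not the verbatim docs example):

```python
import tensorflow as tf

def model_fn(features, labels, mode):
    # `features` is assumed to be a single float tensor here.
    logits = tf.layers.dense(features, units=10)

    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(
            mode=mode, predictions={"logits": logits})

    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
        # The wrapper aggregates gradients across all tower replicas.
        optimizer = tf.contrib.estimator.TowerOptimizer(optimizer)
        return tf.estimator.EstimatorSpec(
            mode=mode, loss=loss,
            train_op=optimizer.minimize(
                loss, global_step=tf.train.get_global_step()))

    return tf.estimator.EstimatorSpec(mode=mode, loss=loss)

# Replicate the model_fn across the GPUs visible to the process.
classifier = tf.estimator.Estimator(
    model_fn=tf.contrib.estimator.replicate_model_fn(model_fn))
```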
What you need to do is wrap the optimizer with tf.contrib.estimator.TowerOptimizer and model_fn() with tf.contrib.estimator.replicate_model_fn().
I followed the description and made a TPU SqueezeNet model work on a machine with 4 GPUs. My modifications are here.

The standard example is: https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/contrib/learn/python/learn/estimators/estimator.py
One way to run it data-parallel would be to loop over available GPU devices, and send chunks of your batch to copied versions of your model (all done within your model_fn), then merge the results.
You can use scope and device for that:
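For instance, a minimal sketch of such a loop inside a model_fn (NUM_GPUS and build_tower_loss() are illustrative placeholders, not taken from the linked example):

```python
import tensorflow as tf

NUM_GPUS = 2  # assumed number of available GPUs

def model_fn(features, labels, mode):
    # Split the input batch into one shard per GPU.
    feature_shards = tf.split(features, NUM_GPUS)
    label_shards = tf.split(labels, NUM_GPUS)

    tower_losses = []
    with tf.variable_scope(tf.get_variable_scope()):
        for i in range(NUM_GPUS):
            with tf.device('/gpu:%d' % i):
                with tf.name_scope('tower_%d' % i):
                    # build_tower_loss() is a hypothetical helper that
                    # builds your network and returns its scalar loss.
                    tower_losses.append(
                        build_tower_loss(feature_shards[i], label_shards[i]))
                    # Share variables between towers instead of copying them.
                    tf.get_variable_scope().reuse_variables()

    # Merge the per-tower results. The CIFAR-10 example averages the
    # per-tower *gradients* instead, which is the more common pattern.
    loss = tf.reduce_mean(tf.stack(tower_losses))
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
    train_op = optimizer.minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)
```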
Full example here: https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10_multi_gpu_train.py