Distributed TensorFlow device placement in Google Cloud ML Engine

I am running a large distributed TensorFlow model on Google Cloud ML Engine, and I want to use machines with GPUs. My graph consists of two main parts: the input/data reader function and the computation part.

I want to place the variables on the PS tasks, the input part on the CPU, and the computation part on the GPU. The function tf.train.replica_device_setter automatically places variables on the parameter servers.

This is what my code looks like:

with tf.device(tf.train.replica_device_setter(cluster=cluster_spec)):
    input_tensors = model.input_fn(...)
    output_tensors = model.model_fn(input_tensors, ...)

Is it possible to use tf.device() together with replica_device_setter() as in:

with tf.device(tf.train.replica_device_setter(cluster=cluster_spec)):
    with tf.device('/cpu:0'):
        input_tensors = model.input_fn(...)
    with tf.device('/gpu:0'):
        tensor_dict = model.model_fn(input_tensors, ...)

Will replica_device_setter() be overridden, so that the variables are no longer placed on the PS servers?

Furthermore, since the device names in the cluster look like /job:master/replica:0/task:0/gpu:0, how do I tell TensorFlow tf.device(whatever/gpu:0)?

Any operations other than variables that are created in a tf.train.replica_device_setter block are automatically pinned to "/job:worker", which, unless you specify otherwise, defaults to the first device managed by the first task in the "worker" job.
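To make that default concrete, here is a toy model in plain Python of the choice replica_device_setter makes when no inner tf.device() block is present: variable ops are spread round-robin over the PS tasks, and every other op goes to the worker job. This is a sketch, not TensorFlow's code; the class ToyReplicaDeviceChooser and the set PS_OP_TYPES are hypothetical names, and the op-type set is a simplified assumption (TensorFlow's real list of "PS ops" is longer).

```python
# Toy model (hypothetical, NOT the TensorFlow implementation) of how
# tf.train.replica_device_setter picks a device when no inner tf.device()
# block is present: variable ops go round-robin to PS tasks, every other
# op goes to the worker job.

# Assumption: a simplified set of op types treated as "variables".
PS_OP_TYPES = {"Variable", "VariableV2", "VarHandleOp"}


class ToyReplicaDeviceChooser:
    def __init__(self, ps_tasks, ps_device="/job:ps", worker_device="/job:worker"):
        self.ps_tasks = ps_tasks
        self.ps_device = ps_device
        self.worker_device = worker_device
        self._next_task = 0  # round-robin cursor over the PS tasks

    def device_for(self, op_type):
        """Return the device string assigned to an op of the given type."""
        if self.ps_tasks and op_type in PS_OP_TYPES:
            task = self._next_task
            self._next_task = (self._next_task + 1) % self.ps_tasks
            return "%s/task:%d" % (self.ps_device, task)
        return self.worker_device


chooser = ToyReplicaDeviceChooser(ps_tasks=2)
for name, op_type in [("v1", "VariableV2"), ("v2", "VariableV2"),
                      ("v3", "VariableV2"), ("s", "Add")]:
    print(name, chooser.device_for(op_type))
```

With ps_tasks=2 this assigns v1, v2 and v3 to PS tasks 0, 1 and 0, and the addition s to /job:worker, matching the comments in the TensorFlow example in this answer.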

You can pin them to another device (or task) by using a nested device block:

import tensorflow as tf  # TF 1.x graph-mode API

with tf.device(tf.train.replica_device_setter(ps_tasks=2, ps_device="/job:ps",
                                          worker_device="/job:worker")):
  v1 = tf.Variable(1., name="v1")  # pinned to /job:ps/task:0 (defaults to /cpu:0)
  v2 = tf.Variable(2., name="v2")  # pinned to /job:ps/task:1 (defaults to /cpu:0)
  v3 = tf.Variable(3., name="v3")  # pinned to /job:ps/task:0 (defaults to /cpu:0)
  s = v1 + v2            # pinned to /job:worker (defaults to task:0/cpu:0)
  with tf.device("/task:1"):
    p1 = 2 * s           # pinned to /job:worker/task:1 (defaults to /cpu:0)
    with tf.device("/cpu:0"):
      p2 = 3 * s         # pinned to /job:worker/task:1/cpu:0
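The nested blocks above work because device specs are merged field by field: whatever the inner spec sets (task, device) wins, and whatever it leaves out (job) is inherited from the enclosing spec. That is also why you can write just "/gpu:0" instead of spelling out the full /job:.../task:.../gpu:0 name. The sketch below models that merge in plain Python; parse and merge are hypothetical helpers for illustration, not TensorFlow's DeviceSpec API, and they only handle the job, replica, task and device fields.

```python
# Toy model (hypothetical helpers, not TensorFlow's DeviceSpec) of how nested
# tf.device() specs combine: fields set by the inner spec win, fields it
# leaves unset are inherited from the enclosing spec.

FIELDS = ("job", "replica", "task", "device")


def parse(spec):
    """Split '/job:worker/task:1/cpu:0' into a dict of its fields."""
    out = {}
    for part in filter(None, spec.split("/")):
        key, _, value = part.partition(":")
        if key in ("job", "replica", "task"):
            out[key] = value
        else:
            # 'cpu:0', 'gpu:0' or 'device:GPU:0': keep the whole part
            out["device"] = part
    return out


def merge(outer, inner):
    """Return the spec seen inside `with tf.device(inner)` nested in `outer`."""
    fields = parse(outer)
    fields.update(parse(inner))  # inner fields override outer ones
    parts = []
    for f in FIELDS:
        if f in fields:
            parts.append(fields[f] if f == "device" else "%s:%s" % (f, fields[f]))
    return "/" + "/".join(parts)


p1 = merge("/job:worker", "/task:1")
p2 = merge(p1, "/cpu:0")
print(p1)
print(p2)
```

This reproduces the comments for p1 and p2 above. In the same way, merge("/job:worker/task:0", "/gpu:0") yields "/job:worker/task:0/gpu:0", so a bare "/gpu:0" inside a worker-scoped block lands on that worker's GPU.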