I'm trying to learn distributed TensorFlow. I tried out a piece of code as explained here:
with tf.device("/cpu:0"):
    W = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))
with tf.device("/cpu:1"):
    y = tf.nn.softmax(tf.matmul(x, W) + b)
    loss = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
I'm getting the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'MatMul': Operation was explicitly assigned to /device:CPU:1 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
[[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/device:CPU:1"](Placeholder, Variable/read)]]
Meaning that TensorFlow does not recognize CPU:1.
I'm running on a RedHat server with 40 CPUs (cat /proc/cpuinfo | grep processor | wc -l).
Any ideas?
Following the link in the comment:
Turns out the session has to be configured with a CPU device count greater than 1 (by default TensorFlow exposes all cores as the single device /cpu:0):
config = tf.ConfigProto(device_count={"CPU": 8})
with tf.Session(config=config) as sess:
    ...
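For anyone who wants to try the fix end to end, here is a minimal sketch that combines the graph from the question with the device_count setting. It assumes a TensorFlow build that ships tf.compat.v1 (late 1.x or 2.x); the helper name run_on_two_cpu_devices and the all-zeros input are made up for illustration, and the loss term is left out to keep it short.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1  # TF1-style graph/session API (assumed to be available)


def run_on_two_cpu_devices(num_cpu_devices=2):
    """Place variables on /cpu:0 and the softmax on /cpu:1, then run once."""
    graph = tf.Graph()
    with graph.as_default():
        x = tf1.placeholder(tf.float32, [None, 784])
        with tf.device("/cpu:0"):
            W = tf.Variable(tf.zeros([784, 10]))
            b = tf.Variable(tf.zeros([10]))
        with tf.device("/cpu:1"):
            y = tf.nn.softmax(tf.matmul(x, W) + b)
        init = tf1.global_variables_initializer()

    # Ask the session for more than one logical CPU device. Without this,
    # only /cpu:0 exists and the /cpu:1 placement fails.
    config = tf1.ConfigProto(device_count={"CPU": num_cpu_devices})
    with tf1.Session(graph=graph, config=config) as sess:
        sess.run(init)
        device_names = [d.name for d in sess.list_devices()]
        probs = sess.run(y, feed_dict={x: np.zeros((1, 784), dtype=np.float32)})
    return device_names, probs


if __name__ == "__main__":
    names, probs = run_on_two_cpu_devices()
    print(names)
    print(probs)
```

Note that these are logical devices: the number passed to device_count does not have to match the number of physical cores.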
Kind of shocking that I missed something so basic, and that no one could pinpoint an error that seems so obvious.
Not sure if it's a problem with me or the TensorFlow code samples and documentation. Since it's Google, I'll have to say that it's me.
First, just run it on "one CPU" and see whether TensorFlow distributes threads across all of the CPUs appropriately. It will likely multithread correctly, and you won't have to do anything.
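As a rough sketch of the single-device route: the session config has intra_op_parallelism_threads and inter_op_parallelism_threads fields that set the size of the thread pools used on that one CPU device. The values 4 and 2, the matrix size, and the helper name below are arbitrary choices for illustration, and tf.compat.v1 is assumed to be available. Watching top or htop while a larger job runs is a simple way to check whether the cores are actually busy.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1  # TF1-style graph/session API (assumed to be available)


def run_matmul_single_device(intra_threads=4, inter_threads=2, size=256):
    """Run one matmul on the default CPU device with explicit thread pools."""
    graph = tf.Graph()
    with graph.as_default():
        a = tf1.placeholder(tf.float32, [size, size])
        product = tf.matmul(a, a)

    config = tf1.ConfigProto(
        # Threads available to a single op (e.g. one large matmul).
        intra_op_parallelism_threads=intra_threads,
        # Threads available for running independent ops concurrently.
        inter_op_parallelism_threads=inter_threads,
    )
    with tf1.Session(graph=graph, config=config) as sess:
        result = sess.run(product, feed_dict={a: np.eye(size, dtype=np.float32)})
    return config, result


if __name__ == "__main__":
    cfg, out = run_matmul_single_device()
    print(cfg.intra_op_parallelism_threads, out.shape)
```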
If it doesn't, you could try launching multiple TensorFlow instances with different CPU affinities and running them as a "distributed" system. TensorFlow has distributed services for multiple machines; they should work just as well with separate processes on one machine, as long as you set up your files so that the processes aren't writing to the same locations. You can get started at https://www.tensorflow.org/deploy/distributed . You might want to set the CPU affinities so that there is one process per physical CPU, à la https://askubuntu.com/questions/102258/how-to-set-cpu-affinity-to-a-process
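If you would rather set the affinity from inside each Python process than with taskset, something like the following should work. It is only a sketch: os.sched_setaffinity is Linux-only, the helper name pin_to_cpus is hypothetical, and which CPU ids belong to which physical socket depends on your machine (lscpu will tell you).

```python
import os


def pin_to_cpus(cpus):
    """Restrict the current process to the given CPU ids (Linux only)."""
    allowed = os.sched_getaffinity(0)   # CPUs this process may currently use
    target = set(cpus) & allowed        # ignore ids that are not available
    if not target:
        raise ValueError("none of the requested CPUs are available")
    os.sched_setaffinity(0, target)     # pid 0 means the calling process
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Pin this process to the lowest-numbered CPU it is allowed to use.
    first = min(os.sched_getaffinity(0))
    print(pin_to_cpus([first]))
```

Each worker process would call this once at startup, before it creates its TensorFlow session, with a different set of CPU ids.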