I'm running into a weird problem when trying to get TensorFlow's map_fn
to run on my GPU. Here's a minimal broken example:

    import numpy as np
    import tensorflow as tf

    with tf.Session() as sess:
        with tf.device("/gpu:0"):
            def test_func(i):
                return i

            test_range = tf.constant(np.arange(5))
            test = sess.run(tf.map_fn(test_func, test_range, dtype=tf.float32))
            print(test)

This leads to the error:

    InvalidArgumentError: Cannot assign a device for operation 'map/TensorArray_1': Could not satisfy explicit device specification '' because the node was colocated with a group of nodes that required incompatible device '/device:GPU:0'
    Colocation Debug Info:
    Colocation group had the following types and devices:
    TensorArrayScatterV3: CPU
    TensorArrayGatherV3: GPU CPU
    Range: GPU CPU
    TensorArrayWriteV3: CPU
    TensorArraySizeV3: GPU CPU
    TensorArrayReadV3: CPU
    Enter: GPU CPU
    TensorArrayV3: CPU
    Const: GPU CPU

    Colocation members and user-requested devices:
      map/TensorArrayStack/range/delta (Const)
      map/TensorArrayStack/range/start (Const)
      map/TensorArray_1 (TensorArrayV3)
      map/while/TensorArrayWrite/TensorArrayWriteV3/Enter (Enter) /device:GPU:0
      map/TensorArrayStack/TensorArraySizeV3 (TensorArraySizeV3)
      map/TensorArrayStack/range (Range)
      map/TensorArrayStack/TensorArrayGatherV3 (TensorArrayGatherV3)
      map/TensorArray (TensorArrayV3)
      map/while/TensorArrayReadV3/Enter (Enter) /device:GPU:0
      Const (Const) /device:GPU:0
      map/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3 (TensorArrayScatterV3) /device:GPU:0
      map/while/TensorArrayReadV3 (TensorArrayReadV3) /device:GPU:0
      map/while/TensorArrayWrite/TensorArrayWriteV3 (TensorArrayWriteV3) /device:GPU:0

    [[Node: map/TensorArray_1 = TensorArrayV3clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=, identical_element_shapes=true, tensor_array_name=""]]

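
To make the "types and devices" part of the error easier to read, here is a rough sketch that picks out the op types listed without GPU. It is plain Python and does not touch TensorFlow. The helper name `cpu_only_op_types` is hypothetical, and it rests on my assumption that each entry lists the device types available for that op type in this colocation group.

```python
def cpu_only_op_types(debug_lines):
    """Return op types whose listed device types do not include GPU.

    Assumption: each line looks like "OpType: DEVICE [DEVICE ...]",
    as in the "types and devices" section of the error message.
    """
    cpu_only = []
    for line in debug_lines:
        op_type, sep, devices = line.partition(":")
        if not sep:
            continue  # not an "OpType: devices" line
        device_types = devices.split()
        if device_types and "GPU" not in device_types:
            cpu_only.append(op_type.strip())
    return cpu_only


# Entries copied from the error message above.
debug_info = [
    "TensorArrayScatterV3: CPU",
    "TensorArrayGatherV3: GPU CPU",
    "Range: GPU CPU",
    "TensorArrayWriteV3: CPU",
    "TensorArraySizeV3: GPU CPU",
    "TensorArrayReadV3: CPU",
    "Enter: GPU CPU",
    "TensorArrayV3: CPU",
    "Const: GPU CPU",
]

print(cpu_only_op_types(debug_info))
```

If that reading is right, the four TensorArray ops it reports are the ones that conflict with the `/device:GPU:0` request on the other members of the group.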
The code behaves as expected when run on my CPU, and simple operations such as:

    import numpy as np
    import tensorflow as tf

    with tf.Session() as sess:
        with tf.device("/gpu:0"):
            def test_func(i):
                return i

            test_range = tf.constant(np.arange(5))
            test = sess.run(tf.add(test_range, test_range))
            print(test)

work fine on my GPU. This post seems to describe a similar issue, and the answer there implies that map_fn should work fine on the GPU. Does anyone have any tips?

I'm running TensorFlow 1.8.0 on Python 3.6.4 on Arch Linux, with CUDA 9.0 and cuDNN 7.0 on a GeForce GTX 1050.
Thanks!