
TensorFlow's map_fn only runs on CPU

Posted 2020-07-27 20:45

Question:

I'm running into a weird problem when trying to get TensorFlow's map_fn to run on my GPU. Here's a minimal broken example:

import numpy as np
import tensorflow as tf

with tf.Session() as sess:
    with tf.device("/gpu:0"):
        def test_func(i):
            return i
        test_range = tf.constant(np.arange(5))
        test = sess.run(tf.map_fn(test_func, test_range, dtype=tf.float32))
print(test)

This leads to the error:

InvalidArgumentError: Cannot assign a device for operation 'map/TensorArray_1': Could not satisfy explicit device specification '' because the node was colocated with a group of nodes that required incompatible device '/device:GPU:0'
Colocation Debug Info:
Colocation group had the following types and devices:
TensorArrayScatterV3: CPU
TensorArrayGatherV3: GPU CPU
Range: GPU CPU
TensorArrayWriteV3: CPU
TensorArraySizeV3: GPU CPU
TensorArrayReadV3: CPU
Enter: GPU CPU
TensorArrayV3: CPU
Const: GPU CPU

Colocation members and user-requested devices:
  map/TensorArrayStack/range/delta (Const)
  map/TensorArrayStack/range/start (Const)
  map/TensorArray_1 (TensorArrayV3)
  map/while/TensorArrayWrite/TensorArrayWriteV3/Enter (Enter) /device:GPU:0
  map/TensorArrayStack/TensorArraySizeV3 (TensorArraySizeV3)
  map/TensorArrayStack/range (Range)
  map/TensorArrayStack/TensorArrayGatherV3 (TensorArrayGatherV3)
  map/TensorArray (TensorArrayV3)
  map/while/TensorArrayReadV3/Enter (Enter) /device:GPU:0
  Const (Const) /device:GPU:0
  map/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3 (TensorArrayScatterV3) /device:GPU:0
  map/while/TensorArrayReadV3 (TensorArrayReadV3) /device:GPU:0
  map/while/TensorArrayWrite/TensorArrayWriteV3 (TensorArrayWriteV3) /device:GPU:0

[[Node: map/TensorArray_1 = TensorArrayV3[clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=, identical_element_shapes=true, tensor_array_name=""]]

The code behaves as expected when run on my CPU, and simple operations such as:

import numpy as np
import tensorflow as tf

with tf.Session() as sess:
    with tf.device("/gpu:0"):
        def test_func(i):
            return i
        test_range = tf.constant(np.arange(5))
        test = sess.run(tf.add(test_range, test_range))
print(test)

work fine on my GPU. This post seems to describe a similar issue. Does anyone have any tips? The answer on that post implies that map_fn should work fine on the GPU. I'm running version 1.8.0 of TensorFlow on Python 3.6.4 on Arch Linux, with CUDA version 9.0 and cuDNN version 7.0 on a GeForce GTX 1050.

Thanks!

Answer 1:

The error actually stems from the fact that np.arange produces an integer array by default, but you told map_fn (via dtype=tf.float32) to expect float32 results even though test_func returns its integer input unchanged. The error is gone with

import numpy as np
import tensorflow as tf

with tf.Session() as sess:
    with tf.device("/gpu:0"):
        def test_func(i):
            return i
        test_range = tf.constant(np.arange(5, dtype=np.float32))
        test = sess.run(tf.map_fn(test_func, test_range, dtype=tf.float32))
print(test)
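More generally, the dtype argument of map_fn declares the dtype that the mapped function returns, which may legitimately differ from the dtype of the input elements. Here is a minimal sketch of that case (not from the original post, and run without device placement for simplicity), where the function explicitly casts its integer input to float32 so it matches the declared dtype:

import numpy as np
import tensorflow as tf

with tf.Session() as sess:
    ints = tf.constant(np.arange(5))  # integer input elements
    # The mapped function really returns float32, matching dtype=tf.float32.
    floats = tf.map_fn(lambda i: tf.cast(i, tf.float32) * 0.5,
                       ints, dtype=tf.float32)
    print(sess.run(floats))  # [0.  0.5 1.  1.5 2. ]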

I agree that the error message you got is rather confusing. You get the "real" error message by removing device placement:

import numpy as np
import tensorflow as tf

with tf.Session() as sess:
    def test_func(i):
        return i
    test_range = tf.constant(np.arange(5))
    test = sess.run(tf.map_fn(test_func, test_range, dtype=tf.float32))
print(test)
# InvalidArgumentError (see above for traceback): TensorArray dtype is float but Op is trying to write dtype int32.
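
As an aside (not part of the original answer), if you want TensorFlow to fall back to the CPU for ops that have no GPU kernel instead of failing on an explicit device placement, you can create the session with allow_soft_placement; log_device_placement additionally prints where each op actually ran. A rough sketch:

import numpy as np
import tensorflow as tf

# allow_soft_placement lets ops without a GPU kernel (e.g. some TensorArray
# ops used internally by map_fn) fall back to the CPU instead of raising a
# colocation error; log_device_placement logs the chosen device for each op.
config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    with tf.device("/gpu:0"):
        test_range = tf.constant(np.arange(5, dtype=np.float32))
        test = sess.run(tf.map_fn(lambda i: i, test_range, dtype=tf.float32))
print(test)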