Identifier “__shfl_down” is undefined with CUDA 7.5

Published 2020-05-01 06:06

Question:

While compiling a program with CUDA 7.5 and gcc 4.8.4 on Ubuntu 14.04 (a pretty old configuration), I get this error:

error: identifier "__shfl_down" is undefined
      detected during instantiation of "T gmx_shfl_down_sync(unsigned int, T, unsigned int, int) [with T=float]"

which points to

template <typename T>
static __forceinline__ __device__
T gmx_shfl_down_sync(const unsigned int activeMask,
                     const T            var,
                     unsigned int       offset,
                     int                width = warp_size)
{
#if GMX_CUDA_VERSION < 9000
    GMX_UNUSED_VALUE(activeMask);
    return __shfl_down(var, offset, width);
#else
    return __shfl_down_sync(activeMask, var, offset, width);
#endif
}
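To make the wrapper's role concrete, here is a minimal sketch of the same version-dispatch pattern used in a warp-level sum. It assumes a toolkit whose `cuda_runtime.h` defines `CUDART_VERSION` (used here in place of GROMACS's `GMX_CUDA_VERSION`); `kWarpSize`, `shfl_down_compat`, `warp_reduce_sum`, `warp_sum_kernel` and `warp_sum_on_device` are hypothetical names, not GROMACS APIs. Note that on CUDA 7.5 the legacy `__shfl_down` branch is the one compiled, so the device code still needs a cc 3.0+ target.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for GROMACS's warp_size constant.
enum { kWarpSize = 32 };

// Same idea as gmx_shfl_down_sync: CUDART_VERSION (7050 for CUDA 7.5,
// 9000 for CUDA 9.0) selects the legacy or the *_sync shuffle intrinsic.
template <typename T>
static __forceinline__ __device__
T shfl_down_compat(unsigned int mask, T var, unsigned int offset,
                   int width = kWarpSize)
{
#if CUDART_VERSION < 9000
    (void)mask;  // legacy shuffles take no mask argument
    return __shfl_down(var, offset, width);
#else
    return __shfl_down_sync(mask, var, offset, width);
#endif
}

// Tree reduction: after 5 halving steps (16, 8, 4, 2, 1),
// lane 0 holds the sum of the values from all 32 lanes.
__device__ float warp_reduce_sum(float v)
{
    for (int offset = kWarpSize / 2; offset > 0; offset /= 2)
        v += shfl_down_compat(0xffffffffu, v, offset);
    return v;
}

// Launched with a single warp: lane i contributes in[i], lane 0 writes the total.
__global__ void warp_sum_kernel(const float* in, float* out)
{
    float total = warp_reduce_sum(in[threadIdx.x]);
    if (threadIdx.x == 0)
        *out = total;
}

// Host helper: sums exactly kWarpSize floats on the current device.
// Error checking is omitted to keep the sketch short.
float warp_sum_on_device(const float* h_in)
{
    float* d_in;
    float* d_out;
    float h_out = 0.0f;
    cudaMalloc(&d_in, kWarpSize * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, h_in, kWarpSize * sizeof(float), cudaMemcpyHostToDevice);
    warp_sum_kernel<<<1, kWarpSize>>>(d_in, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

For example, compiling with `nvcc -arch=sm_30 -c warp_sum.cu` (hypothetical file name) on CUDA 7.5 takes the `__shfl_down` path, and on a cc 3.0+ GPU calling `warp_sum_on_device` with 32 ones returns `32.0f`.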

Is there any way to fix that? I have seen this issue reported for old CUDA versions, but haven't found a clear answer.

Answer 1:

Warp shuffle intrinsics are only defined for (and only supported on) architectures of compute capability (cc) 3.0 and higher.

From CUDA 9.0 onward, those were the only GPUs supported by nvcc, so even if you compile for the default architecture (cc 3.0), the code compiles correctly.

However, in CUDA 8.0 and earlier, cc 2.x architectures were still supported by nvcc and were still the "default" architecture (what you get if you don't specify any architecture switch on the nvcc command line).

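Therefore, on CUDA 8.0 and earlier (warp shuffle was introduced in CUDA 6), you will see this error if you either specify a cc 2.x architecture or specify no architecture at all, because the shuffle intrinsics are not declared when device code is compiled for a cc 2.x target.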

Since warp shuffle is not supported on cc 2.x architectures, the solution is to explicitly specify a suitable architecture on the nvcc command line, e.g. -arch=sm_30 (or a higher value matching your GPU).
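As a sketch (assuming a toolkit that provides `cuda_runtime.h`; `device_supports_shuffle` is a hypothetical helper name), the architecture requirement can also be made explicit in the source: a preprocessor guard turns the undefined-identifier error into a message that names the fix, and a runtime query checks that the GPU itself is cc 3.0 or higher. The guard is meant for translation units that actually use shuffles.

```cuda
#include <cuda_runtime.h>

// Compile-time guard: __CUDA_ARCH__ is defined only while compiling device
// code (e.g. 200 for sm_20, 300 for sm_30). Failing here replaces the
// confusing "identifier __shfl_down is undefined" with an explicit hint.
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 300
#error "Warp shuffle requires compute capability >= 3.0: compile with -arch=sm_30 or newer"
#endif

// Runtime check (hypothetical helper): returns true only if `device` exists
// and reports compute capability 3.0 or higher, i.e. supports warp shuffle.
bool device_supports_shuffle(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
        return false;  // invalid device index, no driver, etc.
    return prop.major >= 3;
}
```

Compiling such a file with `nvcc -arch=sm_20`, or with no `-arch` on CUDA 8.0 and earlier, stops at the `#error`; with `-arch=sm_30` it compiles. The runtime check complements the guard: a binary built only for sm_30 cannot run its kernels on a cc 2.x GPU, so testing `device_supports_shuffle(0)` before launching gives a clear message instead of a launch error.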