Lambda expressions with CUDA

Posted 2020-06-02 17:42

Question:

If I call thrust::transform with the thrust::host execution policy, passing a lambda works fine:

thrust::transform(thrust::host, a, a+arraySize,b,d,[](int a, int b)->int
{
    return a + b;
});

However, if I change thrust::host to thrust::device, the code no longer compiles. Here is the error from VS2013:

The closure type for a lambda ("lambda [](int, int)->int") cannot be used in the template argument type of a __global__ function template instantiation, unless the lambda is defined within a __device__ or __global__ function

So the question is how to use __device__ or __global__ with lambdas that are meant to run on the device.

Answer 1:

In CUDA 7 it is not possible. Quoting from Mark Harris:

That isn't supported today in CUDA, because the lambda is host code. Passing lambdas from host to device is a challenging problem, but it is something we will investigate for a future CUDA release.

What you can do in CUDA 7 is call thrust algorithms from your device code, and in that case you can pass lambdas to them...

With CUDA 7, thrust algorithms can be called from device code (e.g. CUDA kernels or __device__ functors). In those situations, you can use (device) lambdas with thrust. An example is given in a post on NVIDIA's Parallel Forall blog.
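For the CUDA 7 case, where a lambda defined in host code cannot be handed to a device algorithm, one common alternative is a named functor. The following is only a sketch, not something taken from the answer: AddInts, HOST_DEVICE and add_on_host are hypothetical names chosen for illustration. The macro lets the same file build with nvcc or with a plain host compiler, and the helper only exercises the functor on the host with std::transform.

```cpp
#include <algorithm>
#include <vector>

// Expands to the CUDA execution-space specifiers under nvcc and to nothing
// under a plain host compiler, so the functor builds in both (assumption:
// HOST_DEVICE is a name made up for this sketch).
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical named functor standing in for the lambda from the question.
struct AddInts {
    HOST_DEVICE int operator()(int a, int b) const { return a + b; }
};

// Host-side check: applies AddInts element-wise to two equally sized vectors.
std::vector<int> add_on_host(const std::vector<int>& a,
                             const std::vector<int>& b) {
    std::vector<int> out(a.size());
    std::transform(a.begin(), a.end(), b.begin(), out.begin(), AddInts{});
    return out;
}
```

When compiled with nvcc, the same object should be usable in place of the lambda, for example thrust::transform(thrust::device, a, a + arraySize, b, d, AddInts()), assuming a, b and d point to device memory.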

However, CUDA 7.5 introduces an experimental device lambda feature, described as follows:

CUDA 7.5 introduces an experimental feature: GPU lambdas. GPU lambdas are anonymous device function objects that you can define in host code, by annotating them with a __device__ specifier.

To compile code that uses this feature (as of CUDA 7.5), you need to pass --expt-extended-lambda on the nvcc command line.

Answer 2:

This simple code using device lambdas works under CUDA 8.0 RC, although device lambdas are still experimental in that version of CUDA:

#include <iostream>                 // std::cout, std::endl

#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/transform.h>

using namespace thrust::placeholders;

int main(void)
{
    // --- Input data 
    float a = 2.0f;
    float x[4] = { 1, 2, 3, 4 };
    float y[4] = { 1, 1, 1, 1 };

    thrust::device_vector<float> X(x, x + 4);
    thrust::device_vector<float> Y(y, y + 4);

    thrust::transform(X.begin(), 
                      X.end(),  
                      Y.begin(), 
                      Y.begin(),
                      [=] __host__ __device__ (float x, float y) { return a * x + y; }      // --- Lambda expression 
                     );        

    // --- Print the inputs next to the result computed on the device
    for (size_t i = 0; i < 4; i++)
        std::cout << a << " * " << x[i] << " + " << y[i] << " = " << Y[i] << std::endl;

    return 0;
}

Remember to pass

--expt-extended-lambda

to nvcc when compiling.