Question:

I have a vector, and I would like to do the following, using CUDA and Thrust transformations:

// thrust::device_vector v;
// for k times:
//     calculate constants a and b as functions of k;
//     for (i=0; i < v.size(); i++)
//         v[i] = a*v[i] + b*v[i+1];

How should I correctly implement this? One way is to have a second vector w, apply thrust::transform to v, and save the results to w. But k is unknown ahead of time, and I don't want to create w1, w2, ... and waste a lot of GPU memory. Preferably I want to minimize the amount of data copying. I'm not sure how to implement this with a single vector without the values stepping on each other. Does Thrust provide something that can do this?

Answer 1:

If v.size() is large enough to fully utilize the GPU, you can launch one kernel per iteration (k kernels in total), using one extra buffer and no extra data transfer.

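#include <thrust/device_vector.h>
#include <thrust/functional.h>   // thrust::placeholders
#include <thrust/transform.h>

using namespace thrust::placeholders;  // provides _1 and _2

// scratch buffer; element type assumed to be float here, use v's element type
thrust::device_vector<float> u(v.size());
for (k = 0; ; )
{
    // calculate a & b
    // v -> u (the last element has no right neighbour, so it is not written)
    thrust::transform(v.begin(), v.end()-1, v.begin()+1, u.begin(), a*_1 + b*_2);
    k++;
    if (k >= K)
        break;  // newest values are in u

    // calculate a & b
    // u -> v
    thrust::transform(u.begin(), u.end()-1, u.begin()+1, v.begin(), a*_1 + b*_2);
    k++;
    if (k >= K)
        break;  // newest values are in v
}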
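
The alternating v -> u, u -> v pattern above can also be written as one transform per iteration followed by a swap, so that the newest values always end up in v. Here is a minimal host-side sketch of that buffering logic, not a GPU implementation: it uses std::vector and std::transform, whose binary overload takes the same arguments as thrust::transform. The element type (float), the `coefficients(k)` helper standing in for "calculate a & b", and the choice to carry the last element over unchanged are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical stand-in for "calculate a & b" as functions of k.
std::pair<float, float> coefficients(int k) {
    return {1.0f + k, 2.0f};
}

// Applies v[i] = a*v[i] + b*v[i+1] K times using a single scratch buffer.
// Assumption: the last element has no right neighbour and is kept unchanged.
void iterate(std::vector<float>& v, int K) {
    if (v.empty()) return;
    std::vector<float> u(v.size());  // one scratch buffer, reused every step
    for (int k = 0; k < K; ++k) {
        const std::pair<float, float> c = coefficients(k);
        const float a = c.first;
        const float b = c.second;
        // Read only from v, write only to u, so no value is overwritten
        // before it has been read.
        std::transform(v.begin(), v.end() - 1, v.begin() + 1, u.begin(),
                       [a, b](float x, float next) { return a * x + b * next; });
        u.back() = v.back();  // carry the last element over
        v.swap(u);            // constant-time swap; v holds the newest values
    }
}
```

Because the swap only exchanges the two buffers' storage, no element data is copied between iterations, and the result is in v regardless of whether K is odd or even.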

Answer 2:

I don't fully understand the "k times" part, but the following code may help you.

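#include <thrust/device_vector.h>
#include <thrust/transform.h>

struct OP {
    const int a, b;
    __host__ __device__
    OP(const int p, const int q): a(p), b(q) {}
    // callable from device code; const because it does not modify the functor
    __host__ __device__
    int operator()(const int v1, const int v2) const {
      return a*v1+b*v2;
    }
};
thrust::device_vector<int> w(v.size());
thrust::transform(v.begin(), v.end()-1, //input_1
                  v.begin()+1,          //input_2
                  w.begin(),            //output
                  OP(a, b));            //functor
v = w;  // copy the result back; the transform does not write w's last element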

I think reading up on functors and working through a few Thrust examples will give you a good guide.
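
To illustrate the functor idea: a functor is simply an object with an operator(), which lets it carry state such as a and b into the transform. Below is a minimal sketch under a few assumptions: the names `LinearCombine`, `HOST_DEVICE` and `step` are hypothetical, the functor is templated so it is not tied to int, and the usage is shown on the host with std::transform. The macro is meant to add the `__host__ __device__` qualifiers only when compiling with nvcc, so the same functor could be passed to thrust::transform.

```cpp
#include <algorithm>
#include <vector>

// Under nvcc the functor must be callable on the device; on a plain host
// compiler the qualifiers are dropped so the sketch still builds.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// A functor: an object with operator(), carrying a and b as state.
template <typename T>
struct LinearCombine {  // hypothetical name
    T a, b;
    HOST_DEVICE LinearCombine(T a_, T b_) : a(a_), b(b_) {}
    HOST_DEVICE T operator()(T x, T next) const { return a * x + b * next; }
};

// One update step: out[i] = a*in[i] + b*in[i+1] for every i that has a
// right neighbour. Assumption: the last element is carried over unchanged.
template <typename T>
std::vector<T> step(const std::vector<T>& in, T a, T b) {
    std::vector<T> out(in);  // start as a copy so the last element is kept
    if (in.size() > 1) {
        std::transform(in.begin(), in.end() - 1, in.begin() + 1, out.begin(),
                       LinearCombine<T>(a, b));
    }
    return out;
}
```

Constructing a fresh `LinearCombine<T>(a, b)` for each k is cheap, since the object only holds the two coefficients.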

Hope this will help you to solve your problem. :)