I have a vector, and I would like to do the following, using CUDA and Thrust transformations:
// thrust::device_vector v;
// for k times:
// calculate constants a and b as functions of k;
// for (i=0; i < v.size(); i++)
// v[i] = a*v[i] + b*v[i+1];
How should I correctly implement this? One way I can do it is to have vector w, and apply thrust::transform onto v and save the results to w. But k is unknown ahead of time, and I don't want to create w1, w2, ... and waste a lot of GPU memory space. Preferably I want to minimize the amount of data copying. But I'm not sure how to implement this using one vector without the values stepping on each other. Is there something Thrust provides that can do this?
If v.size() is large enough to fully utilize the GPU, you could launch k kernels (one thrust::transform call per step), using a single extra buffer and no extra data transfer. The two vectors alternate as input and output:
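using namespace thrust::placeholders;   // provides _1 and _2
thrust::device_vector<T> u(v.size());   // T stands for the element type of v
for (k = 0;;)
{
    // calculate a & b
    thrust::transform(v.begin(), v.end() - 1, v.begin() + 1, u.begin(), a * _1 + b * _2);
    k++;
    if (k >= K)
        break;   // latest result is in u
    // calculate a & b
    thrust::transform(u.begin(), u.end() - 1, u.begin() + 1, v.begin(), a * _1 + b * _2);
    k++;
    if (k >= K)
        break;   // latest result is in v
}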
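As a way to check the GPU result, the same double-buffer scheme can be sketched on the host with std::transform, whose binary form takes the same arguments as the thrust::transform calls above. This is only a reference sketch under assumptions: coeffs_for is a hypothetical stand-in for however a and b are computed from k, the element type is assumed to be float, and the last element (which has no right-hand neighbour) is assumed to be carried over unchanged.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical coefficient rule, for illustration only.
struct Coeffs { float a, b; };
inline Coeffs coeffs_for(int k) { return { 1.0f + static_cast<float>(k), 0.5f }; }

// Applies out[i] = a*in[i] + b*in[i+1] for every i with a right-hand
// neighbour, K times, alternating between v and one scratch buffer.
inline void iterate(std::vector<float>& v, int K)
{
    if (v.empty()) return;
    std::vector<float> u(v.size());          // the only extra buffer
    for (int k = 0; k < K; ++k) {
        const Coeffs c = coeffs_for(k);
        std::transform(v.begin(), v.end() - 1, v.begin() + 1, u.begin(),
                       [c](float x, float y) { return c.a * x + c.b * y; });
        u.back() = v.back();                 // assumption: boundary element is kept as-is
        std::swap(v, u);                     // exchanges the buffers, copies no elements
    }
}
```

Swapping after every step means the latest result is always in v, so the caller does not need to track whether K was odd or even.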
I don't actually understand the "k times", but the following code may help you.
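struct OP {
    const int a, b;
    OP(const int p, const int q) : a(p), b(q) {}
    // __host__ __device__ lets Thrust call the functor from device code
    __host__ __device__
    int operator()(const int v1, const int v2) const {
        return a * v1 + b * v2;
    }
};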
thrust::device_vector<int> w(v.size());
thrust::transform(v.begin(), v.end()-1, //input_1
v.begin()+1, //input_2
w.begin(), //output
OP(a, b)); //functor
v = w; // device-to-device copy of the result back into v
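A functor of this shape can also be written so that it builds both under nvcc and as plain C++, which makes it easy to test on the host first. The sketch below is an illustration, not part of Thrust: HOST_DEVICE, ScaleAdd, and step are hypothetical names, and keeping the last element unchanged is an assumption about the intended boundary behaviour. It also swaps the two buffers instead of copying w back into v.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// nvcc defines __CUDACC__; elsewhere the qualifiers expand to nothing.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Computes a*x + b*y for any arithmetic element type T.
template <typename T>
struct ScaleAdd {
    T a, b;
    HOST_DEVICE ScaleAdd(T p, T q) : a(p), b(q) {}
    HOST_DEVICE T operator()(const T& x, const T& y) const { return a * x + b * y; }
};

// One step: w[i] = a*v[i] + b*v[i+1], then the buffers are exchanged
// so that v holds the result and w holds the previous values.
template <typename T>
void step(std::vector<T>& v, std::vector<T>& w, T a, T b)
{
    if (v.empty()) return;
    w.resize(v.size());
    std::transform(v.begin(), v.end() - 1,   // input 1: v[i]
                   v.begin() + 1,            // input 2: v[i+1]
                   w.begin(),                // output
                   ScaleAdd<T>(a, b));       // functor
    w.back() = v.back();                     // assumption: last element is kept as-is
    std::swap(v, w);                         // no element-wise copy
}
```

Under nvcc the same ScaleAdd<T>(a, b) object can be passed to thrust::transform in place of OP(a, b).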
I think reading up on functors and working through a few Thrust examples will give you a good guide.
Hope this will help you to solve your problem. :)