I have a vector, and I would like to do the following, using CUDA and Thrust transformations:
// thrust::device_vector v;
// for k times:
// calculate constants a and b as functions of k;
// for (i=0; i < v.size(); i++)
// v[i] = a*v[i] + b*v[i+1];
How should I implement this correctly? One way is to keep a second vector w, apply thrust::transform to v, and save the results in w. But k is not known ahead of time, and I don't want to create w1, w2, ... and waste a lot of GPU memory. I would also like to minimize the amount of data copying. I'm not sure how to implement this with a single vector without the values stepping on each other. Does Thrust provide something that can do this?
I don't fully understand the "k times" part, but the following code may help you.
I think learning about functors and working through several of the Thrust examples will give you good guidance.
I hope this helps you solve your problem. :)
If v.size() is large enough to fully utilize the GPU, you could launch k kernels to do this, with one extra buffer in device memory and no extra data transfer.
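A minimal sketch of the one-extra-buffer idea, using thrust::transform with a binary functor and swapping the two vectors after each step. The names WeightedNeighbourSum and iterate are hypothetical, float is an assumed element type, and the per-step constants a and b are assumed to be supplied by the caller as a list of pairs. The question does not say what happens at the last element, where v[i+1] is out of range, so this sketch assumes the last element is simply left unchanged.

```cuda
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/copy.h>
#include <cstddef>
#include <utility>
#include <vector>

// Binary functor: combines an element with its right-hand neighbour.
struct WeightedNeighbourSum {
    float a, b;
    WeightedNeighbourSum(float a_, float b_) : a(a_), b(b_) {}
    __host__ __device__
    float operator()(float x, float x_next) const { return a * x + b * x_next; }
};

// Hypothetical helper: runs one update step per (a, b) pair in coeffs.
// ASSUMPTION: the last element has no right-hand neighbour and stays unchanged.
void iterate(thrust::device_vector<float>& v,
             const std::vector<std::pair<float, float> >& coeffs)
{
    if (v.size() < 2) return;  // nothing to update

    // The single scratch buffer, allocated once for all steps.
    thrust::device_vector<float> w(v.size());
    // Carry the fixed last element over so both buffers agree on it.
    thrust::copy(v.end() - 1, v.end(), w.end() - 1);

    for (std::size_t k = 0; k < coeffs.size(); ++k) {
        // Reads v[i] and v[i+1], writes w[i]; input and output never alias.
        thrust::transform(v.begin(), v.end() - 1,  // v[0] .. v[n-2]
                          v.begin() + 1,           // v[1] .. v[n-1]
                          w.begin(),
                          WeightedNeighbourSum(coeffs[k].first, coeffs[k].second));
        // Exchange the underlying storage; no elements are copied.
        v.swap(w);
    }
    // Because of the swap after every step, v holds the latest result here.
}
```

Memory use stays at two buffers no matter how large k is, and each step is a single transform launch. If k is only discovered while iterating, the same loop body should work inside a while loop that computes a and b on the fly.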