Recently I have been using Thrust a lot. I have noticed that, in order to use Thrust, one must always copy the data from CPU memory to GPU memory.
Consider the following example:
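#include <thrust/host_vector.h>
#include <thrust/device_vector.h>

void foo(int *foo)
{
    thrust::host_vector<int> m(foo, foo + 100000);  // copy 1: host to host
    thrust::device_vector<int> s = m;               // copy 2: host to device
}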
I'm not quite sure how the host_vector constructor works, but it seems like I'm copying the initial data, coming from *foo, twice: once to the host_vector when it is initialized, and again when the device_vector is initialized. Is there a better way of copying from CPU to GPU without making an intermediate copy? I know I can use device_ptr as a wrapper, but that still doesn't fix my problem.
Thanks!
One of device_vector's constructors takes a range of elements specified by two iterators. It's smart enough to understand the raw pointer in your example, so you can construct a device_vector directly and avoid the temporary host_vector:
void my_function_taking_host_ptr(int *raw_ptr, size_t n)
{
    // device_vector assumes raw pointers point to system (host) memory
    thrust::device_vector<int> vec(raw_ptr, raw_ptr + n);
    ...
}
If your raw pointer points to CUDA memory, introduce a device_ptr:
void my_function_taking_cuda_ptr(int *raw_ptr, size_t n)
{
    // wrap raw_ptr before passing to device_vector
    thrust::device_ptr<int> d_ptr(raw_ptr);
    thrust::device_vector<int> vec(d_ptr, d_ptr + n);
    ...
}
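Using a device_ptr doesn't allocate any storage; it just encodes the location of the pointer (device memory) in the type system. The device_vector constructed from it, on the other hand, does allocate its own storage and copies the n elements into it.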