Fastest way to access device vector elements direc

Posted 2019-05-06 04:22

I refer you to the following page: http://code.google.com/p/thrust/wiki/QuickStartGuide#Vectors. The second paragraph says:

Also note that individual elements of a device_vector can be accessed using the standard bracket notation. However, because each of these accesses requires a call to cudaMemcpy, they should be used sparingly. We'll look at some more efficient techniques later.

I searched the whole document but could not find the more efficient technique. Does anyone know the fastest way to do this, i.e. how to access a device vector or device pointer from the host as quickly as possible?

Tags: cuda thrust

2 answers

Bombasti

Reply #2 · 2019-05-06 05:09

The "more efficient techniques" the guide alludes to are the Thrust algorithms. Accessing (or copying across the PCI-E bus) millions of elements at once is more efficient than accessing elements one at a time, because the fixed cost of each CPU/GPU communication is amortized over the whole transfer.

There's no faster way to copy data from the GPU to the CPU than by calling cudaMemcpy, because it is the most primitive way for a CUDA programmer to implement the task.


地球回转人心会变

Reply #3 · 2019-05-06 05:20

If you have a device_vector that needs more processing, try to keep the data on the device and process it with Thrust algorithms or your own kernels. If you only need to read a few values from the device_vector, just access them directly with bracket notation. If you need to access more than a few values, copy the device_vector over to a host_vector and read the values from there.

#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

thrust::device_vector<int> D;
...
thrust::host_vector<int> H = D;  // one bulk device-to-host copy
