CUDA : How to copy a 3D array from host to device?

Posted 2019-02-21 00:46

I want to learn how I can copy a 3-dimensional array from host memory to device memory. Let's say I have a 3D array that contains data, for example int host_data[256][256][256]; I want to copy that data to dev_data (a device array) so that host_data[x][y][z] == dev_data[x][y][z]. How can I do it, and how am I supposed to access the dev_data array on the device? A simple example would be very helpful.

2 answers
▲ chillily
#2 · 2019-02-21 01:16

The common way is to flatten the array (make it one-dimensional). You then need a small calculation that maps an (x, y, z) triple to a single number: a position in the flattened one-dimensional array.

Example 2D:

int data[256][256];
int *flattened = &data[0][0];  // pointer to the first element; "= data" would not compile
data[x][y] == flattened[x * 256 + y];

Example 3D:

int data[256][256][256];
int *flattened = &data[0][0][0];  // pointer to the first element; "= data" would not compile
data[x][y][z] == flattened[x * 256 * 256 + y * 256 + z];

or use a wrapper:

__host__ __device__ inline int index(const int x, const int y, const int z) {
     return x * 256 * 256 + y * 256 + z;
}

Knowing that, you can allocate a linear array with cudaMalloc as usual, then use the index function to access the corresponding element in device code.
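To make the steps above concrete, here is a minimal sketch of the flattened approach. It is not the answer author's code: the cube edge N (64 rather than the question's 256, to keep it light), the names flat_index, add_one and round_trip_ok, and the launch configuration are all assumptions chosen for illustration.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Assumed edge length for this sketch (the question uses 256).
constexpr int N = 64;

// Maps (x, y, z) to the offset that data[x][y][z] would have in a flat array.
__host__ __device__ inline int flat_index(int x, int y, int z) {
    return (x * N + y) * N + z;
}

// Each thread adds 1 to one element, addressed by its three coordinates.
__global__ void add_one(int *dev_data) {
    int x = blockIdx.z;
    int y = blockIdx.y;
    int z = blockIdx.x * blockDim.x + threadIdx.x;
    if (z < N) {
        dev_data[flat_index(x, y, z)] += 1;
    }
}

// Copies the cube to the device, runs the kernel, copies back, checks values.
bool round_trip_ok() {
    const size_t count = static_cast<size_t>(N) * N * N;
    const size_t bytes = count * sizeof(int);

    // Host data lives on the heap; a large 3D array may not fit on the stack.
    std::vector<int> host(count);
    for (int x = 0; x < N; ++x)
        for (int y = 0; y < N; ++y)
            for (int z = 0; z < N; ++z)
                host[flat_index(x, y, z)] = x + y + z;

    int *dev_data = nullptr;
    if (cudaMalloc(reinterpret_cast<void **>(&dev_data), bytes) != cudaSuccess)
        return false;

    // One allocation and one copy move the whole array.
    cudaError_t err = cudaMemcpy(dev_data, host.data(), bytes,
                                 cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(64);
        dim3 grid((N + block.x - 1) / block.x, N, N);
        add_one<<<grid, block>>>(dev_data);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) {
        // This copy also waits for the kernel to finish.
        err = cudaMemcpy(host.data(), dev_data, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dev_data);
    if (err != cudaSuccess) return false;

    for (int x = 0; x < N; ++x)
        for (int y = 0; y < N; ++y)
            for (int z = 0; z < N; ++z)
                if (host[flat_index(x, y, z)] != x + y + z + 1) return false;
    return true;
}

int main() {
    bool ok = round_trip_ok();
    printf("%s\n", ok ? "round trip ok" : "round trip failed");
    return ok ? 0 : 1;
}
```

Because flat_index is marked __host__ __device__, the same mapping is used when filling the array on the host and when reading it in the kernel, so the two sides cannot drift apart.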

Update: The author of this question claims to have found a better solution (at least for 2D), you might want to have a look.

我想做一个坏孩纸
#3 · 2019-02-21 01:26

For fixed dimensions (e.g. [256][256][256]), let the compiler do the work for you and follow this example. This is attractive because a single cudaMalloc/cudaMemcpy, using a single pointer, is enough to transfer the data. If you must have variable dimensions, it's better to think about alternative ways to handle this because of the complexity, but you may wish to look at this example (referring to the second example code that I posted). Please be advised that this method is considerably more complicated and harder to follow. I recommend not using it if you can avoid it.
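The linked examples are not reproduced on this page, so the following is only a sketch of one way the fixed-dimension idea can look, assuming the dimensions are compile-time constants. The edge length N (64 instead of 256), the typedef name Plane, and the function names are hypothetical. The point is that a pointer to a 2D array type lets both host and device code use three subscripts on a single contiguous allocation.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Assumed compile-time edge length for this sketch (the question uses 256).
constexpr int N = 64;

// A pointer to Plane can be indexed as p[x][y][z]; the compiler
// generates the offset arithmetic from the fixed dimensions.
typedef int Plane[N][N];

// Each thread adds 1 to one element using three subscripts on the device.
__global__ void add_one(Plane *dev_data) {
    int x = blockIdx.z;
    int y = blockIdx.y;
    int z = blockIdx.x * blockDim.x + threadIdx.x;
    if (z < N) {
        dev_data[x][y][z] += 1;
    }
}

// Copies the cube to the device, runs the kernel, copies back, checks values.
bool round_trip_ok() {
    const size_t bytes = sizeof(Plane) * N;

    // Heap allocation; a large 3D array may not fit on the stack.
    Plane *host_data = static_cast<Plane *>(malloc(bytes));
    if (host_data == nullptr) return false;
    for (int x = 0; x < N; ++x)
        for (int y = 0; y < N; ++y)
            for (int z = 0; z < N; ++z)
                host_data[x][y][z] = x + y + z;

    Plane *dev_data = nullptr;
    if (cudaMalloc(reinterpret_cast<void **>(&dev_data), bytes) != cudaSuccess) {
        free(host_data);
        return false;
    }

    // Single pointer, single copy in each direction.
    cudaError_t err = cudaMemcpy(dev_data, host_data, bytes,
                                 cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(64);
        dim3 grid((N + block.x - 1) / block.x, N, N);
        add_one<<<grid, block>>>(dev_data);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) {
        err = cudaMemcpy(host_data, dev_data, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dev_data);

    bool ok = (err == cudaSuccess);
    for (int x = 0; ok && x < N; ++x)
        for (int y = 0; ok && y < N; ++y)
            for (int z = 0; ok && z < N; ++z)
                ok = (host_data[x][y][z] == x + y + z + 1);
    free(host_data);
    return ok;
}

int main() {
    bool ok = round_trip_ok();
    printf("%s\n", ok ? "round trip ok" : "round trip failed");
    return ok ? 0 : 1;
}
```

This only works when the two inner dimensions are known at compile time, which is why variable dimensions need a different and more involved scheme.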

Edit: If you're willing to flatten your array, the answer provided by @Ixanezis is recommended, and is commonly used. My answer is based on the assumption that you really want to access the array using 3 subscripts both on the host and device. As pointed out in the other answer, however, you can simulate 3 subscript access using a macro or function to calculate offsets into a 1-D array.
