I found an answer here, but it is not clear whether I should reshape the array. Do I need to reshape a 2D array into 1D before passing it to a PyCUDA kernel?
Answer 1:
There is no need to reshape a 2D gpuarray in order to pass it to a CUDA kernel.
As I said in the answer you linked to, a 2D NumPy or PyCUDA array is just an allocation of pitched linear memory, stored in row-major order by default. Both have two members which tell you everything that you need to access an array: shape and strides. For example:
In [8]: X = np.arange(0, 15, dtype=np.int32).reshape((5, 3))
In [9]: print(X.shape)
(5, 3)
In [10]: print(X.strides)
(12, 4)
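As a quick check of what those two stride values mean, here is a minimal host-side sketch. It assumes a 4-byte integer dtype (pinned explicitly to np.int32 so the strides come out as (12, 4)); the helper name byte_offset is made up for illustration and is not part of NumPy or PyCUDA.

```python
import numpy as np

# Assumption: dtype pinned to int32 so the strides are (12, 4) on every platform
X = np.arange(0, 15, dtype=np.int32).reshape((5, 3))
row_pitch, col_step = X.strides  # bytes between rows, bytes between columns

def byte_offset(i, j):
    """Hypothetical helper: byte offset of X[i, j] from the start of the buffer."""
    return i * row_pitch + j * col_step

# Read element (3, 2) straight out of the raw row-major bytes
raw = X.tobytes()
i, j = 3, 2
start = byte_offset(i, j)
value = np.frombuffer(raw[start:start + X.itemsize], dtype=X.dtype)[0]

print(byte_offset(i, j), value, X[i, j])  # prints: 44 11 11
```

The offset is 3 rows of 12 bytes plus 2 columns of 4 bytes, which is the same arithmetic a kernel has to do with the pointer it receives.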
The shape is self-explanatory; the strides give the pitch of the storage in bytes. The best practice for kernel code is to treat the pointer supplied by PyCUDA as if it were allocated using cudaMallocPitch, and to treat the first element of strides as the byte pitch of the rows in memory. A trivial example might look like this:
import pycuda.driver as drv
from pycuda.compiler import SourceModule
import pycuda.autoinit  # creates a CUDA context on import
import numpy as np

mod = SourceModule("""
__global__ void diag_kernel(float *dest, int stride, int N)
{
    const int tid = threadIdx.x + blockDim.x * blockIdx.x;
    if (tid < N) {
        // Step tid rows down using the byte pitch, then tid floats across
        float* p = (float*)((char*)dest + tid*stride) + tid;
        *p = 1.0f;
    }
}
""")
diag_kernel = mod.get_function("diag_kernel")

a = np.zeros((10,10), dtype=np.float32)
a_N = np.int32(a.shape[0])          # number of rows
a_stride = np.int32(a.strides[0])   # row pitch in bytes
a_bytes = a.size * a.dtype.itemsize
a_gpu = drv.mem_alloc(a_bytes)
drv.memcpy_htod(a_gpu, a)           # copy the 2D array as-is, no reshape
diag_kernel(a_gpu, a_stride, a_N, block=(32,1,1))
drv.memcpy_dtoh(a, a_gpu)
print(a)
Here some memory is allocated on the device, a zeroed 2D array is copied to that allocation directly, and the result of the kernel (filling the diagonal with 1) is copied back to the host and printed. It isn't necessary to flatten or otherwise modify the shape or memory layout of the 2D numpy data at any point in the process. The result is:
$ cuda-memcheck python ./gpuarray.py
========= CUDA-MEMCHECK
[[ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  1.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  1.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  1.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]]
========= ERROR SUMMARY: 0 errors
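If no GPU is at hand, the kernel's pointer arithmetic can be sanity-checked on the host. The following is only a rough sketch under stated assumptions: a flat uint8 view of the NumPy buffer stands in for (char*)dest, and a plain Python loop stands in for the CUDA threads. It does not call PyCUDA at all, so it says nothing about the device side beyond the addressing.

```python
import numpy as np

a = np.zeros((10, 10), dtype=np.float32)
n = a.shape[0]
pitch = a.strides[0]  # bytes per row, the value the kernel receives as `stride`

# Flat byte view of the same buffer, standing in for (char*)dest
dest_bytes = a.reshape(-1).view(np.uint8)
# The four bytes that encode 1.0f
one = np.frombuffer(np.float32(1.0).tobytes(), dtype=np.uint8)

for tid in range(n):  # one loop iteration per simulated thread
    # Mirrors (float*)((char*)dest + tid*stride) + tid
    offset = tid * pitch + tid * a.itemsize
    dest_bytes[offset:offset + a.itemsize] = one

print(np.array_equal(a, np.eye(n, dtype=np.float32)))  # prints: True
```

Because the byte view shares memory with a, the writes land on the diagonal of the original 2D array, which is the same result the kernel produces without any reshape of the host data.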