Pycuda messing up numpy matrix transpose

Posted 2019-02-18 04:44

Question:

Why does the transposed matrix look different when converted to a pycuda.gpuarray?

Can you reproduce this? What could cause this? Am I using the wrong approach?

Example code

from pycuda import gpuarray
import pycuda.autoinit
import numpy

data = numpy.random.randn(2, 4).astype(numpy.float32)
data_gpu = gpuarray.to_gpu(data.T)  # copy the transposed view to the GPU
print("data\n", data, sep="")
print("data_gpu.get()\n", data_gpu.get(), sep="")
print("data.T\n", data.T, sep="")

Output

data
[[ 0.70442784  0.08845157 -0.84840715 -1.81618035]
 [ 0.55292499  0.54911566  0.54672164  0.05098847]]
data_gpu.get()
[[ 0.70442784  0.08845157]
 [-0.84840715 -1.81618035]
 [ 0.55292499  0.54911566]
 [ 0.54672164  0.05098847]]
data.T
[[ 0.70442784  0.55292499]
 [ 0.08845157  0.54911566]
 [-0.84840715  0.54672164]
 [-1.81618035  0.05098847]]

Answer 1:

The basic reason is that the numpy transpose only creates a view, which has no effect on the underlying array storage, and it is that storage which PyCUDA accesses directly when copying to device memory. The solution is to call the copy method on the transposed view, which creates a host array with the data physically stored in transposed order, and then copy that to the device:

data_gpu = gpuarray.to_gpu(data.T.copy())
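
A minimal sketch verifying the fix, assuming a CUDA-capable GPU and a working PyCUDA install:

from pycuda import gpuarray
import pycuda.autoinit
import numpy

data = numpy.random.randn(2, 4).astype(numpy.float32)

# .copy() materialises the transposed view in C order in host memory,
# so the byte-for-byte copy to the device really is the transpose.
data_gpu = gpuarray.to_gpu(data.T.copy())

assert numpy.array_equal(data_gpu.get(), data.T)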


Answer 2:

In numpy, data.T doesn't do anything to the underlying 1D array. It simply manipulates the strides to obtain the transpose. This makes it a constant-time and constant-memory operation.
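
You can see this in plain numpy, with no GPU involved: the transpose shares its buffer with the original array, and only the strides differ:

import numpy

data = numpy.random.randn(2, 4).astype(numpy.float32)

# float32 elements are 4 bytes: moving one row skips 16 bytes,
# moving one column skips 4 bytes.
print(data.strides)                  # (16, 4)

# The transpose reuses the same buffer and merely swaps the strides,
# so no element is moved and the view is not C-contiguous.
print(data.T.strides)                # (4, 16)
print(data.T.flags['C_CONTIGUOUS'])  # False
print(data.T.base is data)           # True: same underlying storage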

It would appear that gpuarray.to_gpu() isn't respecting the strides and is simply copying the underlying 1D array. This would produce the exact behaviour you're observing.

In my view there is nothing wrong with your code. Rather, I would consider this a bug in pycuda.

I've googled around and found a thread that discusses this issue in detail.

As a workaround, you could try passing numpy.ascontiguousarray(data.T) to gpuarray.to_gpu(). This will, of course, create a second copy of the data in host RAM.
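
A short sketch of that workaround, again assuming a CUDA-capable GPU and a working PyCUDA install:

from pycuda import gpuarray
import pycuda.autoinit
import numpy

data = numpy.random.randn(2, 4).astype(numpy.float32)

# ascontiguousarray copies the transposed view into a fresh
# C-contiguous host buffer, which to_gpu can then copy verbatim.
host_t = numpy.ascontiguousarray(data.T)
print(host_t.flags['C_CONTIGUOUS'])  # True

data_gpu = gpuarray.to_gpu(host_t)
assert numpy.array_equal(data_gpu.get(), data.T)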



Tags: numpy pycuda