I wrote a simple CUDA program for practice. It copies data from one array to another:
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
from pycuda.compiler import SourceModule
# Global constants
N = 2**20 # size of array a
a = np.linspace(0, 1, N)
e = np.empty_like(a)
block_size_x = 512
# Instantiate block and grid sizes.
block_size = (block_size_x, 1, 1)
grid_size = (N / block_size_x, 1)
# Create the CUDA kernel, and run it.
mod = SourceModule("""
__global__ void D2x_kernel(double* a, double* e, int N) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid > 0 && tid < N - 1) {
        e[tid] = a[tid];
    }
}
""")
func = mod.get_function('D2x_kernel')
func(a, cuda.InOut(e), np.int32(N), block=block_size, grid=grid_size)
print str(e)
However, I get this error: pycuda._driver.LogicError: cuLaunchKernel failed: invalid value
When I get rid of the second argument `double* e` in my kernel function and invoke the kernel without the argument `e`, the error goes away. Why is that? What does this error mean?
Your `a` array does not exist in device memory, so I suspect that PyCUDA is ignoring (or otherwise handling) the first argument to your kernel invocation and only passing in `e` and `N`, so you get an error because the kernel was expecting three arguments and has only received two. Removing `double* e` from your kernel definition might eliminate the error message you're getting, but your kernel still won't work properly.

A quick fix should be to wrap `a` in a `cuda.In()` call, which instructs PyCUDA to copy `a` to the device before launching the kernel. That is, your kernel launch line should be:
func(cuda.In(a), cuda.InOut(e), np.int32(N), block=block_size, grid=grid_size)

Edit: Also, do you realize that your kernel is not copying the first and last elements of `a` to `e`? Your `if (tid > 0 && tid < N - 1)` statement is preventing that. For the entire array, it should be `if (tid < N)`.
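Putting the suggestions above together, here is a minimal sketch of a corrected script. It assumes Python 3, a CUDA-capable GPU, and a working PyCUDA install. The names `copy_kernel`, `grid_size_for`, and `copy_on_gpu` are illustrative and not from the original code. It also uses `cuda.Out(e)` instead of `cuda.InOut(e)`, on the assumption that the initial contents of `e` do not matter, and rounds the grid size up so that an `N` that is not a multiple of the block size would still be covered.

```python
import numpy as np

# Kernel with the guard changed to tid < n so every element is copied.
KERNEL_SOURCE = """
__global__ void copy_kernel(const double *a, double *e, int n)
{
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid < n) {
        e[tid] = a[tid];
    }
}
"""


def grid_size_for(n, block_size_x):
    """Return a 2-D grid tuple with enough blocks to cover n elements."""
    # Ceiling division with integers; a float here would be rejected
    # by the kernel launch.
    return ((n + block_size_x - 1) // block_size_x, 1)


def copy_on_gpu(a, block_size_x=512):
    """Copy array a into a new array using the GPU (hypothetical helper)."""
    # Imported here because pycuda.autoinit creates a CUDA context on import.
    import pycuda.autoinit  # noqa: F401
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule

    a = np.ascontiguousarray(a, dtype=np.float64)
    e = np.empty_like(a)
    func = SourceModule(KERNEL_SOURCE).get_function("copy_kernel")
    # cuda.In copies a to the device before the launch;
    # cuda.Out copies e back to the host afterwards.
    func(cuda.In(a), cuda.Out(e), np.int32(a.size),
         block=(block_size_x, 1, 1),
         grid=grid_size_for(a.size, block_size_x))
    return e
```

On a machine with a GPU, calling `copy_on_gpu(np.linspace(0, 1, 2**20))` should return an array equal to the input, which can be checked with `np.array_equal`.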
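To see which elements each guard lets through, the index arithmetic can be emulated on the CPU in plain Python. This only illustrates how `tid` is computed and filtered, not how CUDA actually schedules threads. The helper `written_indices` and the tiny sizes (8 elements, 2 blocks of 4 threads) are made up for the example.

```python
def written_indices(n, block_dim, grid_dim, guard):
    """List the tid values for which the guarded copy would execute."""
    written = []
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            # Same formula as blockDim.x * blockIdx.x + threadIdx.x
            tid = block_dim * block_idx + thread_idx
            if guard(tid, n):
                written.append(tid)
    return written


n = 8
# Guard from the question: skips the first and last elements.
original = written_indices(n, 4, 2, lambda tid, n: 0 < tid < n - 1)
# Guard suggested in the answer: covers the whole array.
corrected = written_indices(n, 4, 2, lambda tid, n: tid < n)

print(original)   # [1, 2, 3, 4, 5, 6]
print(corrected)  # [0, 1, 2, 3, 4, 5, 6, 7]
```

With the original guard, `e[0]` and `e[N - 1]` are never written, so they keep whatever `np.empty_like` happened to leave there.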