Is it possible to call __host__ functions in pyCUDA like you can __global__ functions? I noticed in the documentation that pycuda.driver.Function creates a handle to a __global__ function. __device__ functions can be called from a __global__ function, but __host__ code cannot. I'm aware that using a __host__ function pretty much defeats the purpose of pyCUDA, but there are some ready-made functions that I'd like to import and call as a proof of concept.
As a note, whenever I try to import the __host__ function, I get:
pycuda._driver.LogicError: cuModuleGetFunction failed: named symbol not found
No, it is not possible.
This isn't a limitation of PyCUDA per se, but of CUDA itself. The __host__ qualifier just decays away to plain host code, and the CUDA APIs don't and cannot handle such functions in the same way that device code can be handled. cuModuleGetFunction only looks up __global__ kernels in a loaded module, which is why the lookup fails with "named symbol not found" (note that the APIs also don't handle __device__ functions, which are the true equivalent of __host__ functions).
If you want to call or use __host__ functions from Python, you will need to use one of the standard C++/Python interoperability mechanisms, such as ctypes, SWIG, or Boost.Python.
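As a minimal sketch of the ctypes route: the host function has to live in a shared library and be exported with C linkage (extern "C"), otherwise C++ name mangling hides the symbol. The example below uses the C math library as a stand-in so that it runs without a GPU; the library name libhostfuncs.so in the comment and the wrapper host_sqrt are hypothetical, and the fallback to ctypes.CDLL(None) assumes a POSIX system.

```python
import ctypes
import ctypes.util

# Stand-in library: the C math library. In practice you would load your own
# shared object here, e.g. ctypes.CDLL("./libhostfuncs.so") (hypothetical name).
_path = ctypes.util.find_library("m")
lib = ctypes.CDLL(_path) if _path else ctypes.CDLL(None)  # None: symbols of the running process (POSIX)

# Declare the C signature explicitly: double sqrt(double).
# Without this, ctypes assumes int arguments and an int return value.
lib.sqrt.argtypes = [ctypes.c_double]
lib.sqrt.restype = ctypes.c_double


def host_sqrt(x):
    """Call the host-side C function sqrt through ctypes."""
    return lib.sqrt(x)


print(host_sqrt(9.0))
```

The same pattern should apply to a __host__ function of your own: load the library, declare argtypes and restype to match the C prototype, then call it like a Python function.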
Below is sample code that calls a CUDA library API (here cuRAND, loaded through ctypes) from pyCUDA. The code generates uniformly distributed random numbers and may serve as a reference for including ready-made functions (as the poster asks, and like the CUDA library APIs) in pyCUDA code.
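import ctypes
import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.gpuarray as gpuarray
# --- Load the cuRAND host library (adjust the path to your CUDA installation)
curand = ctypes.CDLL("/usr/local/cuda/lib64/libcurand.so")
# --- Number of elements to generate
N = 10
# --- cuRAND enums
CURAND_RNG_PSEUDO_DEFAULT = 100
# --- Query the cuRAND version (curandGetVersion takes an int *)
version = ctypes.c_int()
curand.curandGetVersion(ctypes.byref(version))
print("curand version: ", version.value)
# --- Allocate space for generation
d_x = gpuarray.empty(N, dtype=np.float32)
# --- Create random number generator (curandGenerator_t is an opaque pointer)
gen = ctypes.c_void_p()
curand.curandCreateGenerator(ctypes.byref(gen), CURAND_RNG_PSEUDO_DEFAULT)
# --- Generate random numbers directly into the GPU array's device memory
d_ptr = ctypes.cast(d_x.ptr, ctypes.POINTER(ctypes.c_float))
curand.curandGenerateUniform(gen, d_ptr, ctypes.c_size_t(N))
print(d_x)
# --- Release the generator
curand.curandDestroyGenerator(gen)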
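The sample above ignores the status code that every cuRAND host call returns, so a failure would pass silently. A small checking helper could look like the sketch below. The name check_curand is hypothetical, and the table lists only a subset of the curandStatus_t values from curand.h.

```python
# Subset of the curandStatus_t values defined in curand.h
CURAND_STATUS = {
    0: "CURAND_STATUS_SUCCESS",
    100: "CURAND_STATUS_VERSION_MISMATCH",
    101: "CURAND_STATUS_NOT_INITIALIZED",
    102: "CURAND_STATUS_ALLOCATION_FAILED",
    103: "CURAND_STATUS_TYPE_ERROR",
    104: "CURAND_STATUS_OUT_OF_RANGE",
    201: "CURAND_STATUS_LAUNCH_FAILURE",
    999: "CURAND_STATUS_INTERNAL_ERROR",
}


def check_curand(status):
    """Raise RuntimeError if a cuRAND call returned a non-zero status (hypothetical helper)."""
    if status != 0:
        name = CURAND_STATUS.get(status, "unknown cuRAND status")
        raise RuntimeError(f"cuRAND call failed: {name} ({status})")


# With the sample above, each call would be wrapped like this:
#   check_curand(curand.curandCreateGenerator(...))
check_curand(0)  # a success status passes silently
```

Wrapping each curand.* call this way turns a silent failure (for example, a generator that was never created) into a Python exception at the call site.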