I'm trying to make a pycuda wrapper inspired by scikits-cuda library for some operations provided in the new cuSolver library of Nvidia. I want to solve a linear system of the form AX=B by LU factorization, to perform that first use the cublasSgetrfBatched method from scikits-cuda, that give me the factorization LU; then with that factorization I want to solve the system using cusolverDnSgetrs from cuSolve that I want to wrap, when I perform the computation return status 3, the matrices that supose to give me the answer don't change, BUT the *devInfo is zero, looking in the cusolver's documentation says:
CUSOLVER_STATUS_INVALID_VALUE=An unsupported value or parameter was passed to the function (a negative vector size, for example).
libcusolver.cusolverDnSgetrs.restype=int
libcusolver.cusolverDnSgetrs.argtypes=[_types.handle,
ctypes.c_char,
ctypes.c_int,
ctypes.c_int,
ctypes.c_void_p,
ctypes.c_int,
ctypes.c_void_p,
ctypes.c_void_p,
ctypes.c_int,
ctypes.c_void_p]
"""
handle is the handle pointer given by calling cusolverDnCreate() from cuSolver
LU is the LU factoriced matrix given by cublasSgetrfBatched() from scikits
P is the pivots matrix given by cublasSgetrfBatched()
B is the right hand matix from AX=B
"""
def cusolverSolveLU(handle,LU,P,B):
rows_LU ,cols_LU = LU.shape
rows_B, cols_B = B.shape
B_gpu = gpuarray.to_gpu(B.astype('float32'))
info_gpu = gpuarray.zeros(1, np.int32)
status=libcusolver.cusolverDnSgetrs(
handle, 'n', rows_LU, cols_B,
int(LU.gpudata), cols_LU,
int(P.gpudata), int(B_gpu.gpudata),
cols_B, int(info_gpu.gpudata))
print info_gpu
print status
handle= cusolverCreate() #get the initialization of cusolver
LU, P = cublasLUFactorization(...)
B = np.asarray(np.random.rand(3, 3), np.float32)
cusolverSolveLU(handle,LU,P,B)
The output:
[0]
3
What I'm doing wrong?
Here is a full working example of how to use the library; the result is validated against that obtained using numpy's built-in solver: