The current MATLAB-based C implementation takes around 6 ms to solve Ax=B, where A is a banded sparse matrix with bandwidth 3 and dimensions 780 × 780.
Now I am looking to use cuBLAS/cuSPARSE to find a faster solution.
I need to solve 1440 such equations in a loop.
I tried a PCG-based method, but it is very slow and its output does not match.
Is there a direct solver in cuBLAS/cuSPARSE for Ax=B?
This is a fully worked example of how to use LU factorization to solve sparse linear systems in CUDA.
If the problem can be converted into a tri-diagonal problem, you can use cusparseXgtsvStridedBatch to solve all of the systems in one call, without a for loop. You will have to use cusparse_v2.h instead of cusparse.h for this to work.
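Since the PCG output was reported as not matching, it may help to have a direct CPU reference to compare GPU results against. The following is a minimal sketch in plain C of the Thomas algorithm (tridiagonal Gaussian elimination without pivoting). It is not the cuSPARSE routine itself, and the name thomas_solve is hypothetical. It assumes the matrix really is tri-diagonal and well conditioned enough (for example, diagonally dominant) that no pivoting is needed. The diagonals are stored as three separate arrays, with the first sub-diagonal entry and the last super-diagonal entry unused, which as far as I know is also the layout the cuSPARSE gtsv routines expect.

```c
#include <assert.h>
#include <stddef.h>

/* Solve one tridiagonal system A x = b in place (Thomas algorithm, no pivoting).
 *   dl : sub-diagonal,   length n, dl[0] is unused
 *   d  : main diagonal,  length n, overwritten during elimination
 *   du : super-diagonal, length n, du[n-1] is unused
 *   b  : right-hand side on entry, solution x on exit
 * Returns 0 on success, -1 if a zero pivot is encountered.
 * thomas_solve is a hypothetical name used for illustration only. */
int thomas_solve(size_t n, const double *dl, double *d,
                 const double *du, double *b)
{
    assert(n > 0);

    /* Forward elimination: remove the sub-diagonal row by row. */
    for (size_t i = 1; i < n; ++i) {
        if (d[i - 1] == 0.0) {
            return -1;
        }
        double w = dl[i] / d[i - 1];
        d[i] -= w * du[i - 1];
        b[i] -= w * b[i - 1];
    }

    if (d[n - 1] == 0.0) {
        return -1;
    }

    /* Back substitution: solve from the last row upwards. */
    b[n - 1] /= d[n - 1];
    for (size_t i = n - 1; i-- > 0; ) {
        b[i] = (b[i] - du[i] * b[i + 1]) / d[i];
    }
    return 0;
}
```

Calling this once per system (1440 times for n = 780) gives a baseline to check the batched GPU output against. Each solve costs O(n), so this is also a rough lower bar that a GPU version has to beat once transfer overhead is included.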
If the problem cannot be converted into a tri-diagonal problem, you can use routines from CULA to solve it. More information about this can be found in their blog post. However, CULA is a commercial library, and it may not be the best fit for a banded matrix with only 3 bands.