I need to pass a scipy.sparse CSR matrix to a cython function. How do I specify the type, as one would for a numpy array?
Question:
Answer 1:
Here is an example of how to quickly access the data from a coo_matrix using the properties row, col and data. The purpose of the example is just to show how to declare the data types and create the buffers (the compiler directives are included as well, since they usually give a considerable speed boost).
# cython: boundscheck=False
# cython: wraparound=False
# cython: cdivision=True
# cython: nonecheck=False
import numpy as np
from scipy.sparse import coo_matrix
cimport numpy as np

ctypedef np.int32_t cINT32
ctypedef np.double_t cDOUBLE

def print_sparse(m):
    # Typed buffers allow fast, C-level indexing inside the loop.
    cdef np.ndarray[cINT32, ndim=1] row, col
    cdef np.ndarray[cDOUBLE, ndim=1] data
    cdef int i
    if not isinstance(m, coo_matrix):
        m = coo_matrix(m)
    # astype() copies, and guarantees the dtypes match the declarations above.
    row = m.row.astype(np.int32)
    col = m.col.astype(np.int32)
    data = m.data.astype(np.float64)
    for i in range(np.shape(data)[0]):
        print(row[i], col[i], data[i])
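For readers who want to inspect the three attributes before writing any Cython, here is a minimal pure-Python sketch of the same idea. The small `dense` matrix and the `triples` name are made up for illustration; only `coo_matrix` and its `row`, `col` and `data` attributes come from the answer above.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical 2x3 matrix, used only for illustration.
dense = np.array([[0.0, 2.0, 0.0],
                  [1.0, 0.0, 3.0]])
m = coo_matrix(dense)

# row, col and data are parallel 1-D arrays: one entry per stored value.
triples = [(int(r), int(c), float(v)) for r, c, v in zip(m.row, m.col, m.data)]

# The dtypes are what the Cython buffer declarations have to match.
print(m.row.dtype, m.col.dtype, m.data.dtype)
```

Checking the dtypes this way shows whether the `astype` calls in the Cython function will actually have to convert anything.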
Answer 2:
Building on @SaulloCastro's answer, add this function to the .pyx file to display the attributes of a csr matrix:
from scipy.sparse import csr_matrix  # goes with the other imports at the top of the file

def print_csr(m):
    cdef np.ndarray[cINT32, ndim=1] indices, indptr
    cdef np.ndarray[cDOUBLE, ndim=1] data
    cdef int i
    if not isinstance(m, csr_matrix):
        m = csr_matrix(m)
    indices = m.indices.astype(np.int32)
    indptr = m.indptr.astype(np.int32)
    data = m.data.astype(np.float64)
    print(indptr)
    # indices and data have one entry per stored value
    for i in range(np.shape(data)[0]):
        print(indices[i], data[i])
indptr does not have the same length as data, so it can't be printed in the same loop.
To display the csr data like coo, you can do your own conversion with these iteration lines:
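# inside print_csr, with j declared too: cdef int i, j
for i in range(np.shape(indptr)[0]-1):
    # stored values of row i sit at positions indptr[i] .. indptr[i+1]-1
    for j in range(indptr[i], indptr[i+1]):
        print(i, indices[j], data[j])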
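The same traversal can be tried in plain Python before compiling anything. This is a sketch under stated assumptions: `csr_triples` and the sample matrix are hypothetical names introduced here, and the loop simply mirrors the iteration lines above, collecting the triples instead of printing them.

```python
import numpy as np
from scipy.sparse import csr_matrix

def csr_triples(m):
    """Collect (row, col, value) triples by walking indptr (illustrative helper)."""
    m = csr_matrix(m)
    indptr, indices, data = m.indptr, m.indices, m.data
    out = []
    # indptr has n_rows + 1 entries; row i owns positions indptr[i]..indptr[i+1]-1.
    for i in range(indptr.shape[0] - 1):
        for j in range(indptr[i], indptr[i + 1]):
            out.append((i, int(indices[j]), float(data[j])))
    return out

# Hypothetical 3x3 matrix with an empty middle row.
dense = np.array([[0.0, 2.0, 0.0],
                  [0.0, 0.0, 0.0],
                  [1.0, 0.0, 3.0]])
triples = csr_triples(dense)
```

The empty middle row contributes no triples because `indptr[1] == indptr[2]`, which makes the inner range empty.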
I assume you know how to set up and compile a .pyx file.
Also, what does your cython function assume about the matrix? Does it know about the csr format? The coo format?
Or does your cython function want a regular numpy array? In that case, we are off on a rabbit trail. You just need to convert the sparse matrix to an array: x.toarray() (or x.A for short).
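As a quick illustration of the dense route, here is a minimal sketch. The matrix `x` below is made up for the example; `toarray()` is the call named above.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical sparse matrix, for illustration only.
x = csr_matrix(np.array([[0.0, 2.0],
                         [1.0, 0.0]]))

# toarray() builds a regular 2-D numpy.ndarray with the zeros filled in,
# so it can be passed to any function that expects np.ndarray[..., ndim=2].
dense = x.toarray()
```

Keep in mind that the dense array stores every zero explicitly, so this is only practical when the full matrix fits in memory.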
Answer 3:
If you want to access the data directly (without a copy), you need to specify the types in the function arguments:
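# cython: boundscheck=False
# cython: wraparound=False
# (directive comments must come before any code in the file to take effect)
import numpy as np
cimport numpy as np

# "int" in a buffer declaration is the C int, so the index arrays must have a matching dtype
def some_cython_func(np.ndarray[np.double_t] data, np.ndarray[int] indices, np.ndarray[int] indptr):
    # body of the function
    pass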
Then you may call this function using
some_cython_func(M.data, M.indices, M.indptr)
where M is your CSR or CSC matrix.
See this page for an explanation of passing arguments without casting.
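Because typed arguments are rejected when the dtype does not match, it can help to normalize the three arrays on the Python side first. The sketch below is one possible way to do that, not part of the original answer: `as_typed_buffers` is a hypothetical helper, and it assumes the Cython function expects float64 values and 32-bit indices.

```python
import numpy as np
from scipy.sparse import csr_matrix

def as_typed_buffers(M):
    """Return (data, indices, indptr) with the dtypes assumed by the Cython signature.

    np.ascontiguousarray returns its input unchanged when the dtype and
    layout already match, so no copy is made in that case.
    """
    data = np.ascontiguousarray(M.data, dtype=np.float64)
    indices = np.ascontiguousarray(M.indices, dtype=np.int32)
    indptr = np.ascontiguousarray(M.indptr, dtype=np.int32)
    return data, indices, indptr

# Hypothetical matrix, for illustration only.
M = csr_matrix(np.array([[0.0, 2.0],
                         [1.0, 0.0]]))
data, indices, indptr = as_typed_buffers(M)

# True when M.data was already float64, i.e. the buffer is shared, not copied.
shared = np.shares_memory(data, M.data)
```

The result would then be passed as `some_cython_func(data, indices, indptr)`. If the function writes into `data`, the change is visible in `M` only when the buffer is shared.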