How to properly pass a scipy.sparse CSR matrix to a cython function?

Posted 2020-02-09 05:35

Question:

I need to pass a scipy.sparse CSR matrix to a cython function. How do I specify the type, as one would for a numpy array?

Answer 1:

Here is an example of how to quickly access the data from a coo_matrix using its row, col and data attributes. The example only aims to show how to declare the data types and create the buffers (it also adds compiler directives that usually give a considerable speed boost):

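#cython: boundscheck=False
#cython: wraparound=False
#cython: cdivision=True
#cython: nonecheck=False

import numpy as np
from scipy.sparse import coo_matrix
cimport numpy as np

# C-level aliases for the element types of the index and value arrays
ctypedef np.int32_t cINT32
ctypedef np.double_t cDOUBLE

def print_sparse(m):
    # Typed buffers give fast, C-level element access inside the loop
    cdef np.ndarray[cINT32, ndim=1] row, col
    cdef np.ndarray[cDOUBLE, ndim=1] data
    cdef int i
    if not isinstance(m, coo_matrix):
        m = coo_matrix(m)
    # astype() copies, so the dtypes are guaranteed to match the declarations
    row = m.row.astype(np.int32)
    col = m.col.astype(np.int32)
    data = m.data.astype(np.float64)
    for i in range(data.shape[0]):
        # a single formatted string prints the same under Python 2 and 3 semantics
        print("%s %s %s" % (row[i], col[i], data[i]))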
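
For readers unfamiliar with the COO layout that row, col and data expose, here is a minimal pure-Python sketch (no SciPy needed). The helper name dense_to_coo is hypothetical and only illustrates the idea: three parallel lists of equal length, one entry per stored value.

```python
# Pure-Python illustration of the COO layout; dense_to_coo is a
# hypothetical helper, not part of SciPy.
def dense_to_coo(dense):
    """Return (row, col, data) lists holding the nonzero entries of `dense`."""
    row, col, data = [], [], []
    for i, values in enumerate(dense):
        for j, value in enumerate(values):
            if value != 0:
                row.append(i)
                col.append(j)
                data.append(float(value))
    return row, col, data


dense = [[0, 2, 0],
         [1, 0, 3]]
row, col, data = dense_to_coo(dense)

# Same traversal as print_sparse: one (row, col, value) triple per stored entry
for r, c, v in zip(row, col, data):
    print(r, c, v)
```

Because all three lists have the same length, a single loop index is enough to walk them together, which is what the typed loop in print_sparse relies on.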


Answer 2:

Building on @SaulloCastro's answer, add this function to the .pyx file to display the attributes of a csr matrix:

from scipy.sparse import csr_matrix

def print_csr(m):
    cdef np.ndarray[cINT32, ndim=1] indices, indptr
    cdef np.ndarray[cDOUBLE, ndim=1] data
    cdef int i
    if not isinstance(m, csr_matrix):
        m = csr_matrix(m)
    # astype() copies, so the dtypes are guaranteed to match the declarations
    indices = m.indices.astype(np.int32)
    indptr = m.indptr.astype(np.int32)
    data = m.data.astype(np.float64)
    print(indptr)
    for i in range(data.shape[0]):
        print("%s %s" % (indices[i], data[i]))

indptr has one entry per row plus one (length n_rows + 1) rather than one entry per stored value, so it can't be printed in the same loop as indices and data.
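
To make the length difference concrete, here is a small pure-Python sketch that builds the three CSR lists from a nested list. The helper dense_to_csr is hypothetical and written only for illustration; it is not how SciPy constructs the arrays internally.

```python
# Pure-Python illustration of the CSR layout; dense_to_csr is a
# hypothetical helper, not part of SciPy.
def dense_to_csr(dense):
    """Return (data, indices, indptr) lists for the nonzeros of `dense`."""
    data, indices, indptr = [], [], [0]
    for values in dense:
        for j, value in enumerate(values):
            if value != 0:
                indices.append(j)        # column of the stored value
                data.append(float(value))
        indptr.append(len(data))         # where the next row starts
    return data, indices, indptr


dense = [[0, 2, 0],
         [1, 0, 3],
         [0, 0, 0]]
data, indices, indptr = dense_to_csr(dense)

print(indptr)                                 # one entry per row, plus one
print(len(indptr), len(indices), len(data))   # n_rows + 1, nnz, nnz
```

An empty row simply repeats the previous indptr value, which is why the last two entries are equal here.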

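To display the CSR data in COO style, you can do the conversion yourself by replacing the loop with these lines (declare cdef int j alongside i if you want the inner loop typed as well):

    for i in range(indptr.shape[0] - 1):
        for j in range(indptr[i], indptr[i+1]):
            print("%s %s %s" % (i, indices[j], data[j]))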
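
The same traversal can be tried out in plain Python before compiling anything. This is a sketch under the assumption that the three CSR arrays are available as ordinary lists; the function name csr_to_triples is hypothetical.

```python
# Pure-Python version of the nested loop above; csr_to_triples is a
# hypothetical helper name used only for this illustration.
def csr_to_triples(data, indices, indptr):
    """Expand CSR arrays into a list of (row, col, value) triples."""
    triples = []
    for i in range(len(indptr) - 1):
        # indptr[i]:indptr[i + 1] is the slice of indices/data belonging to row i
        for j in range(indptr[i], indptr[i + 1]):
            triples.append((i, indices[j], data[j]))
    return triples


# A 3 x 3 matrix with values at (0, 1), (1, 0), (1, 2) and an empty last row
data = [2.0, 1.0, 3.0]
indices = [1, 0, 2]
indptr = [0, 1, 3, 3]

triples = csr_to_triples(data, indices, indptr)
for row, col, value in triples:
    print(row, col, value)
```

The empty third row contributes nothing because its slice indptr[2]:indptr[3] is empty.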

I assume you know how to set up and compile a .pyx file.

Also, what does your cython function assume about the matrix? Does it know about the csr format? The coo format?

Or does your cython function want a regular numpy array? In that case, we are off on a rabbit trail. You just need to convert the sparse matrix to an array: x.toarray() (or x.A for short).



Answer 3:

If you want to access the data directly (without a copy), you need to specify the types in the function arguments:

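#cython: boundscheck=False
#cython: wraparound=False
# (compiler directive comments only take effect at the top of the file, before any code)

import numpy as np
cimport numpy as np

def some_cython_func(np.ndarray[np.double_t] data, np.ndarray[int] indices, np.ndarray[int] indptr):
    # body of the function; `int` here is a C int, which matches int32 index arrays on most platforms
    pass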

Then you may call this function using

some_cython_func(M.data, M.indices, M.indptr)

where M is your CSR or CSC matrix.
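
One caveat with the no-copy approach: np.ndarray[int] declares a buffer of C int, and Cython raises a buffer dtype mismatch error at call time instead of converting. SciPy usually stores indices and indptr as int32 and switches to int64 only for very large matrices, so the call works when a C int is 32 bits wide. The following stdlib-only sketch checks that assumption on the current platform; the helper name c_int_is_32_bit is hypothetical.

```python
import ctypes


# Hypothetical helper: reports whether the platform's C `int` has the same
# width as a 32-bit index array element.
def c_int_is_32_bit():
    """Return True if a C `int` is 4 bytes wide on this platform."""
    return ctypes.sizeof(ctypes.c_int) == 4


print("C int is 32-bit:", c_int_is_32_bit())
```

If the check fails, or if the matrix carries int64 index arrays, declaring the arguments with an explicit fixed-width type such as np.int32_t or np.int64_t (matching M.indices.dtype) is the safer choice.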

See this page for an explanation of passing arguments without casting.