scipy.sparse default value

Posted 2019-02-12 17:26

Question:

The sparse matrix format (dok) assumes that values of keys not in the dictionary are equal to zero. Is there any way to make it use a default value other than zero?

Also, is there a way to calculate the log of a sparse matrix (akin to np.log on a regular NumPy array)?

Answer 1:

That feature is not built in, but if you really need it, you should be able to write your own dok_matrix class, or subclass SciPy's. The SciPy implementation is here: https://github.com/scipy/scipy/blob/master/scipy/sparse/dok.py At a minimum, the default value needs to be changed wherever dict.* calls are made, and other changes may be needed as well.
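To make the "write your own class" option concrete, here is a minimal sketch. It is not SciPy's implementation and it does not subclass dok_matrix; it is a plain dict-backed container, and the name DefaultDOK and its methods are hypothetical. It supports only scalar (i, j) indexing, and the key point is the dict.get call that returns the default instead of zero.

```python
import numpy as np


class DefaultDOK:
    """Hypothetical dict-of-keys matrix whose missing entries read as `default`."""

    def __init__(self, shape, default=0.0):
        self.shape = shape
        self.default = default
        self._data = {}  # maps (row, col) -> explicitly stored value

    def _check(self, i, j):
        rows, cols = self.shape
        if not (0 <= i < rows and 0 <= j < cols):
            raise IndexError("index out of range")

    def __getitem__(self, key):
        i, j = key
        self._check(i, j)
        # This is where the default replaces the usual implicit zero.
        return self._data.get((i, j), self.default)

    def __setitem__(self, key, value):
        i, j = key
        self._check(i, j)
        if value == self.default:
            # Storing the default explicitly would waste space.
            self._data.pop((i, j), None)
        else:
            self._data[(i, j)] = value

    def toarray(self):
        out = np.full(self.shape, self.default, dtype=float)
        for (i, j), value in self._data.items():
            out[i, j] = value
        return out


m = DefaultDOK((2, 3), default=1.5)
m[0, 1] = 4.0
dense = m.toarray()
```

Anything beyond element access (arithmetic, slicing, conversion to other sparse formats) would have to be written by hand, which is one reason to prefer the reformulation described next.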

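However, I'd try to reformulate the problem so that this is not needed. If, for instance, you are doing linear algebra, you can isolate the constant term: a matrix whose unset entries all equal a constant c can be written as A + c * ones, where A is sparse, so its product with a vector x is A*x plus c * sum(x) added to every component. You can then do instead: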

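from scipy.sparse.linalg import LinearOperator
# A holds the entries with the constant (default) value subtracted out,
# so the full matrix is A + constant_term * ones(A.shape).
A = whatever_dok_matrix_minus_constant_term
def my_matvec(x):
    # (A + c * ones) @ x == A @ x + c * sum(x), broadcast to every row
    return A @ x + constant_term * x.sum()
op = LinearOperator(A.shape, matvec=my_matvec)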

You can pass op instead of A to most linear algebra routines (e.g. iterative solvers).
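For reference, a self-contained version of the same idea that can be run and checked against a dense matrix. The value c, the size n, and the two stored entries are made up for illustration; the only assumption is that every entry not stored explicitly equals c.

```python
import numpy as np
from scipy.sparse import dok_matrix
from scipy.sparse.linalg import LinearOperator

c = 0.5  # assumed default value of every entry not stored explicitly
n = 4

# Store only the deviations from c; the full matrix is A + c * ones((n, n)).
A = dok_matrix((n, n), dtype=float)
A[0, 1] = 2.0 - c   # true entry is 2.0
A[2, 3] = -1.0 - c  # true entry is -1.0
A_csr = A.tocsr()   # CSR is better suited to repeated products than DOK


def my_matvec(x):
    x = np.asarray(x).ravel()
    # Sparse part plus the constant part, which contributes c * sum(x) per row.
    return A_csr @ x + c * x.sum()


op = LinearOperator((n, n), matvec=my_matvec, dtype=float)

x = np.arange(1.0, n + 1)
dense = A.toarray() + c  # dense reference, feasible only for small n
result = op.matvec(x)
```

Here result should agree with dense @ x up to rounding, while the operator itself never materializes the dense matrix.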

As for the matrix logarithm: the logarithm of a sparse matrix (as in scipy.linalg.logm) is typically dense, so you should just convert the matrix to a dense one first and then compute the logarithm as usual. As far as I can see, using a sparse matrix would give no performance gain. If you only need the product of the logarithm with a vector, log(A) * v, some Krylov method might help, though.
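A small sketch of the dense route, using a made-up tridiagonal test matrix (symmetric positive definite, so its logarithm is well defined). It also counts nonzeros before and after to illustrate the fill-in.

```python
import numpy as np
from scipy import sparse
from scipy.linalg import expm, logm

# Illustrative test matrix: 4 on the diagonal, 1 on the first off-diagonals.
n = 5
A = sparse.diags([1.0, 4.0, 1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")

A_dense = A.toarray()
L = logm(A_dense)  # matrix logarithm, computed on the dense array

nnz_A = A.nnz
nnz_L = int(np.count_nonzero(np.abs(L) > 1e-12))

# Sanity check: exponentiating the logarithm should recover the matrix.
roundtrip_ok = bool(np.allclose(expm(L), A_dense))
```

For this matrix nnz_L comes out larger than nnz_A, which is the fill-in that makes a sparse result format pointless.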

If, on the other hand, you want to compute the logarithm elementwise, you can modify the .data attribute directly (available at least in COO, CSR, and CSC):

import numpy as np
x = A.tocoo()
x.data = np.log(x.data)  # applied only to the explicitly stored entries
A = x.todok()

This leaves the zero elements untouched (np.log is applied only to the stored entries), but, as above, it lets you treat the constant part separately.
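Combining the two ideas, one possible way to take an elementwise log of a matrix with a non-zero default is sketched below. The names default, V, log_offsets, and log_default are made up for illustration, and the sketch assumes that all values (including the default) are positive and that V stores the true values only at the explicitly set positions. The result is kept as a sparse offset matrix plus a scalar, so that unset positions correspond to log(default).

```python
import numpy as np
from scipy.sparse import dok_matrix

default = 2.0  # assumed value of every entry not stored explicitly

# V holds the true values at the explicitly set positions only.
V = dok_matrix((3, 3), dtype=float)
V[0, 0] = 8.0
V[1, 2] = 4.0

x = V.tocoo()
# Store log(value) - log(default), so an unset position means log(default).
x.data = np.log(x.data) - np.log(default)
log_offsets = x.todok()
log_default = np.log(default)

# Dense check, feasible only for small matrices.
V_dense = V.toarray()
full = np.where(V_dense != 0, V_dense, default)
dense_log = log_offsets.toarray() + log_default
```

Here dense_log should match np.log(full), while log_offsets stays as sparse as V.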