Suppose I have a matrix in the CSR format, what is the most efficient way to set a row (or rows) to zeros?
The following code runs quite slowly:
A = A.tolil()
A[indices, :] = 0
A = A.tocsr()
I had to convert to scipy.sparse.lil_matrix
because the CSR format seems to support neither fancy indexing nor setting values to slices.
I guess scipy just does not implement it, but the CSR format would support this quite well, please read the wikipedia article on "Sparse matrix" about what indptr
, etc. are:
# A.indptr is an array, one for each row (+1 for the nnz):
def csr_row_set_nz_to_val(csr, row, value=0):
"""Set all nonzero elements (elements currently in the sparsity pattern)
to the given value. Useful to set to 0 mostly.
"""
if not isinstance(csr, scipy.sparse.csr_matrix):
raise ValueError('Matrix given must be of CSR format.')
csr.data[csr.indptr[row]:csr.indptr[row+1]] = value
# Now you can just do:
for row in indices:
csr_row_set_nz_to_val(A, row, 0)
# And to remove zeros from the sparsity pattern:
A.eliminate_zeros()
Of course this removes 0s that were set from another place with eliminate_zeros
from the sparsity pattern. If you want to do that (at this point) depends on what you are doing really, ie. elimination might make sense to delay until all other calculations that might add new zero's are done as well, or in some cases you may have 0 values, that you want to change again later, so it would be very bad to eliminate them!
You could in principle of course short-circuit the eliminate_zeros
and prune
, but that should be a lot of hassle, and might be even slower (because you won't do it in C).
Details about eliminiate_zeros (and prune)
The sparse matrix, does generally not save zero elements, but just stores where the nonzero elements are (roughly and with various methods). eliminate_zeros
removes all zeros in your matrix from the sparsity pattern (ie. there is no value stored for that position, when before there was a vlaue stored, but it was 0). Eliminate is bad if you want to change a 0 to a different value lateron, otherwise, it saves space.
Prune would just shrink the data arrays stored when they are longer then necessary. Note that while I first had A.prune()
in there, A.eliminiate_zeros()
already includes prune.
Update to the latest version of scipy. It supports fancy indexing.
You can use matrix dot product to achieve that zeroing. Since the matrix we'll use is very sparse (diagonal with zeros for the rows/columns we which to zero out), multiplication should be efficient.
You'll need one of the following functions:
import scipy.sparse
def zero_rows(M, rows):
diag = scipy.sparse.eye(M.shape[0]).tolil()
for r in rows:
diag[r, r] = 0
return diag.dot(M)
def zero_columns(M, columns):
diag = scipy.sparse.eye(M.shape[1]).tolil()
for c in columns:
diag[c, c] = 0
return M.dot(diag)
Usage example:
>>> A = scipy.sparse.csr_matrix([[1,0,3,4], [5,6,0,8], [9,10,11,0]])
>>> A
<3x4 sparse matrix of type '<class 'numpy.int64'>'
with 9 stored elements in Compressed Sparse Row format>
>>> A.toarray()
array([[ 1, 0, 3, 4],
[ 5, 6, 0, 8],
[ 9, 10, 11, 0]], dtype=int64)
>>> B = zero_rows(A, [1])
>>> B
<3x4 sparse matrix of type '<class 'numpy.float64'>'
with 6 stored elements in Compressed Sparse Row format>
>>> B.toarray()
array([[ 1., 0., 3., 4.],
[ 0., 0., 0., 0.],
[ 9., 10., 11., 0.]])
>>> C = zero_columns(A, [1, 3])
>>> C
<3x4 sparse matrix of type '<class 'numpy.float64'>'
with 5 stored elements in Compressed Sparse Row format>
>>> C.toarray()
array([[ 1., 0., 3., 0.],
[ 5., 0., 0., 0.],
[ 9., 0., 11., 0.]])