可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
For example I have a coo_matrix A :
import scipy.sparse as sp
A = sp.coo_matrix([3,0,3,0],
[0,0,2,0],
[2,5,1,0],
[0,0,0,0])
How can I get result [0,0,0,1], which indicates that first 3 columns contain non-zero values, only the 4th column is all zeros.
PS : cannot convert A to other type.
PS2 : I tried using np.nonzeros
but it seems that my implementation is not very elegant.
回答1:
Approach #1 We could do something like this -
# Get the columns indices of the input sparse matrix
C = sp.find(A)[1]
# Use np.in1d to create a mask of non-zero columns.
# So, we invert it and convert to int dtype for desired output.
out = (~np.in1d(np.arange(A.shape[1]),C)).astype(int)
Alternatively, to make the code shorter, we can use subtraction -
out = 1-np.in1d(np.arange(A.shape[1]),C)
Step-by-step run -
1) Input array and sparse matrix from it :
In [137]: arr # Regular dense array
Out[137]:
array([[3, 0, 3, 0],
[0, 0, 2, 0],
[2, 5, 1, 0],
[0, 0, 0, 0]])
In [138]: A = sp.coo_matrix(arr) # Convert to sparse matrix as input here on
2) Get non-zero column indices :
In [139]: C = sp.find(A)[1]
In [140]: C
Out[140]: array([0, 2, 2, 0, 1, 2], dtype=int32)
3) Use np.in1d
to get mask of non-zero columns :
In [141]: np.in1d(np.arange(A.shape[1]),C)
Out[141]: array([ True, True, True, False], dtype=bool)
4) Invert it :
In [142]: ~np.in1d(np.arange(A.shape[1]),C)
Out[142]: array([False, False, False, True], dtype=bool)
5) Finally convert to int dtype :
In [143]: (~np.in1d(np.arange(A.shape[1]),C)).astype(int)
Out[143]: array([0, 0, 0, 1])
Alternative subtraction approach :
In [145]: 1-np.in1d(np.arange(A.shape[1]),C)
Out[145]: array([0, 0, 0, 1])
Approach #2 Here's another way and possibly a faster one using matrix-multiplication
-
out = 1-np.ones(A.shape[0],dtype=bool)*A.astype(bool)
Runtime test
Let's test out all the posted approaches on a big and really sparse matrix -
In [29]: A = sp.coo_matrix((np.random.rand(4000,4000)>0.998).astype(int))
In [30]: %timeit 1-np.in1d(np.arange(A.shape[1]),sp.find(A)[1])
100 loops, best of 3: 4.12 ms per loop # Approach1
In [31]: %timeit 1-np.ones(A.shape[0],dtype=bool)*A.astype(bool)
1000 loops, best of 3: 771 µs per loop # Approach2
In [32]: %timeit 1 - (A.col==np.arange(A.shape[1])[:,None]).any(axis=1)
1 loops, best of 3: 236 ms per loop # @hpaulj's soln
In [33]: %timeit (A!=0).sum(axis=0)==0
1000 loops, best of 3: 1.03 ms per loop # @jez's soln
In [34]: %timeit (np.sum(np.absolute(A.toarray()), 0) == 0) * 1
10 loops, best of 3: 86.4 ms per loop # @wwii's soln
回答2:
The actual logical operation can be performed like this:
b = (A!=0).sum(axis=0)==0
# matrix([[False, False, False, True]], dtype=bool)
Now, to ensure that I'm answering your question exactly, I'd better tell you how you could convert from booleans to integers (although really, for most applications I can think of, you can do a lot more in numpy
and friends if you stick with an array of bool
s):
b = b.astype(int)
#matrix([[0, 0, 0, 1]])
Either way, to then convert from a matrix
to a list
, you could do this:
c = list(b.flat)
# [0, 0, 0, 1]
...although again, I'm not sure this is the best thing to do: for most applications I can imagine, I would perhaps just convert to a one-dimensional numpy.array
with c = b.A.flatten()
instead.
回答3:
Recent
scipy.sparse.coo_matrix how to fast find all zeros column, fill with 1 and normalize
similar, except it wants to fill those columns with 1s and normalize them.
I immediately suggested the lil
format of the transpose. All-0 columns will be empty lists in this format. But sticking with the coo
format I suggested
np.nonzero(~(Mo.col==np.arange(Mo.shape[1])[:,None]).any(axis=1))[0]
or for this 1/0 format
1 - (Mo.col==np.arange(Mo.shape[1])[:,None]).any(axis=1)
which is functionally the same as:
1 - np.in1d(np.arange(Mo.shape[1]),Mo.col)
sparse.find
converts the matrix to csr
to sum duplicates and eliminate duplicates, and then back to coo
to get the data
, row
, and col
attributes (which it returns).
Mo.nonzero
uses A.data != 0
to eliminate 0s before returning the col
and row
attributes.
The np.ones(A.shape[0],dtype=bool)*A.astype(bool)
solution requires converting A
to csr
format for multiplication.
(A!=0).sum(axis=0)
also converts to csr
because column (or row) sum is done with a matrix multiplication.
So the no-convert requirement is unrealistic, at least within the bounds of sparse formats.
===============
For Divakar's test case my ==
version is quite slow; it's ok with small ones, but creates too large of test array with the 1000 columns.
Testing on a matrix that is sparse enough to have a number of 0 columns:
In [183]: Arr=sparse.random(1000,1000,.001)
In [184]: (1-np.in1d(np.arange(Arr.shape[1]),Arr.col)).any()
Out[184]: True
In [185]: (1-np.in1d(np.arange(Arr.shape[1]),Arr.col)).sum()
Out[185]: 367
In [186]: timeit 1-np.ones(Arr.shape[0],dtype=bool)*Arr.astype(bool)
1000 loops, best of 3: 334 µs per loop
In [187]: timeit 1-np.in1d(np.arange(Arr.shape[1]),Arr.col)
1000 loops, best of 3: 323 µs per loop
In [188]: timeit 1-(Arr.col==np.arange(Arr.shape[1])[:,None]).any(axis=1)
100 loops, best of 3: 3.9 ms per loop
In [189]: timeit (Arr!=0).sum(axis=0)==0
1000 loops, best of 3: 820 µs per loop
回答4:
Convert to an array or dense matrix, sum the absolute value along the first axis, test the result against zero, convert to int
>>> import numpy as np
>>> (np.sum(np.absolute(a.toarray()), 0) == 0) * 1
array([0, 0, 0, 1])
>>> (np.sum(np.absolute(a.todense()), 0) == 0) * 1
matrix([[0, 0, 0, 1]])
>>>
>>> np.asarray((np.sum(np.absolute(a.todense()), 0) == 0), dtype = np.int32)
array([[0, 0, 0, 1]])
>>>
The first is the fastest - 24 uS for your example on my machine.
For a matrix made with np.random.randint(0,3,(1000,1000))
, all are right at 25 mS on my machine.