Efficiently test matrix rows and columns with nump

2020-04-29 16:01发布

问题:

I am trying to remove both the row i and column i when both the row i and column i contains all 0s. For example in this case we can see that row 0 is all zeros and column 0 is all zeros and thus row and column 0 is removed. Same with row column pair 2 and 4. Row 1 is all zeros but column 1 is not so neither are removed.

[0,0,0,0,0]
[0,1,0,1,0]
[0,0,0,0,0]
[0,0,0,0,0]
[0,0,0,0,0]

would become

[1,1]
[0,0]

Another example:

[0,0,1,0,0,1]
[0,0,0,0,0,0]
[0,0,0,0,0,0]
[0,0,0,0,0,0]
[0,0,0,0,0,0]
[0,0,1,0,1,0]

would change to:

[0,1,0,1]
[0,0,0,0]
[0,0,0,0]
[0,1,1,0]

This is the code that I am using to compute:

def remove(matrix):
    for i, x in reversed(list(enumerate(matrix))):
        if np.all(matrix == 0, axis=0)[i] and np.all(matrix == 0, axis=1)[i]:
            matrix = np.delete(matrix,i,axis=0)
            matrix = np.delete(matrix,i,axis=1)
    return matrix

After testing this line is taking the most time by far:

if np.all(matrix == 0, axis=0)[i] and np.all(matrix == 0, axis=1)[i]:

Is there a more appropriate way to test a row and column in this way? The matrix that I am using is a sparse binary matrix. I am not using any sparse matrix classes just ndarray.

回答1:

Vectorized approach with masking -

def remove_vectorized(a):
    mask = a==0
    m_row = ~mask.all(1)
    m_col = ~mask.all(0)
    comb_mask = m_row | m_col
    return a[comb_mask][:,comb_mask] #or a[np.ix_(comb_mask, comb_mask)]

Sample runs

Case #1 :

In [485]: a
Out[485]: 
array([[0, 0, 0, 0, 0],
       [0, 1, 0, 1, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])

In [486]: remove_vectorized(a)
Out[486]: 
array([[1, 1],
       [0, 0]])

Case #2 :

In [489]: a
Out[489]: 
array([[0, 0, 1, 0, 0, 1],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 1, 0]])

In [490]: remove_vectorized(a)
Out[490]: 
array([[0, 1, 0, 1],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 1, 1, 0]])


回答2:

One thing you can do- evaluate the boolean matrix "matrix == 0" outside of your for loop, and use that matrix instead of reevaluating it every run.

It'd be something like:

def remove(matrix):
    binary_matrix = matrix == 0
    for i, x in reversed(list(enumerate(matrix))):
        if np.all(binary_matrix , axis=0)[i] and np.all(binary_matrix , axis=1)[i]:
            matrix = np.delete(matrix,i,axis=0)
            matrix = np.delete(matrix,i,axis=1)
            binary_matrix = np.delete(binary_matrix ,i,axis=0)
            binary_matrix = np.delete(binary_matrix ,i,axis=1)
    return matrix