Create mask by first positions only

Posted 2019-04-30 02:18

Question:

I have an array:

import numpy as np

a = np.array([[ 0,  1,  2,  0,  0,  0],
              [ 0,  4,  1, 35,  0, 10],
              [ 0,  0,  5,  4,  0,  4],
              [ 1,  2,  5,  4,  0,  4]])

I need a mask that is True only for the leading run of consecutive zeros in each row:

[[  True   False  False  False  False  False]
 [  True   False  False  False  False  False]
 [  True   True   False  False  False  False]
 [  False  False  False  False  False  False]]

I tried:

a[np.arange(len(a)), a.argmax(1): np.arange(len(a)), [0,0,0]] = True

But this does not work.

Answer 1:

You can use np.cumsum.

Assumption: you are looking for zeros only at the start of each row.

a = np.array([[ 0,  1,  2,  0,  0,  0],
              [ 0,  4,  1, 35,  0, 10],
              [ 0,  0,  5,  4,  0,  4]])

a.cumsum(axis=1) == 0
array([[ True, False, False, False, False, False],
       [ True, False, False, False, False, False],
       [ True,  True, False, False, False, False]], dtype=bool)

Basis: the comparison holds True for as long as the cumulative sum along each row is still 0.

Caveat: an array with negative integers can break this. For example, for [-1, 1] the cumulative sum is [-1, 0], so the result is wrongly True at position 1.
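One way around the negative-number problem, sketched here as a variation rather than part of the original answer, is to take the cumulative sum of the boolean test a != 0 instead of the values themselves. That count never decreases, so it stays at 0 only across the leading zeros. The array b below is a made-up example chosen to include negative values.

```python
import numpy as np

# Hypothetical input with negative values (not from the question).
b = np.array([[-1, 1,  0, 3],
              [ 0, 0, -2, 2]])

# Plain cumsum: -1 + 1 cancels out, so row 0 is wrongly True at position 1,
# and row 1 is wrongly True again at position 3.
naive = b.cumsum(axis=1) == 0

# Count the non-zero entries seen so far instead. The count cannot go back
# down, so it equals 0 only over the leading run of zeros.
safe = (b != 0).cumsum(axis=1) == 0

print(naive)
print(safe)
```

The extra comparison costs one more pass over the array, so this is worth it only when negative values can actually occur.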



Answer 2:

You can use np.minimum.accumulate on the condition a == 0, accumulating along each row (axis=1). A non-zero element gives False, and because False is the minimum, every element after the first non-zero is also set to False by the running minimum:

np.minimum.accumulate(a == 0, axis=1)
#array([[ True, False, False, False, False, False],
#       [ True, False, False, False, False, False],
#       [ True,  True, False, False, False, False],
#       [False, False, False, False, False, False]], dtype=bool)
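On a boolean array, a running minimum is the same as a running logical AND, so np.logical_and.accumulate should give the same mask and may read more directly. The sketch below checks that on the array from the question; the variable names are illustrative only.

```python
import numpy as np

a = np.array([[0, 1, 2,  0, 0,  0],
              [0, 4, 1, 35, 0, 10],
              [0, 0, 5,  4, 0,  4],
              [1, 2, 5,  4, 0,  4]])

is_zero = a == 0

# Running minimum over booleans: once a False appears, everything after is False.
mask_min = np.minimum.accumulate(is_zero, axis=1)

# Equivalent formulation as a running logical AND.
mask_and = np.logical_and.accumulate(is_zero, axis=1)

print(np.array_equal(mask_min, mask_and))
```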


Answer 3:

Here's one with argmin + broadcasting -

(a==0).argmin(1)[:,None] > np.arange(a.shape[1])

Explanation with a sample step-by-step run

1) Input array :

In [207]: a
Out[207]: 
array([[ 0,  1,  2,  0,  0,  0],
       [ 0,  4,  1, 35,  0, 10],
       [ 0,  0,  5,  4,  0,  4],
       [ 1,  2,  5,  4,  0,  4]])

2) Mask of zeros

In [208]: (a==0)
Out[208]: 
array([[ True, False, False,  True,  True,  True],
       [ True, False, False, False,  True, False],
       [ True,  True, False, False,  True, False],
       [False, False, False, False,  True, False]], dtype=bool)

3) Find, for each row, the index of the first False, which marks the end of the leading run of True values. For a row that contains no zeros, or whose first element is non-zero, argmin returns 0. Next, use broadcasting to build a mask that is True from the first column up to, but not including, those argmin indices: compare the argmin values, reshaped to a column, against a range array covering all columns.

In [209]: (a==0).argmin(1)
Out[209]: array([1, 1, 2, 0])

In [210]: (a==0).argmin(1)[:,None] > np.arange(a.shape[1])
Out[210]: 
array([[ True, False, False, False, False, False],
       [ True, False, False, False, False, False],
       [ True,  True, False, False, False, False],
       [False, False, False, False, False, False]], dtype=bool)
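One edge case the walkthrough above does not cover: for a row that is entirely zeros, (a==0) is all True, and argmin returns 0 for that row as well, so the mask comes out all False instead of all True. A possible fix, sketched here on a made-up array c, is to substitute the row length as the cut-off for such rows.

```python
import numpy as np

# Hypothetical input whose first row is all zeros (not from the question).
c = np.array([[0, 0, 0],
              [0, 3, 0],
              [2, 0, 0]])

is_zero = c == 0
n_cols = c.shape[1]

# argmin gives 0 for the all-True first row, the same value it gives
# for the last row, which starts with a non-zero element.
first_nonzero = is_zero.argmin(axis=1)

# For all-zero rows, use the row length so the whole row becomes True.
cutoff = np.where(is_zero.all(axis=1), n_cols, first_nonzero)

mask = cutoff[:, None] > np.arange(n_cols)
print(mask)
```

If the data cannot contain all-zero rows, the original one-liner is enough and avoids the extra all() pass.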

Timings

In [196]: a = np.random.randint(0,9,(5000,5000))

In [197]: %timeit a.cumsum(axis=1) == 0 #@Brad Solomon
     ...: %timeit np.minimum.accumulate(a == 0, axis=1) #@Psidom
     ...: %timeit (a==0).argmin(1)[:,None] > np.arange(a.shape[1])
     ...: 
10 loops, best of 3: 69 ms per loop
10 loops, best of 3: 64.9 ms per loop
10 loops, best of 3: 32.8 ms per loop
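To repeat the comparison outside IPython, something like the script below could be used. It is only a sketch: the seed, the smaller array size, and the number of repetitions are arbitrary choices, and the absolute numbers will differ by machine and NumPy version. It also checks that the three approaches agree on this non-negative input before timing them.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed for repeatability
a = rng.integers(0, 9, size=(500, 500))   # smaller than the 5000x5000 above

candidates = {
    "cumsum": lambda: a.cumsum(axis=1) == 0,
    "minimum.accumulate": lambda: np.minimum.accumulate(a == 0, axis=1),
    "argmin + broadcasting": lambda: (a == 0).argmin(1)[:, None] > np.arange(a.shape[1]),
}

# Sanity check: all three should produce the same mask on non-negative data.
results = {name: fn() for name, fn in candidates.items()}
reference = results["minimum.accumulate"]
all_agree = all(np.array_equal(reference, r) for r in results.values())
print("all approaches agree:", all_agree)

# Average wall-clock time per call over a few repetitions.
repeats = 5
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=repeats) / repeats
    print(f"{name}: {seconds * 1e3:.2f} ms per call")
```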