How to find all elements in a numpy 2-dimensional array that match a certain list

I have a 2-dimensional NumPy array, for example:

array([[1, 1, 0, 2, 2],
       [1, 1, 0, 2, 0],
       [0, 0, 0, 0, 0],
       [3, 3, 0, 4, 4],
       [3, 3, 0, 4, 4]])

I would like to keep all elements of that array which are in a certain list, for example (1, 3, 4), and set every other element to 0. The desired result in the example case would be:

array([[1, 1, 0, 0, 0],
       [1, 1, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [3, 3, 0, 4, 4],
       [3, 3, 0, 4, 4]])

I know that I can just do the following (as recommended in "Numpy: find elements within range"):

np.logical_or(
    np.logical_or(cc_labeled == 1, cc_labeled == 3),
    cc_labeled == 4
)

However, this is only reasonably efficient in the example case. In reality, iterating with a for loop and numpy.logical_or turned out to be really slow, since the list of possible values runs into the thousands (and the NumPy array is approximately 1000 x 1000).

Answer 1:

You can use np.in1d -

A*np.in1d(A,[1,3,4]).reshape(A.shape)

Also, np.where could be used -

np.where(np.in1d(A,[1,3,4]).reshape(A.shape),A,0)

You can also use np.searchsorted to find such matches. Call it twice, once with its optional side argument set to 'left' and once with 'right': for values present in the (sorted) search list the two calls return different indices, while for absent values they return the same index. Thus, an equivalent of np.in1d(A,[1,3,4]) would be -

M = np.searchsorted([1,3,4],A.ravel(),'left') != \
    np.searchsorted([1,3,4],A.ravel(),'right')

Thus, the final output would be -

out = A*M.reshape(A.shape)
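
To see why comparing the two results works, here is a minimal sketch on a small 1-D input. The names values and queries are made up for this illustration and are not part of the answer above.

```python
import numpy as np

values = np.array([1, 3, 4])            # sorted search list
queries = np.array([0, 1, 2, 3, 4, 5])  # elements to test for membership

# 'left' gives the first index at which each query could be inserted,
# 'right' gives the index just past any existing equal entries.
left = np.searchsorted(values, queries, side='left')
right = np.searchsorted(values, queries, side='right')

# The two indices differ only when the query is present in values.
mask = left != right

print(left)   # [0 0 1 1 2 3]
print(right)  # [0 1 1 2 3 3]
print(mask)   # [False  True False  True  True False]
```

Only the queries 1, 3 and 4 end up with different left and right indices, so the mask marks exactly the elements of the search list.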

Please note that if the input search list is not sorted, you need to pass its argsort indices to np.searchsorted through the optional sorter argument.
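
As a rough sketch of the unsorted case, assuming the search list is given in the arbitrary order [4, 1, 3] (the names lst and order are chosen here for illustration only):

```python
import numpy as np

A = np.array([[1, 1, 0, 2, 2],
              [1, 1, 0, 2, 0],
              [0, 0, 0, 0, 0],
              [3, 3, 0, 4, 4],
              [3, 3, 0, 4, 4]])

lst = np.array([4, 1, 3])   # deliberately unsorted search list
order = np.argsort(lst)     # indices that would sort lst

# Passing sorter lets searchsorted treat lst as if it were sorted.
left = np.searchsorted(lst, A.ravel(), side='left', sorter=order)
right = np.searchsorted(lst, A.ravel(), side='right', sorter=order)

# Elements found in lst keep their value, everything else becomes 0.
out = A * (left != right).reshape(A.shape)
print(out)
```

Sorting the list once up front with np.sort(lst) and then calling np.searchsorted without sorter should give the same mask; which is more convenient depends on whether the original order of the list is still needed.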

Sample run -

In [321]: A
Out[321]: 
array([[1, 1, 0, 2, 2],
       [1, 1, 0, 2, 0],
       [0, 0, 0, 0, 0],
       [3, 3, 0, 4, 4],
       [3, 3, 0, 4, 4]])

In [322]: A*np.in1d(A,[1,3,4]).reshape(A.shape)
Out[322]: 
array([[1, 1, 0, 0, 0],
       [1, 1, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [3, 3, 0, 4, 4],
       [3, 3, 0, 4, 4]])

In [323]: np.where(np.in1d(A,[1,3,4]).reshape(A.shape),A,0)
Out[323]: 
array([[1, 1, 0, 0, 0],
       [1, 1, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [3, 3, 0, 4, 4],
       [3, 3, 0, 4, 4]])

In [324]: M = np.searchsorted([1,3,4],A.ravel(),'left') != \
     ...:     np.searchsorted([1,3,4],A.ravel(),'right')
     ...: A*M.reshape(A.shape)
     ...: 
Out[324]: 
array([[1, 1, 0, 0, 0],
       [1, 1, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [3, 3, 0, 4, 4],
       [3, 3, 0, 4, 4]])

Runtime tests and output verification -

In [309]: # Inputs
     ...: A = np.random.randint(0,1000,(400,500))
     ...: lst = np.sort(np.random.randint(0,1000,(100))).tolist()
     ...: 
     ...: def func1(A,lst):                         
     ...:   return A*np.in1d(A,lst).reshape(A.shape)
     ...: 
     ...: def func2(A,lst):                         
     ...:   return np.where(np.in1d(A,lst).reshape(A.shape),A,0)
     ...: 
     ...: def func3(A,lst):                         
     ...:   mask = np.searchsorted(lst,A.ravel(),'left') != \
     ...:          np.searchsorted(lst,A.ravel(),'right')
     ...:   return A*mask.reshape(A.shape)
     ...: 

In [310]: np.allclose(func1(A,lst),func2(A,lst))
Out[310]: True

In [311]: np.allclose(func1(A,lst),func3(A,lst))
Out[311]: True

In [312]: %timeit func1(A,lst)
10 loops, best of 3: 30.9 ms per loop

In [313]: %timeit func2(A,lst)
10 loops, best of 3: 30.9 ms per loop

In [314]: %timeit func3(A,lst)
10 loops, best of 3: 28.6 ms per loop

Answer 2:

Use np.in1d:

np.in1d(arr, [1,3,4]).reshape(arr.shape)

in1d, as the name suggests, operates on the flattened array; therefore you need to reshape after the operation.
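
If your NumPy version provides np.isin (added in NumPy 1.13, as far as I know), the reshape can be skipped, because np.isin returns a mask with the same shape as its first argument. This is a sketch of that alternative, not something either answer above used; the variable names are illustrative.

```python
import numpy as np

arr = np.array([[1, 1, 0, 2, 2],
                [1, 1, 0, 2, 0],
                [0, 0, 0, 0, 0],
                [3, 3, 0, 4, 4],
                [3, 3, 0, 4, 4]])

# np.isin keeps the shape of arr, so no reshape is needed.
mask = np.isin(arr, [1, 3, 4])

# Keep matching elements, replace everything else with 0.
out = np.where(mask, arr, 0)
print(out)
```

Recent NumPy releases recommend np.isin over np.in1d, so this form is probably the safer choice for new code.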