I have a 2-dimensional NumPy array, for example:
array([[1, 1, 0, 2, 2],
[1, 1, 0, 2, 0],
[0, 0, 0, 0, 0],
[3, 3, 0, 4, 4],
[3, 3, 0, 4, 4]])
I would like to keep all elements of that array that are in a given list, for example (1, 3, 4), and set every other element to 0. The desired result in the example case would be:
array([[1, 1, 0, 0, 0],
[1, 1, 0, 0, 0],
[0, 0, 0, 0, 0],
[3, 3, 0, 4, 4],
[3, 3, 0, 4, 4]])
I know that I can chain comparisons (as recommended in "Numpy: find elements within range"):
np.logical_or(
    np.logical_or(cc_labeled == 1, cc_labeled == 3),
    cc_labeled == 4
)
but this is only reasonably efficient in the example case. In reality, looping over the values and combining the masks with numpy.logical_or turned out to be really slow, since the list of possible values runs into the thousands (and the NumPy array is approximately 1000 x 1000).
You can use np.in1d -
A*np.in1d(A,[1,3,4]).reshape(A.shape)
Also, np.where could be used -
np.where(np.in1d(A,[1,3,4]).reshape(A.shape),A,0)
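On newer NumPy versions (1.13 and later), np.isin performs the same membership test but keeps the shape of its input, so the reshape step is not needed; np.in1d itself is deprecated as of NumPy 2.0. A minimal sketch, assuming the example array from the question (the names keep and mask are only illustrative):

```python
import numpy as np

A = np.array([[1, 1, 0, 2, 2],
              [1, 1, 0, 2, 0],
              [0, 0, 0, 0, 0],
              [3, 3, 0, 4, 4],
              [3, 3, 0, 4, 4]])
keep = [1, 3, 4]  # values to retain (illustrative name)

# Boolean mask with the same shape as A: True where the element is in keep
mask = np.isin(A, keep)

# Keep matching elements, replace everything else with 0
out = np.where(mask, A, 0)
print(out)
```

Using np.where rather than A*mask avoids relying on boolean-to-integer multiplication and makes the fill value explicit.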
You can also use np.searchsorted to find such matches through its optional side argument. Call it once with 'left' and once with 'right': for values that are present in the list, the two calls return different insertion indices, while for values that are absent they return the same index. Thus, an equivalent of np.in1d(A,[1,3,4]) would be -
M = np.searchsorted([1,3,4],A.ravel(),'left') != \
    np.searchsorted([1,3,4],A.ravel(),'right')
Thus, the final output would be -
out = A*M.reshape(A.shape)
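To see why comparing the two searchsorted results works, here is a small sketch on a 1-D input (the names sorted_list and values are made up for illustration). A value that is present has its left insertion point before the match and its right insertion point after it, so the two indices differ:

```python
import numpy as np

sorted_list = np.array([1, 3, 4])       # must be sorted ascending
values = np.array([0, 1, 2, 3, 4, 5])   # illustrative query values

left = np.searchsorted(sorted_list, values, side='left')
right = np.searchsorted(sorted_list, values, side='right')

print(left)           # [0 0 1 1 2 3]
print(right)          # [0 1 1 2 3 3]
print(left != right)  # [False  True False  True  True False]
```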
Please note that if the input search list is not sorted, you need to pass its argsort indices as the optional sorter argument of np.searchsorted.
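As a rough sketch of the unsorted case, assuming the search list is given as [4, 1, 3] (the names lst, sidx and flat are illustrative), the sorter keyword can be used like this:

```python
import numpy as np

A = np.array([[1, 1, 0, 2, 2],
              [1, 1, 0, 2, 0],
              [0, 0, 0, 0, 0],
              [3, 3, 0, 4, 4],
              [3, 3, 0, 4, 4]])
lst = np.array([4, 1, 3])  # unsorted search list (illustrative)

# Indices that would sort lst; searchsorted uses them instead of a sorted copy
sidx = np.argsort(lst)

flat = A.ravel()
left = np.searchsorted(lst, flat, side='left', sorter=sidx)
right = np.searchsorted(lst, flat, side='right', sorter=sidx)

# Matches are exactly the positions where the two insertion points differ
out = A * (left != right).reshape(A.shape)
print(out)
```

Sorting the list once up front with np.sort(lst) and dropping sorter should give the same mask; sorter is mainly useful when the original order of the list has to be kept.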
Sample run -
In [321]: A
Out[321]:
array([[1, 1, 0, 2, 2],
[1, 1, 0, 2, 0],
[0, 0, 0, 0, 0],
[3, 3, 0, 4, 4],
[3, 3, 0, 4, 4]])
In [322]: A*np.in1d(A,[1,3,4]).reshape(A.shape)
Out[322]:
array([[1, 1, 0, 0, 0],
[1, 1, 0, 0, 0],
[0, 0, 0, 0, 0],
[3, 3, 0, 4, 4],
[3, 3, 0, 4, 4]])
In [323]: np.where(np.in1d(A,[1,3,4]).reshape(A.shape),A,0)
Out[323]:
array([[1, 1, 0, 0, 0],
[1, 1, 0, 0, 0],
[0, 0, 0, 0, 0],
[3, 3, 0, 4, 4],
[3, 3, 0, 4, 4]])
In [324]: M = np.searchsorted([1,3,4],A.ravel(),'left') != \
...: np.searchsorted([1,3,4],A.ravel(),'right')
...: A*M.reshape(A.shape)
...:
Out[324]:
array([[1, 1, 0, 0, 0],
[1, 1, 0, 0, 0],
[0, 0, 0, 0, 0],
[3, 3, 0, 4, 4],
[3, 3, 0, 4, 4]])
Runtime tests and output verification -
In [309]: # Inputs
...: A = np.random.randint(0,1000,(400,500))
...: lst = np.sort(np.random.randint(0,1000,(100))).tolist()
...:
...: def func1(A,lst):
...:     return A*np.in1d(A,lst).reshape(A.shape)
...:
...: def func2(A,lst):
...:     return np.where(np.in1d(A,lst).reshape(A.shape),A,0)
...:
...: def func3(A,lst):
...:     mask = np.searchsorted(lst,A.ravel(),'left') != \
...:            np.searchsorted(lst,A.ravel(),'right')
...:     return A*mask.reshape(A.shape)
...:
In [310]: np.allclose(func1(A,lst),func2(A,lst))
Out[310]: True
In [311]: np.allclose(func1(A,lst),func3(A,lst))
Out[311]: True
In [312]: %timeit func1(A,lst)
10 loops, best of 3: 30.9 ms per loop
In [313]: %timeit func2(A,lst)
10 loops, best of 3: 30.9 ms per loop
In [314]: %timeit func3(A,lst)
10 loops, best of 3: 28.6 ms per loop