I have a very large numpy array (containing up to a million elements) like the one below:
```
[ 0  1  6  5  1  2  7  6  2  3  8  7  3  4  9  8  5  6 11 10  6  7 12 11  7
  8 13 12  8  9 14 13 10 11 16 15 11 12 17 16 12 13 18 17 13 14 19 18 15 16
 21 20 16 17 22 21 17 18 23 22 18 19 24 23]
```
and a small dictionary map for replacing some of the elements in the above array
```
{4: 0, 9: 5, 14: 10, 19: 15, 20: 0, 21: 1, 22: 2, 23: 3, 24: 0}
```
I would like to replace some of the elements according to the map above. The numpy array is really large, and only a small subset of the elements (occurring as keys in the dictionary) will be replaced with the corresponding values. What is the fastest way to do this?
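In code, the setup looks like this (the names `theArray` and `d` are just placeholders):

```python
import numpy as np

# The example array and the replacement map from above.
theArray = np.array([0, 1, 6, 5, 1, 2, 7, 6, 2, 3, 8, 7, 3, 4, 9, 8,
                     5, 6, 11, 10, 6, 7, 12, 11, 7, 8, 13, 12, 8, 9, 14, 13,
                     10, 11, 16, 15, 11, 12, 17, 16, 12, 13, 18, 17, 13, 14, 19, 18,
                     15, 16, 21, 20, 16, 17, 22, 21, 17, 18, 23, 22, 18, 19, 24, 23])
d = {4: 0, 9: 5, 14: 10, 19: 15, 20: 0, 21: 1, 22: 2, 23: 3, 24: 0}
```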
I believe there's an even more efficient method, but for now, try this:
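A minimal sketch of the idea, using the `theArray`/`d` names from the question: copy the array, then overwrite the positions matching each key through a boolean mask (this is the "for" loop method referenced in the benchmarks further down):

```python
import numpy as np

theArray = np.array([0, 1, 6, 5, 1, 2, 7, 6, 3, 4, 9, 8, 24, 23])
d = {4: 0, 9: 5, 14: 10, 19: 15, 20: 0, 21: 1, 22: 2, 23: 3, 24: 0}

# Copy so the original array is left untouched, then overwrite
# every position equal to k with v, one key at a time.
newArray = np.copy(theArray)
for k, v in d.items():
    newArray[theArray == k] = v

print(newArray)  # -> [0 1 6 5 1 2 7 6 3 0 5 8 0 3]
```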
Microbenchmark and test for correctness:
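A sketch of such a check (array size and repeat count are chosen arbitrarily):

```python
import timeit
import numpy as np

d = {4: 0, 9: 5, 14: 10, 19: 15, 20: 0, 21: 1, 22: 2, 23: 3, 24: 0}
a = np.random.randint(0, 25, size=1_000_000)

def replace_loop(arr, mapping):
    out = np.copy(arr)
    for k, v in mapping.items():
        out[arr == k] = v
    return out

# Correctness: compare against a plain-Python reference implementation.
expected = np.array([d.get(int(x), int(x)) for x in a])
assert np.array_equal(replace_loop(a, d), expected)

# Speed: time the replacement on the million-element array.
print(timeit.timeit(lambda: replace_loop(a, d), number=10))
```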
A fully vectorized solution using `np.in1d` and `np.searchsorted`:
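A sketch of this technique: `np.in1d` (now `np.isin` in newer numpy) finds the positions that need replacing, and `np.searchsorted` maps each of those elements to its index in the sorted key array:

```python
import numpy as np

def replace_in1d_searchsorted(arr, mapping):
    keys = np.array(sorted(mapping))               # sorted keys
    values = np.array([mapping[k] for k in keys])  # values aligned with keys

    out = arr.copy()
    mask = np.in1d(arr, keys)   # True where arr holds a key to replace
    # For the masked elements, searchsorted returns each key's position
    # in `keys`, which indexes the matching replacement in `values`.
    out[mask] = values[np.searchsorted(keys, arr[mask])]
    return out

a = np.array([0, 1, 6, 5, 4, 9, 24, 23])
d = {4: 0, 9: 5, 14: 10, 19: 15, 20: 0, 21: 1, 22: 2, 23: 3, 24: 0}
print(replace_in1d_searchsorted(a, d))  # -> [0 1 6 5 0 5 0 3]
```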
The numpy_indexed package (disclaimer: I am its author) provides an elegant and efficient vectorized solution to this type of problem:
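Usage presumably looks like the following; `npi.remap` takes the input array along with parallel sequences of keys and values (check the package docs for the exact signature):

```python
import numpy as np
import numpy_indexed as npi

theArray = np.array([0, 1, 6, 5, 4, 9, 24, 23])
d = {4: 0, 9: 5, 14: 10, 19: 15, 20: 0, 21: 1, 22: 2, 23: 3, 24: 0}

# Replace every occurrence of a key with its corresponding value.
remapped = npi.remap(theArray, list(d.keys()), list(d.values()))
print(remapped)  # -> [0 1 6 5 0 5 0 3]
```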
The method implemented is similar to the searchsorted-based approach mentioned by Jean Lescut, but even more general. For instance, the items of the array need not be ints, but can be any type, even nd-subarrays themselves; yet it should achieve the same kind of performance.
A Pythonic way that does not require the data to be integers; it even works with strings:
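A sketch using `dict.get` with the element itself as the default, so unmapped items pass through unchanged:

```python
import numpy as np

a = np.array(['cat', 'dog', 'bird', 'dog'])
d = {'dog': 'wolf', 'bird': 'hawk'}

# dict.get(x, x) returns the replacement if x is a key, else x itself.
new_a = np.array([d.get(x, x) for x in a])
print(new_a)  # -> ['cat' 'wolf' 'hawk' 'wolf']
```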
Assuming the values are between 0 and some maximum integer, one can implement a fast replace by using a numpy array itself as an `int -> int` lookup table, as sketched below: first build an identity mapping, then overwrite the entries for the keys to be replaced, and we obtain the result with a single indexing pass.
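A sketch of that lookup-table idea, with the three steps as comments:

```python
import numpy as np

theArray = np.array([0, 1, 6, 5, 4, 9, 14, 19, 20, 21, 22, 23, 24])
d = {4: 0, 9: 5, 14: 10, 19: 15, 20: 0, 21: 1, 22: 2, 23: 3, 24: 0}

# First, build an identity table covering every value that can occur...
lookup = np.arange(max(theArray.max(), max(d)) + 1)
# ...then replace the entries for the keys with their mapped values...
lookup[list(d.keys())] = list(d.values())
# ...and we obtain the result with a single fancy-indexing pass.
newArray = lookup[theArray]
print(newArray)  # -> [ 0  1  6  5  0  5 10 15  0  1  2  3  0]
```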
I benchmarked some of these solutions, and the result is clear-cut:
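A sketch of such a benchmark, comparing the "for" loop, the list comprehension, and the searchsorted approach on a million-element array:

```python
import timeit
import numpy as np

d = {4: 0, 9: 5, 14: 10, 19: 15, 20: 0, 21: 1, 22: 2, 23: 3, 24: 0}
a = np.random.randint(0, 25, size=1_000_000)

def loop_masks(arr):
    # "for" loop: one boolean-mask assignment per dictionary key.
    out = arr.copy()
    for k, v in d.items():
        out[arr == k] = v
    return out

def list_comp(arr):
    # List comprehension with a dict.get fallback.
    return np.array([d.get(int(x), int(x)) for x in arr])

def searchsort(arr):
    # Vectorized replacement via np.in1d + np.searchsorted.
    keys = np.array(sorted(d))
    values = np.array([d[k] for k in keys])
    out = arr.copy()
    mask = np.in1d(arr, keys)
    out[mask] = values[np.searchsorted(keys, arr[mask])]
    return out

for fn in (loop_masks, list_comp, searchsort):
    print(fn.__name__, timeit.timeit(lambda: fn(a), number=5))
```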
I got the following results: the "searchsorted" method is almost a hundred times faster than the "for" loop, and about 3,600 times faster than the numpy brute-force method. The list comprehension is also a very good trade-off between code simplicity and speed.