Assume that I have two arrays A
and B
, where both A
and B
are m x n
. My goal is now, for each row of A
and B
, to find where I should insert the elements of row i
of A
in the corresponding row of B
. That is, I wish to apply np.digitize
or np.searchsorted
to each row of A
and B
.
My naive solution is to simply iterate over the rows. However, this is far too slow for my application. My question is therefore: is there a vectorized implementation of either algorithm that I haven't managed to find?
The solution provided by @Divakar is ideal for integer data, but beware of precision issues for floating point values, especially if they span multiple orders of magnitude (e.g.
[[1.0, 2,0, 3.0, 1.0e+20],...]
). In some casesr
may be so large that applyinga+r
andb+r
wipes out the original values you're trying to runsearchsorted
on, and you're just comparingr
tor
.To make the approach more robust for floating-point data, you could embed the row information into the arrays as part of the values (as a structured dtype), and run searchsorted on these structured dtypes instead.
Edit: The timing on this approach is abysmal!
You would be better off just just using
map
over the array:For integer data, @Divakar's approach is still the fastest:
We can add each row some offset as compared to the previous row. We would use the same offset for both arrays. The idea is to use
np.searchsorted
on flattened version of input arrays thereafter and thus each row fromb
would be restricted to find sorted positions in the corresponding row ina
. Additionally, to make it work for negative numbers too, we just need to offset for the minimum numbers as well.So, we would have a vectorized implementation like so -
Runtime test -