I have a NumPy array of integers. Its values range from 0 to the maximum element in the array (in other words, every number from 0 to the maximum appears in it). I need an efficient (meaning fast and fully vectorized) way to count how many times each value occurs in each row, and to lay those counts out by value.
I could not find a similar question, or a question that somehow helped to solve this.
So if I have this input data:
# shape is (N0=4, m0=4)
1 1 0 4
2 4 2 1
1 2 3 5
4 4 4 1
the desired output is:
# shape(N=N0, m=data.max()+1):
1 2 0 0 1 0
0 1 2 0 1 0
0 1 1 1 0 1
0 1 0 0 3 0
I know how to solve this by counting the unique values in each row, one row at a time, and then combining the results while accounting for all possible values in the data array.
The key problem with vectorizing this in NumPy is that searching for each number one by one is slow, and with many unique numbers present this cannot be an efficient solution. In general, both N and the number of unique values are rather large (N tends to be larger than the number of unique values).
Does anybody have any good ideas?
Well, that's basically what np.bincount does with 1D arrays. But we need to apply it to each row separately (thinking of it simply). To make it vectorized, we could offset each row by a multiple of that max number (strictly, max + 1 times the row index). The idea is to have different bins for each row, so that a row's counts are not affected by elements with the same values in other rows. Hence, the implementation would be -
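The offset idea can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the function name bincount2D_vectorized and the small demo array are assumptions made for the example, and the input is assumed to be a 2D array of non-negative integers.

```python
import numpy as np

def bincount2D_vectorized(a):
    # Number of bins per row: values run from 0 to a.max() inclusive
    n_bins = a.max() + 1
    # Shift row i by i * n_bins so every row gets its own block of bins
    a_offs = a + np.arange(a.shape[0])[:, None] * n_bins
    # One flat bincount over all rows, then fold back to (rows, n_bins)
    flat = np.bincount(a_offs.ravel(), minlength=a.shape[0] * n_bins)
    return flat.reshape(-1, n_bins)

# Hypothetical demo input, not taken from the question
demo = np.array([[0, 2, 2],
                 [1, 1, 0]])
counts = bincount2D_vectorized(demo)
print(counts)
# [[1 0 2]
#  [1 2 0]]
```

The minlength argument matters here: without it the flat result would stop at the largest shifted value actually present, and the reshape could fail when the last row does not contain the maximum.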
Sample run -
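As an illustrative run on the data from the question, the steps can be written out one at a time so the shifted values are visible. The variable names are assumptions for this sketch.

```python
import numpy as np

# Input from the question, shape (4, 4)
data = np.array([[1, 1, 0, 4],
                 [2, 4, 2, 1],
                 [1, 2, 3, 5],
                 [4, 4, 4, 1]])

n_bins = data.max() + 1                      # 6 bins per row
row_offsets = np.arange(data.shape[0])[:, None] * n_bins
shifted = data + row_offsets                 # each row moved into its own bin range
print(shifted)
# [[ 1  1  0  4]
#  [ 8 10  8  7]
#  [13 14 15 17]
#  [22 22 22 19]]

flat_counts = np.bincount(shifted.ravel(), minlength=data.shape[0] * n_bins)
result = flat_counts.reshape(-1, n_bins)
print(result)
# [[1 2 0 0 1 0]
#  [0 1 2 0 1 0]
#  [0 1 1 1 0 1]
#  [0 1 0 0 3 0]]
```

The result matches the desired output given in the question.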
Numba Tweaks
We can bring in numba for further speedups. Now, numba allows a few tweaks. First off, it allows JIT compilation. Also, they recently introduced an experimental parallel option that automatically parallelizes operations in the function known to have parallel semantics. The final tweak would be to use prange as a substitute for range. The docs state that this runs loops in parallel, similar to OpenMP parallel for loops and Cython's prange. prange performs well with larger datasets, which is probably because of the overhead needed to set up the parallel work. So, with these two new tweaks along with njit for no-Python mode, we would have three variants -
For completeness and testing out later on, the loopy version would be -
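The three variants and the loopy baseline described above could look roughly like this. It is a sketch under assumptions: all function names are made up for the example, the input is assumed to be a 2D array of non-negative integers, and the try/except fallback exists only so the sketch still runs (without any speedup) where numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Assumption for this sketch only: plain-Python stand-ins when numba is missing
    prange = range
    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func

@njit
def _count_jit(a, out, m, n):
    # Variant 1: plain JIT compilation in no-Python mode
    for i in range(m):
        for j in range(n):
            out[i, a[i, j]] += 1

@njit(parallel=True)
def _count_parallel(a, out, m, n):
    # Variant 2: same loops, with numba's automatic parallelization enabled
    for i in range(m):
        for j in range(n):
            out[i, a[i, j]] += 1

@njit(parallel=True)
def _count_prange(a, out, m, n):
    # Variant 3: rows processed in parallel; each i writes only to out[i]
    for i in prange(m):
        for j in range(n):
            out[i, a[i, j]] += 1

def bincount2D_numba(a, use_parallel=False, use_prange=False):
    n_bins = a.max() + 1
    m, n = a.shape
    out = np.zeros((m, n_bins), dtype=np.int64)
    func = _count_jit
    if use_parallel:
        func = _count_prange if use_prange else _count_parallel
    func(a, out, m, n)
    return out

def bincount2D_loopy(a):
    # Baseline: one np.bincount call per row
    n_bins = a.max() + 1
    out = np.zeros((a.shape[0], n_bins), dtype=np.int64)
    for i in range(a.shape[0]):
        out[i] = np.bincount(a[i], minlength=n_bins)
    return out

data = np.array([[1, 1, 0, 4], [2, 4, 2, 1], [1, 2, 3, 5], [4, 4, 4, 1]])
expected = bincount2D_loopy(data)
```

In the prange variant the outer loop is safe to parallelize because every row index writes to its own row of the output. The second variant contains no explicitly parallel construct, so numba may warn that it found nothing to parallelize in it.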
Runtime test
Case #1 :
Case #2 :
Case #3 :
Seems like the numba variants are performing very well. Choosing one of the three variants would depend on the input array's shape and, to some extent, on the number of unique elements in it.
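Since the best choice depends on the data, it may be worth timing the candidates on your own array shapes. The harness below is only a sketch: the array shape, value range, and repeat count are arbitrary assumptions, the function names are made up for the example, and it compares just the two pure-NumPy approaches so that it runs without numba.

```python
import timeit
import numpy as np

def bincount2D_vectorized(a):
    # Offset each row into its own bin range, then one flat bincount
    n_bins = a.max() + 1
    a_offs = a + np.arange(a.shape[0])[:, None] * n_bins
    return np.bincount(a_offs.ravel(), minlength=a.shape[0] * n_bins).reshape(-1, n_bins)

def bincount2D_loopy(a):
    # One np.bincount call per row
    n_bins = a.max() + 1
    return np.array([np.bincount(row, minlength=n_bins) for row in a])

# Hypothetical test case: 1000 rows, 50 columns, values in [0, 100)
rng = np.random.default_rng(0)
a = rng.integers(0, 100, size=(1000, 50))

# Check that both approaches agree before timing them
same = np.array_equal(bincount2D_vectorized(a), bincount2D_loopy(a))

timings = {}
for func in (bincount2D_vectorized, bincount2D_loopy):
    # Best of 3 repeats, 5 calls each, reported as seconds per call
    best = min(timeit.repeat(lambda: func(a), number=5, repeat=3)) / 5
    timings[func.__name__] = best
    print(f"{func.__name__}: {best * 1e3:.3f} ms per call")
```

Varying the number of rows, the number of columns, and the value range separately should show which parameter each approach is most sensitive to.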