Consider the array a
np.random.seed([3,1415])
a = np.random.randint(10, size=(5, 4))
a
array([[0, 2, 7, 3],
[8, 7, 0, 6],
[8, 6, 0, 2],
[0, 4, 9, 7],
[3, 2, 4, 3]])
I can create b
which contains the permutation to sort each column.
b = a.argsort(0)
b
array([[0, 0, 1, 2],
[3, 4, 2, 0],
[4, 3, 4, 4],
[1, 2, 0, 1],
[2, 1, 3, 3]])
I can sort a
with b
a[b, np.arange(a.shape[1])[None, :]]
array([[0, 2, 0, 2],
[0, 2, 0, 3],
[3, 4, 4, 3],
[8, 6, 7, 6],
[8, 7, 9, 7]])
That was the primer to illustrate the output I'm looking for. I want an array b
that has the required permutation for sorting the corresponding column in a
when also considering a lexsort
with another array.
np.random.seed([3,1415])
a = np.random.randint(10, size=(10, 4))
g = np.random.choice(list('abc'), 10)
a
array([[0, 2, 7, 3],
[8, 7, 0, 6],
[8, 6, 0, 2],
[0, 4, 9, 7],
[3, 2, 4, 3],
[3, 6, 7, 7],
[4, 5, 3, 7],
[5, 9, 8, 7],
[6, 4, 7, 6],
[2, 6, 6, 5]])
g
array(['c', 'a', 'c', 'b', 'a', 'a', 'a', 'b', 'c', 'b'],
dtype='<U1')
I want to produce an array b
where each column is the requisite permutation to lexsort
the corresponding column a
. And the lexsort
is from sorting the column first by the groups defined by g
and then by the values in each column in a
.
I can generate the results with:
r = np.column_stack([np.lexsort([a[:, i], g]) for i in range(a.shape[1])])
r
array([[4, 4, 1, 4],
[5, 6, 6, 1],
[6, 5, 4, 5],
[1, 1, 5, 6],
[3, 3, 9, 9],
[9, 9, 7, 3],
[7, 7, 3, 7],
[0, 0, 2, 2],
[8, 8, 0, 0],
[2, 2, 8, 8]])
We can see that this works
g[r]
array([['a', 'a', 'a', 'a'],
['a', 'a', 'a', 'a'],
['a', 'a', 'a', 'a'],
['a', 'a', 'a', 'a'],
['b', 'b', 'b', 'b'],
['b', 'b', 'b', 'b'],
['b', 'b', 'b', 'b'],
['c', 'c', 'c', 'c'],
['c', 'c', 'c', 'c'],
['c', 'c', 'c', 'c']],
dtype='<U1')
and
a[r, np.arange(a.shape[1])[None, :]]
array([[3, 2, 0, 3],
[3, 5, 3, 6],
[4, 6, 4, 7],
[8, 7, 7, 7],
[0, 4, 6, 5],
[2, 6, 8, 7],
[5, 9, 9, 7],
[0, 2, 0, 2],
[6, 4, 7, 3],
[8, 6, 7, 6]])
Question
Is there a way to "broadcast" the use of the grouping array g
for use in every columns lexsort
? What is a more efficient way to do this?
Here's one approach -
The idea is simply that we create a
2D
grid of integer version ofg
array of strings and then offset each column by a barrier that would limit thelexsort
search within each column.Now, on performance, it seems for large datasets,
lexsort
itself would be the bottleneck. For our problem, we are dealing with just two columns. So, we can create our own customlexsort
that scales the second column based on an offset, which is the max limit of number from the first column. The implementation for the same would look something like this -Thus, incorporating this into our proposed method and optimizing the creation of
g_idx2D
, we would have a formal function like so -Runtime test
Original approach :
Timings -
I'm only posting this to have a good place to show my derivative work, based on Divakar's answer. His
lexsort_twocols
function does everything we need and can just as easily be applied to broadcast a single dimension over several others. We can forgo the additional work in inproposed_app
because we can useaxis=0
in theargsort
in thelexsort_twocols
function.I also thought of this... though not nearly as good because I'm still relying on
np.lexsort
which as Divakar pointed out, can be slow.Assuming my first concept is
lexsort1
Thanks again @Divakar. Please upvote his answer!!!