Numpy: Sorting a multidimensional array by a multi

Posted 2019-01-18 12:19

Question:

Forgive me if this is redundant or super basic. I'm coming to Python/Numpy from R and having a hard time flipping things around in my head.

I have an n-dimensional array that I want to sort using another n-dimensional array of index values. I know I could wrap this in a loop, but it seems like there should be a really concise Numpyonic way of beating this into submission. Here's my example code to set up the problem where n=2:

from numpy import array, random, take
a1 = random.standard_normal(size=[2,5])
index = array([[0,1,2,4,3], [0,1,2,3,4]])

So now I have a 2 x 5 array of random numbers and a 2 x 5 index. I've read the help for take() about 10 times now, but my brain is not grokking it, obviously.

I thought this might get me there:

take(a1, index)

array([[ 0.29589188, -0.71279375, -0.18154864, -1.12184984,  0.25698875],
       [ 0.29589188, -0.71279375, -0.18154864,  0.25698875, -1.12184984]])

but that's clearly reordering only the first element (I presume because of flattening).

Any tips on how I get from where I am to a solution that sorts element 0 of a1 by element 0 of the index ... element n?

Answer 1:

I can't think of how to work this in N dimensions yet, but here is the 2D version:

>>> import numpy as np
>>> a = np.random.standard_normal(size=(2,5))
>>> a
array([[ 0.72322499, -0.05376714, -0.28316358,  1.43025844, -0.90814293],
       [ 0.7459107 ,  0.43020728,  0.05411805, -0.32813465,  2.38829386]])
>>> i = np.array([[0,1,2,4,3],[0,1,2,3,4]]) 
>>> a[np.arange(a.shape[0])[:,np.newaxis],i]
array([[ 0.72322499, -0.05376714, -0.28316358, -0.90814293,  1.43025844],
       [ 0.7459107 ,  0.43020728,  0.05411805, -0.32813465,  2.38829386]])

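Here is the N-dimensional version (the index is built as a tuple, which recent NumPy releases require for this kind of indexing):

>>> a[tuple(np.ogrid[tuple(slice(x) for x in a.shape)][:-1]) + (i,)]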

Here's how it works:

Ok, let's start with a 3 dimensional array for illustration.

>>> import numpy as np
>>> a = np.arange(24).reshape((2,3,4))
>>> a
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

You can access elements of this array by specifying the index along each axis as follows:

>>> a[0,1,2]
6

This is equivalent to a[0][1][2] which is how you would access the same element if we were dealing with a list instead of an array.

Numpy allows you to get even fancier when indexing arrays:

>>> a[[0,1],[1,1],[2,2]]
array([ 6, 18])
>>> a[[0,1],[1,2],[2,2]]
array([ 6, 22])

These examples would be equivalent to [a[0][1][2],a[1][1][2]] and [a[0][1][2],a[1][2][2]] if we were dealing with lists.

You can even leave out repeated indices and numpy will broadcast the single value across the others for you. For example, the above examples could be equivalently written:

>>> a[[0,1],1,2]
array([ 6, 18])
>>> a[[0,1],[1,2],2]
array([ 6, 22])
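
A minimal sketch of the equivalence just described, using the same 3-dimensional array. The variable names (explicit, shorthand, by_hand) are made up for this illustration only.

```python
import numpy as np

a = np.arange(24).reshape((2, 3, 4))

# Fully spelled-out index lists, one entry per selected element.
explicit = a[[0, 1], [1, 1], [2, 2]]

# The scalars 1 and 2 are broadcast against the list [0, 1],
# so each of them is reused for every entry of that list.
shorthand = a[[0, 1], 1, 2]

# The same selection written as plain element-by-element lookups.
by_hand = np.array([a[0][1][2], a[1][1][2]])

print(explicit)   # [ 6 18]
print(shorthand)  # [ 6 18]
print(by_hand)    # [ 6 18]
```

All three spellings pick the same two elements; the shorthand form just lets numpy repeat the scalar indices.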

The shape of the array (or list) you index with in each dimension only affects the shape of the returned array. In other words, numpy doesn't care that you are trying to index your array with an array of shape (2,3,4) when it's pulling values, except that it will feed you back an array of shape (2,3,4). For example, with index arrays of shape (2,2):

>>> a[[[0,0],[0,0]],[[0,0],[0,0]],[[0,0],[0,0]]]
array([[0, 0],
       [0, 0]])

In this case, we're grabbing the same element, a[0,0,0] over and over again, but numpy is returning an array with the same shape as we passed in.
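
To make the shape rule a bit more visible, here is a small sketch in which the index arrays point at different elements. The names rows, cols and deep are hypothetical and the index values are arbitrary, chosen only for illustration.

```python
import numpy as np

a = np.arange(24).reshape((2, 3, 4))

# Three index arrays of the same (2, 2) shape, one per axis of `a`.
rows = np.array([[0, 1], [0, 1]])
cols = np.array([[0, 0], [2, 2]])
deep = np.array([[3, 3], [1, 1]])

# Element [j, k] of the result is a[rows[j, k], cols[j, k], deep[j, k]].
picked = a[rows, cols, deep]

# The result takes the shape of the index arrays, not the shape of `a`.
print(picked.shape)  # (2, 2)
print(picked)
# [[ 3 15]
#  [ 9 21]]
```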

Ok, onto your problem. What you want is to index the array along the last axis with the numbers in your index array. So, for the example in your question you would like [[a[0,0],a[0,1],a[0,2],a[0,4],a[0,3]],[a[1,0],a[1,1],...

The fact that your index array is multidimensional, like I said earlier, doesn't tell numpy anything about where you want to pull these indices from; it just specifies the shape of the output array. So, in your example, you need to tell numpy that the first 5 values are to be pulled from a[0] and the latter 5 from a[1]. Easy!

>>> a1[[[0]*5,[1]*5],index]
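
As a rough check of the row-index expression above, the sketch below compares it with the broadcasting form from the 2D version and with a plain loop. The fixed values in a1 are an assumption made here so the output is easy to read; the question uses random numbers.

```python
import numpy as np

# Fixed values (chosen for illustration) instead of random ones.
a1 = np.arange(10).reshape(2, 5) * 10
index = np.array([[0, 1, 2, 4, 3], [0, 1, 2, 3, 4]])

# Row numbers written out in full: five 0s, then five 1s.
explicit_rows = a1[[[0] * 5, [1] * 5], index]

# A (2, 1) column of row numbers broadcasts to the same (2, 5) grid.
broadcast_rows = a1[np.arange(a1.shape[0])[:, np.newaxis], index]

# The explicit loop the question wanted to avoid, kept as a reference.
looped = np.array([row[idx] for row, idx in zip(a1, index)])

print(explicit_rows)
# [[ 0 10 20 40 30]
#  [50 60 70 80 90]]
```

All three approaches should agree; the broadcasting form simply saves writing out the repeated row numbers.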

It gets complicated in N dimensions, but let's do it for the 3 dimensional array a I defined way above. Suppose we have the following index array:

>>> i = np.array(list(range(4))[::-1]*6).reshape(a.shape)
>>> i
array([[[3, 2, 1, 0],
        [3, 2, 1, 0],
        [3, 2, 1, 0]],

       [[3, 2, 1, 0],
        [3, 2, 1, 0],
        [3, 2, 1, 0]]])

So, these values are all for indices along the last axis. We need to tell numpy what indices along the first and second axes these numbers are to be taken from; i.e. we need to tell numpy that the indices for the first axis are:

i1 = [[[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]],

      [[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]]]

and the indices for the second axis are:

i2 = [[[0, 0, 0, 0],
       [1, 1, 1, 1],
       [2, 2, 2, 2]],

      [[0, 0, 0, 0],
       [1, 1, 1, 1],
       [2, 2, 2, 2]]]

Then we can just do:

>>> a[i1,i2,i]
array([[[ 3,  2,  1,  0],
        [ 7,  6,  5,  4],
        [11, 10,  9,  8]],

       [[15, 14, 13, 12],
        [19, 18, 17, 16],
        [23, 22, 21, 20]]])

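The handy numpy helper which generates i1 and i2 is called np.mgrid. I use np.ogrid in my answer, which is equivalent in this case: it returns compact arrays of shape (2,1,1) and (1,3,1), and numpy broadcasts them against i in the same way it filled in the repeated indices earlier.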
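
A short sketch of that equivalence, assuming the same 3-dimensional a and reversed-index i as in the walkthrough (i is rebuilt here with np.tile, which is a substitution for the expression used earlier). The names o1, o2, dense and sparse are illustrative only.

```python
import numpy as np

a = np.arange(24).reshape((2, 3, 4))
i = np.tile(np.arange(4)[::-1], 6).reshape(a.shape)

# Dense index grids: each one has the full (2, 3, 4) shape.
i1, i2, _ = np.mgrid[0:2, 0:3, 0:4]

# Open grids: shapes (2, 1, 1) and (1, 3, 1), expanded by broadcasting
# at indexing time instead of being stored in full.
o1, o2, _ = np.ogrid[0:2, 0:3, 0:4]

dense = a[i1, i2, i]
sparse = a[o1, o2, i]

print(i1.shape, o1.shape, o2.shape)  # (2, 3, 4) (2, 1, 1) (1, 3, 1)
print(np.array_equal(dense, sparse))  # True
```

The open grids hold far fewer numbers than the dense ones, which is presumably why the N-dimensional one-liner reaches for np.ogrid.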

Hope that helps!



Answer 2:

After playing with this some more today I figured out that if I used a mapper function along with take I could solve the 2 dimensional version really simply like this:

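from numpy import array, random, take
a1 = random.standard_normal(size=[2,5])
index = array([[0,1,2,4,3], [0,1,2,3,4]])
array(list(map(take, a1, index)))  # on Python 3 map() is lazy, so collect the rows explicitly

I needed to map() the take() to each element (row) of a1.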

Of course, the accepted answer solves the n-dimensional version. However, in retrospect, I determined that I don't really need the n-dimensional solution, only the 2-D version.
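
For readers on NumPy 1.15 or later, np.take_along_axis may be a simpler route to the same result. It was added after this thread was written and is not part of either answer. The sketch below compares it with a row-by-row take on made-up fixed data and then applies it to a 3-dimensional array.

```python
import numpy as np

# Fixed values (an assumption for readability; the question uses random data).
a1 = np.arange(10).reshape(2, 5) * 10
index = np.array([[0, 1, 2, 4, 3], [0, 1, 2, 3, 4]])

# Row-by-row take, collected back into a 2-D array.
row_wise = np.array([np.take(row, idx) for row, idx in zip(a1, index)])

# One call that applies `index` along the last axis.
along_axis = np.take_along_axis(a1, index, axis=-1)

# The same call on a 3-D array, with an index that reverses the last axis.
a3 = np.arange(24).reshape(2, 3, 4)
i3 = np.tile(np.arange(4)[::-1], 6).reshape(a3.shape)
reversed_last = np.take_along_axis(a3, i3, axis=-1)

print(along_axis)
# [[ 0 10 20 40 30]
#  [50 60 70 80 90]]
```

Because the index array has the same number of dimensions as the data, the call does not change between the 2-D and n-dimensional cases.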