Given a self-indexing (not sure if this is the correct term) numpy array, for example:
a = np.array([3, 2, 0, 1])
This represents this permutation (=> is an arrow):
0 => 3
1 => 2
2 => 0
3 => 1
I'm trying to make an array representing the inverse transformation without doing it "manually" in python, that is, I want a pure numpy solution. The result I want in the above case is:
array([2, 3, 1, 0])
Which is equivalent to
0 <= 3        0 => 2
1 <= 2   or   1 => 3
2 <= 0        2 => 1
3 <= 1        3 => 0
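In other words, composing the permutation with the result I'm after, in either order, should give back the identity (here with the desired result hard-coded):

```python
import numpy as np

a = np.array([3, 2, 0, 1])
inv = np.array([2, 3, 1, 0])   # the result I want

print(a[inv])   # [0 1 2 3]
print(inv[a])   # [0 1 2 3]
```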
It seems so simple, but I just can't think of how to do it. I've tried googling, but haven't found anything relevant.
I'd like to offer a tiny bit more background to larsmans' correct answer. The reason why argsort is correct can be found when you use the representation of a permutation by a matrix. The mathematical advantage of a permutation matrix P is that the matrix "operates on vectors", i.e. a permutation matrix times a vector permutes the vector. Your permutation looks like:
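```python
import numpy as np

a = np.array([3, 2, 0, 1])

# One common convention (assumed here): row i of P is the a[i]-th basis
# vector, so that (P @ x)[i] == x[a[i]].
P = np.eye(a.size, dtype=int)[a]
print(P)
# [[0 0 0 1]
#  [0 0 1 0]
#  [1 0 0 0]
#  [0 1 0 0]]

print(P @ np.arange(4))   # [3 2 0 1] -- P applies the permutation to a vector
```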
Given a permutation matrix, we can "undo" multiplication by multiplying by its inverse P^-1. The beauty of permutation matrices is that they are orthogonal, hence P*P^T = I, or in other words P^(-1) = P^T:
the inverse is the transpose. This means we can take the indices of the transpose matrix to find your inverted permutation vector. If you think about it, that is exactly the same as finding the indices that sort the columns of P!
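For example, sticking with the eye-based construction above (argmax reads off the column of each row's single 1):

```python
import numpy as np

a = np.array([3, 2, 0, 1])
P = np.eye(a.size, dtype=int)[a]   # as built above

# Row j of P.T has its single 1 in the column i with a[i] == j,
# so argmax along each row recovers the inverse permutation:
a_inv = np.argmax(P.T, axis=1)
print(a_inv)           # [2 3 1 0]

# ...and this coincides with the indices that sort a:
print(np.argsort(a))   # [2 3 1 0]
```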
Sorting is overkill here. This is just a single-pass, linear time algorithm with a constant memory requirement:
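```python
import numpy as np

p = np.array([3, 2, 0, 1])
s = np.empty_like(p)
for i in range(p.size):
    s[p[i]] = i   # i ends up at position p[i] of the inverse
print(s)
```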
The above code prints [2 3 1 0], as required.
The rest of the answer is concerned with the efficient vectorization of the above for loop. If you just want to know the solution, jump to the end of this answer.

(The original answer from Aug 27, 2014; the timings are valid for NumPy 1.8. An update with NumPy 1.11 follows later.)
A single-pass, linear time algorithm is expected to be faster than np.argsort; interestingly, the trivial vectorization (s[p] = xrange(p.size), see index arrays) of the above for loop is actually slightly slower than np.argsort as long as p.size < 700 000 (well, on my machine; your mileage will vary). From my IPython notebook:
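```python
import numpy as np
np.__version__                       # '1.8.x' at the time

p = np.random.permutation(700000)
s = np.empty(p.size, dtype=np.int32)

%timeit np.argsort(p)                # 72.7 ms per loop on this machine
%timeit s[p] = np.arange(p.size)     # slightly slower than argsort at this size
```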
Eventually the asymptotic complexity kicks in (O(n log n) for argsort vs. O(n) for the single-pass algorithm) and the single-pass algorithm will be consistently faster after a sufficiently large n = p.size (the threshold is around 700k on my machine).

However, there is a less straightforward way to vectorize the above for loop with np.put:
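```python
import numpy as np

p = np.random.permutation(700000)
s = np.empty(p.size, dtype=np.int32)
np.put(s, p, np.arange(p.size))   # same effect as s[p] = np.arange(p.size)
```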
Which gives, for n = 700 000 (the same size as above), 12.8 ms, compared to 72.7 ms for np.argsort. This is a nice 5.6x speed up for next to nothing!
To be fair, np.argsort still beats the np.put approach for smaller n (the tipping point is around n = 1210 on my machine). This is most likely because we allocate and fill in an extra array (at the np.arange() call) with the np.put approach.

Although you didn't ask for a Cython solution, just out of curiosity, I also timed the following Cython solution with typed memoryviews:
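```cython
# invperm.pyx -- a sketch along those lines; the original's exact code may differ
cimport cython
import numpy as np

@cython.boundscheck(False)
@cython.wraparound(False)
def inverse_permutation(int[:] p):
    # Expects a C-int (np.intc) array; same single-pass loop, compiled.
    cdef Py_ssize_t i, n = p.shape[0]
    s_np = np.empty(n, dtype=np.intc)
    cdef int[:] s = s_np
    for i in range(n):
        s[p[i]] = i
    return s_np
```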
The timings showed that the np.put solution is still not as fast as possible (it ran in 12.8 ms for this input size; argsort took 72.7 ms).

Update on Feb 3, 2017 with NumPy 1.11
Jamie, Andris and Paul pointed out in comments below that the performance issue with fancy indexing was resolved. Jamie says it was already resolved in NumPy 1.9. I tested it with Python 3.5 and NumPy 1.11 on the machine that I was using back in 2014.
Rerunning the timings showed a significant improvement indeed!
Conclusion
All in all, I would go with the s[p] = np.arange(p.size) approach for code clarity. In my opinion, it is less obscure than argsort, and also faster for large input sizes. If speed becomes an issue, I would go with the Cython solution.

The inverse of a permutation
p of np.arange(n) is the array of indices s that sort p, i.e. p[s] == np.arange(n) must be all True. Such an s is exactly what np.argsort returns:
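```python
import numpy as np

p = np.array([3, 2, 0, 1])
s = np.argsort(p)
print(s)                                  # [2 3 1 0]
assert (p[s] == np.arange(p.size)).all()  # s is the inverse of p
```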