Given a self-indexing (not sure if this is the correct term) numpy array, for example:
a = np.array([3, 2, 0, 1])
This represents the following permutation (=> is an arrow):
0 => 3
1 => 2
2 => 0
3 => 1
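In code the mapping can be read off directly, since each index i maps to a[i]:

for i, j in enumerate(a):
    print(i, '=>', j)  # prints the four arrows above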
I'm trying to make an array representing the inverse transformation without doing it "manually" in python, that is, I want a pure numpy solution. The result I want in the above case is:
array([2, 3, 1, 0])
Which is equivalent to
0 <= 3      0 => 2
1 <= 2  or  1 => 3
2 <= 0      2 => 1
3 <= 1      3 => 0
It seems so simple, but I just can't think of how to do it. I've tried googling, but haven't found anything relevant.
The inverse of a permutation p of np.arange(n) is the array of indices s that sort p, i.e. p[s] == np.arange(n) must be all true. Such an s is exactly what np.argsort returns:
>>> p = np.array([3, 2, 0, 1])
>>> np.argsort(p)
array([2, 3, 1, 0])
>>> p[np.argsort(p)]
array([0, 1, 2, 3])
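For example, if some data x (an arbitrary example array) has been shuffled with the gather x[p], the indices from np.argsort(p) undo the shuffle:

>>> x = np.array([10, 20, 30, 40])
>>> y = x[p]              # y[i] = x[p[i]]
>>> y
array([40, 30, 10, 20])
>>> y[np.argsort(p)]      # applying the inverse restores the original order
array([10, 20, 30, 40])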
Sorting is overkill here. Inverting a permutation needs just a single-pass, linear time algorithm with constant extra memory:
from __future__ import print_function
import numpy as np

p = np.array([3, 2, 0, 1])
s = np.empty(p.size, dtype=np.int32)
for i in np.arange(p.size):
    s[p[i]] = i  # p maps i to p[i], so the inverse maps p[i] back to i
print('s =', s)
The above code prints
s = [2 3 1 0]
as required.
The rest of the answer is concerned with the efficient vectorization of the above for loop. If you just want to know the solution, jump to the end of this answer.
(The original answer from Aug 27, 2014; the timings are valid for NumPy 1.8. An update with NumPy 1.11 follows later.)
A single-pass, linear time algorithm is expected to be faster than np.argsort; interestingly, the trivial vectorization (s[p] = xrange(p.size), see index arrays) of the above for loop is actually slightly slower than np.argsort as long as p.size < 700 000 (well, on my machine, your mileage will vary):
import numpy as np

def np_argsort(p):
    return np.argsort(p)

def np_fancy(p):
    s = np.zeros(p.size, p.dtype)  # np.zeros is better than np.empty here, at least on Linux
    s[p] = xrange(p.size)
    return s

def create_input(n):
    np.random.seed(31)
    indices = np.arange(n, dtype=np.int32)
    return np.random.permutation(indices)
From my IPython notebook:
p = create_input(700000)
%timeit np_argsort(p)
10 loops, best of 3: 72.7 ms per loop
%timeit np_fancy(p)
10 loops, best of 3: 70.2 ms per loop
Eventually the asymptotic complexity kicks in (O(n log n) for argsort vs. O(n) for the single-pass algorithm) and the single-pass algorithm will be consistently faster after a sufficiently large n = p.size (the threshold is around 700k on my machine).
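If you want to locate the crossover on your own machine, a minimal timing sketch (the sizes below are arbitrary picks) could look like this:

import timeit

for n in (10**4, 10**5, 10**6):
    p = create_input(n)
    for f in (np_argsort, np_fancy):
        # best of 3 repeats of 10 calls, reported per call
        t = min(timeit.repeat(lambda: f(p), number=10, repeat=3)) / 10
        print('n = %d, %s: %.2f ms' % (n, f.__name__, t * 1e3))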
However, there is a less straightforward way to vectorize the above for loop, with np.put:
def np_put(p):
    n = p.size
    s = np.zeros(n, dtype=np.int32)
    i = np.arange(n, dtype=np.int32)
    np.put(s, p, i)  # s[p[i]] = i
    return s
Which gives, for n = 700 000 (the same size as above):
p = create_input(700000)
%timeit np_put(p)
100 loops, best of 3: 12.8 ms per loop
This is a nice 5.6x speed up for next to nothing!
To be fair, np.argsort still beats the np.put approach for smaller n (the tipping point is around n = 1210 on my machine):
p = create_input(1210)
%timeit np_argsort(p)
10000 loops, best of 3: 25.1 µs per loop
%timeit np_fancy(p)
10000 loops, best of 3: 118 µs per loop
%timeit np_put(p)
10000 loops, best of 3: 25 µs per loop
This is most likely because we allocate and fill in an extra array (at the np.arange() call) with the np_put approach.
Although you didn't ask for a Cython solution, just out of curiosity, I also timed the following Cython solution with typed memoryviews:
import numpy as np
cimport numpy as np

def in_cython(np.ndarray[np.int32_t] p):
    cdef int i
    cdef int[:] pmv
    cdef int[:] smv
    pmv = p
    s = np.empty(p.size, dtype=np.int32)
    smv = s
    for i in xrange(p.size):
        smv[pmv[i]] = i
    return s
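To try this in an IPython notebook without a separate build step, one option (assuming a reasonably recent Cython is installed; the magic's name has varied across versions) is the cell magic:

%load_ext Cython

%%cython
# paste the in_cython function from the block above into this cell

A conventional setup.py with cythonize() works just as well.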
Timings:
p = create_input(700000)
%timeit in_cython(p)
100 loops, best of 3: 2.59 ms per loop
So, the np.put solution is still not as fast as possible (it took 12.8 ms for this input size, while argsort took 72.7 ms).
Update on Feb 3, 2017 with NumPy 1.11
Jamie, Andris and Paul pointed out in comments below that the performance issue with fancy indexing was resolved. Jamie says it was already resolved in NumPy 1.9. I tested it with Python 3.5 and NumPy 1.11 on the machine that I was using back in 2014.
def invert_permutation(p):
    s = np.empty(p.size, p.dtype)
    s[p] = np.arange(p.size)
    return s
Timings:
p = create_input(880)
%timeit np_argsort(p)
100000 loops, best of 3: 11.6 µs per loop
%timeit invert_permutation(p)
100000 loops, best of 3: 11.5 µs per loop
A significant improvement indeed!
Conclusion
All in all, I would go with the
def invert_permutation(p):
    '''The argument p is assumed to be some permutation of 0, 1, ..., len(p)-1.
    Returns an array s, where s[i] gives the index of i in p.
    '''
    s = np.empty(p.size, p.dtype)
    s[p] = np.arange(p.size)
    return s
approach for code clarity. In my opinion, it is less obscure than argsort, and also faster for large input sizes. If speed becomes an issue, I would go with the Cython solution.
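A quick usage example, verifying the defining property p[s] == np.arange(p.size):

>>> p = np.array([3, 2, 0, 1])
>>> s = invert_permutation(p)
>>> s
array([2, 3, 1, 0])
>>> np.array_equal(p[s], np.arange(p.size))
True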
I'd like to offer a tiny bit more background to larsmans' correct answer. The reason why argsort is correct can be seen when you use the representation of a permutation by a matrix. The mathematical advantage of a permutation matrix P is that it "operates on vectors", i.e. a permutation matrix times a vector permutes the vector.
Your permutation looks like:
import numpy as np

a = np.array([3, 2, 0, 1])
N = a.size
rows = np.arange(N)
P = np.zeros((N, N), dtype=int)
P[rows, a] = 1
print(P)

This prints:

[[0 0 0 1]
 [0 0 1 0]
 [1 0 0 0]
 [0 1 0 0]]
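To see the "operates on vectors" property concretely, take an arbitrary example vector x: multiplying by P rearranges x exactly as indexing with a does:

x = np.array([10, 20, 30, 40])
print(P.dot(x))  # [40 30 10 20]
print(x[a])      # [40 30 10 20], the same permutation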
Given a permutation matrix, we can "undo" multiplication by multiplying by its inverse P^(-1). The beauty of permutation matrices is that they are orthogonal, hence P^T * P = I, or in other words P^(-1) = P^T: the inverse is the transpose. This means we can take the indices of the transposed matrix to find your inverted permutation vector:
inv_a = np.where(P.T)[1]
print(inv_a)

This prints:

[2 3 1 0]
Which, if you think about it, is exactly the same as finding the indices that sort the columns of P!
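For example, continuing with the arrays from above, the two routes indeed agree:

print(np.argsort(a))                                    # [2 3 1 0]
print(np.array_equal(np.where(P.T)[1], np.argsort(a)))  # True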