NumPy provides a way to get the index of the maximum value of an array via `np.argmax`.
I would like something similar, but returning the indices of the N maximum values.
For instance, if I have the array `[1, 3, 2, 4, 5]`, `function(array, n=3)` would return `[4, 3, 1]`.
This will be faster than a full sort depending on the size of your original array and the size of your selection:
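A minimal sketch of an approach that fits this description, assuming it means calling `np.argmax` n times and overwriting each maximum as it is found. The helper name `n_largest_by_argmax` and the `-inf` sentinel are illustrative choices, not taken from the answer, and the sketch assumes a 1-D float array:

```python
import numpy as np

def n_largest_by_argmax(a, n):
    """Return the indices of the n largest values, largest first.

    Hypothetical helper: assumes `a` is a 1-D float array and
    overwrites the selected entries with -inf (in place).
    """
    idx = np.empty(n, dtype=np.intp)
    for i in range(n):
        idx[i] = np.argmax(a)   # index of the current maximum
        a[idx[i]] = -np.inf     # knock it out so the next call skips it
    return idx

a = np.array([1.0, 3.0, 2.0, 4.0, 5.0])
top = n_largest_by_argmax(a, 3)
print(top)  # [4 3 1]
```

Each pass is a linear scan, so the total cost grows with n times the array length, which is why it only pays off for small selections.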
It does, of course, involve tampering with your original array, which you could fix (if needed) by making a copy or by putting the original values back, whichever is cheaper for your use case.
Use:
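A sketch of one way to build such a list, assuming the standard-library `heapq.nlargest` applied to `enumerate(a)`; the names `a` and `N` and the sample data are illustrative:

```python
import heapq
from operator import itemgetter

a = [1, 3, 2, 4, 5]
N = 3

# enumerate(a) yields (index, value) pairs; the key tells nlargest
# to rank the pairs by their value rather than by their index.
result = heapq.nlargest(N, enumerate(a), key=itemgetter(1))
print(result)  # [(4, 5), (3, 4), (1, 3)]
```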
Now the `result` list would contain N tuples (`index`, `value`) where `value` is maximized.
I think the most time-efficient way is to manually iterate through the array and keep a k-size min-heap, as other people have mentioned.
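The k-size min-heap idea could look roughly like the following. This is only a sketch in pure Python using `heapq`; the function name `top_k_indices` is made up for illustration:

```python
import heapq

def top_k_indices(values, k):
    """Indices of the k largest values, largest first (hypothetical helper)."""
    if k <= 0:
        return []
    heap = []  # min-heap of (value, index); heap[0] is the smallest value kept
    for i, v in enumerate(values):
        if len(heap) < k:
            heapq.heappush(heap, (v, i))
        elif v > heap[0][0]:
            # The new value beats the smallest kept one, so swap it in.
            heapq.heapreplace(heap, (v, i))
    # The heap holds the k largest; order them from largest to smallest.
    return [i for v, i in sorted(heap, reverse=True)]

print(top_k_indices([1, 3, 2, 4, 5], 3))  # [4, 3, 1]
```

The heap never holds more than k items, so the scan costs O(n log k) and the original data is left untouched.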
I also came up with a brute-force approach:
After using `argmax` to get the index of the largest element, set that element to a large negative value. The next call to `argmax` will then return the second-largest element. You can record the original values of these elements and restore them afterwards if you want.
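A sketch of this brute-force approach, including the restore step. The helper name `argmax_n_times` is hypothetical, and the "large negative value" is assumed here to be `-inf` for float arrays or the dtype's minimum for integer arrays (so the data must not already contain that value):

```python
import numpy as np

def argmax_n_times(a, n):
    """Indices of the n largest values of 1-D array `a`, largest first."""
    if np.issubdtype(a.dtype, np.floating):
        low = -np.inf
    else:
        low = np.iinfo(a.dtype).min  # assumes an integer dtype
    indices, saved = [], []
    for _ in range(n):
        i = int(np.argmax(a))
        indices.append(i)
        saved.append(a[i])  # remember the original value
        a[i] = low          # hide it from the next argmax call
    for i, v in zip(indices, saved):
        a[i] = v            # put the original values back
    return indices

a = np.array([1, 3, 2, 4, 5])
print(argmax_n_times(a, 3))  # [4, 3, 1]
print(a)                     # [1 3 2 4 5]
```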
If you don't care about the order of the K largest elements, you can use `argpartition`, which should perform better than a full sort through `argsort`.
Credits go to this question.
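For illustration, a short sketch of the unordered case; the sample array and the value of `K` are made up:

```python
import numpy as np

a = np.array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
K = 4

# Partition so that the K largest values occupy the last K slots,
# then keep only those slots. Their internal order is arbitrary.
ind = np.argpartition(a, -K)[-K:]

print(sorted(a[ind].tolist()))  # [4, 6, 9, 9]
```

Because of the tie between the three 4s, which of their indices ends up in `ind` is not specified; only the selected values are guaranteed.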
I ran a few tests and it looks like `argpartition` outperforms `argsort` as the size of the array and the value of K increase.
Newer NumPy versions (1.8 and up) have a function called `argpartition` for this. To get the indices of the four largest elements of an array `a`, do `ind = np.argpartition(a, -4)[-4:]`.
Unlike `argsort`, this function runs in linear time in the worst case, but the returned indices are not sorted, as can be seen from the result of evaluating `a[ind]`. If you need that too, sort them afterwards with `ind[np.argsort(a[ind])]`.
Getting the top-k elements in sorted order this way takes O(n + k log k) time.
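The two steps can be wrapped together as in the sketch below. The function name `top_k_sorted` is illustrative, the result is ordered largest first to match the example in the question, and `1 <= k <= len(a)` is assumed:

```python
import numpy as np

def top_k_sorted(a, k):
    """Indices of the k largest values of 1-D `a`, largest first.

    Hypothetical helper; assumes 1 <= k <= len(a).
    """
    ind = np.argpartition(a, -k)[-k:]     # O(n) selection, unordered
    return ind[np.argsort(a[ind])[::-1]]  # O(k log k) sort of the k picks

a = np.array([1, 3, 2, 4, 5])
print(top_k_sorted(a, 3))  # [4 3 1]
```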
The simplest I've been able to come up with is:
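A sketch of such a one-liner, assuming it is a plain `argsort` followed by slicing; the names `arr` and `n` are illustrative:

```python
import numpy as np

arr = np.array([1, 3, 2, 4, 5])
n = 3

# argsort returns indices in ascending order of value, so take the
# last n of them and reverse to list the largest first.
top = arr.argsort()[-n:][::-1]
print(top)  # [4 3 1]
```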
This involves a complete sort of the array. I wonder if `numpy` provides a built-in way to do a partial sort; so far I haven't been able to find one.
If this solution turns out to be too slow (especially for small `n`), it may be worth looking at coding something up in Cython.