How do I get indices of N maximum values in a NumP

2018-12-31 14:30发布

问题:

NumPy proposes a way to get the index of the maximum value of an array via np.argmax.

I would like a similar thing, but returning the indexes of the N maximum values.

For instance, if I have an array, [1, 3, 2, 4, 5], function(array, n=3) would return [4, 3, 1].

回答1:

The simplest I\'ve been able to come up with is:

In [1]: import numpy as np

In [2]: arr = np.array([1, 3, 2, 4, 5])

In [3]: arr.argsort()[-3:][::-1]
Out[3]: array([4, 3, 1])

This involves a complete sort of the array. I wonder if numpy provides a built-in way to do a partial sort; so far I haven\'t been able to find one.

If this solution turns out to be too slow (especially for small n), it may be worth looking at coding something up in Cython.



回答2:

Newer NumPy versions (1.8 and up) have a function called argpartition for this. To get the indices of the four largest elements, do

>>> a = np.array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
>>> a
array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
>>> ind = np.argpartition(a, -4)[-4:]
>>> ind
array([1, 5, 8, 0])
>>> a[ind]
array([4, 9, 6, 9])

Unlike argsort, this function runs in linear time in the worst case, but the returned indices are not sorted, as can be seen from the result of evaluating a[ind]. If you need that too, sort them afterwards:

>>> ind[np.argsort(a[ind])]
array([1, 8, 5, 0])

To get the top-k elements in sorted order in this way takes O(n + k log k) time.



回答3:

Simpler yet:

idx = (-arr).argsort()[:n]

where n is the number of maximum values.



回答4:

Use:

>>> import heapq
>>> import numpy
>>> a = numpy.array([1, 3, 2, 4, 5])
>>> heapq.nlargest(3, range(len(a)), a.take)
[4, 3, 1]

For regular Python lists:

>>> a = [1, 3, 2, 4, 5]
>>> heapq.nlargest(3, range(len(a)), a.__getitem__)
[4, 3, 1]

If you use Python 2, use xrange instead of range.

Source: heapq — Heap queue algorithm



回答5:

If you happen to be working with a multidimensional array then you\'ll need to flatten and unravel the indices:

def largest_indices(ary, n):
    \"\"\"Returns the n largest indices from a numpy array.\"\"\"
    flat = ary.flatten()
    indices = np.argpartition(flat, -n)[-n:]
    indices = indices[np.argsort(-flat[indices])]
    return np.unravel_index(indices, ary.shape)

For example:

>>> xs = np.sin(np.arange(9)).reshape((3, 3))
>>> xs
array([[ 0.        ,  0.84147098,  0.90929743],
       [ 0.14112001, -0.7568025 , -0.95892427],
       [-0.2794155 ,  0.6569866 ,  0.98935825]])
>>> largest_indices(xs, 3)
(array([2, 0, 0]), array([2, 2, 1]))
>>> xs[largest_indices(xs, 3)]
array([ 0.98935825,  0.90929743,  0.84147098])


回答6:

If you don\'t care about the order of the K-th largest elements you can use argpartition, which should perform better than a full sort through argsort.

K = 4 # We want the indices of the four largest values
a = np.array([0, 8, 0, 4, 5, 8, 8, 0, 4, 2])
np.argpartition(a,-K)[-K:]
array([4, 1, 5, 6])

Credits go to this question.

I ran a few tests and it looks like argpartition outperforms argsort as the size of the array and the value of K increase.



回答7:

For multidimensional arrays you can use the axis keyword in order to apply the partitioning along the expected axis.

# For a 2D array
indices = np.argpartition(arr, -N, axis=1)[:, -N:]

And for grabbing the items:

x = arr.shape[0]
arr[np.repeat(np.arange(x), N), indices.ravel()].reshape(x, N)

But note that this won\'t return a sorted result. In that case you can use np.argsort() along the intended axis:

indices = np.argsort(arr, axis=1)[:, -N:]

# Result
x = arr.shape[0]
arr[np.repeat(np.arange(x), N), indices.ravel()].reshape(x, N)

Here is an example:

In [42]: a = np.random.randint(0, 20, (10, 10))

In [44]: a
Out[44]:
array([[ 7, 11, 12,  0,  2,  3,  4, 10,  6, 10],
       [16, 16,  4,  3, 18,  5, 10,  4, 14,  9],
       [ 2,  9, 15, 12, 18,  3, 13, 11,  5, 10],
       [14,  0,  9, 11,  1,  4,  9, 19, 18, 12],
       [ 0, 10,  5, 15,  9, 18,  5,  2, 16, 19],
       [14, 19,  3, 11, 13, 11, 13, 11,  1, 14],
       [ 7, 15, 18,  6,  5, 13,  1,  7,  9, 19],
       [11, 17, 11, 16, 14,  3, 16,  1, 12, 19],
       [ 2,  4, 14,  8,  6,  9, 14,  9,  1,  5],
       [ 1, 10, 15,  0,  1,  9, 18,  2,  2, 12]])

In [45]: np.argpartition(a, np.argmin(a, axis=0))[:, 1:] # 1 is because the first item is the minimum one.
Out[45]:
array([[4, 5, 6, 8, 0, 7, 9, 1, 2],
       [2, 7, 5, 9, 6, 8, 1, 0, 4],
       [5, 8, 1, 9, 7, 3, 6, 2, 4],
       [4, 5, 2, 6, 3, 9, 0, 8, 7],
       [7, 2, 6, 4, 1, 3, 8, 5, 9],
       [2, 3, 5, 7, 6, 4, 0, 9, 1],
       [4, 3, 0, 7, 8, 5, 1, 2, 9],
       [5, 2, 0, 8, 4, 6, 3, 1, 9],
       [0, 1, 9, 4, 3, 7, 5, 2, 6],
       [0, 4, 7, 8, 5, 1, 9, 2, 6]])

In [46]: np.argpartition(a, np.argmin(a, axis=0))[:, -3:]
Out[46]:
array([[9, 1, 2],
       [1, 0, 4],
       [6, 2, 4],
       [0, 8, 7],
       [8, 5, 9],
       [0, 9, 1],
       [1, 2, 9],
       [3, 1, 9],
       [5, 2, 6],
       [9, 2, 6]])

In [89]: a[np.repeat(np.arange(x), 3), ind.ravel()].reshape(x, 3)
Out[89]:
array([[10, 11, 12],
       [16, 16, 18],
       [13, 15, 18],
       [14, 18, 19],
       [16, 18, 19],
       [14, 14, 19],
       [15, 18, 19],
       [16, 17, 19],
       [ 9, 14, 14],
       [12, 15, 18]])


回答8:

This will be faster than a full sort depending on the size of your original array and the size of your selection:

>>> A = np.random.randint(0,10,10)
>>> A
array([5, 1, 5, 5, 2, 3, 2, 4, 1, 0])
>>> B = np.zeros(3, int)
>>> for i in xrange(3):
...     idx = np.argmax(A)
...     B[i]=idx; A[idx]=0 #something smaller than A.min()
...     
>>> B
array([0, 2, 3])

It, of course, involves tampering with your original array. Which you could fix (if needed) by making a copy or replacing back the original values. ...whichever is cheaper for your use case.



回答9:

bottleneck has a partial sort function, if the expense of sorting the entire array just to get the N largest values is too great.

I know nothing about this module; I just googled numpy partial sort.



回答10:

Use:

from operator import itemgetter
from heapq import nlargest
result = nlargest(N, enumerate(your_list), itemgetter(1))

Now the result list would contain N tuples (index, value) where value is maximized.



回答11:

Method np.argpartition only returns the k largest indices, performs a local sort, and is faster than np.argsort(performing a full sort) when array is quite large. But the returned indices are NOT in ascending/descending order. Let\'s say with an example:

\"Enter

We can see that if you want a strict ascending order top k indices, np.argpartition won\'t return what you want.

Apart from doing a sort manually after np.argpartition, my solution is to use PyTorch, torch.topk, a tool for neural network construction, providing NumPy-like APIs with both CPU and GPU support. It\'s as fast as NumPy with MKL, and offers a GPU boost if you need large matrix/vector calculations.

Strict ascend/descend top k indices code will be:

\"Enter

Note that torch.topk accepts a torch tensor, and returns both top k values and top k indices in type torch.Tensor. Similar with np, torch.topk also accepts an axis argument so that you can handle multi-dimensional arrays/tensors.



回答12:

Use:

def max_indices(arr, k):
    \'\'\'
    Returns the indices of the k first largest elements of arr
    (in descending order in values)
    \'\'\'
    assert k <= arr.size, \'k should be smaller or equal to the array size\'
    arr_ = arr.astype(float)  # make a copy of arr
    max_idxs = []
    for _ in range(k):
        max_element = np.max(arr_)
        if np.isinf(max_element):
            break
        else:
            idx = np.where(arr_ == max_element)
        max_idxs.append(idx)
        arr_[idx] = -np.inf
    return max_idxs

It also works with 2D arrays. For example,

In [0]: A = np.array([[ 0.51845014,  0.72528114],
                     [ 0.88421561,  0.18798661],
                     [ 0.89832036,  0.19448609],
                     [ 0.89832036,  0.19448609]])
In [1]: max_indices(A, 8)
Out[1]:
    [(array([2, 3], dtype=int64), array([0, 0], dtype=int64)),
     (array([1], dtype=int64), array([0], dtype=int64)),
     (array([0], dtype=int64), array([1], dtype=int64)),
     (array([0], dtype=int64), array([0], dtype=int64)),
     (array([2, 3], dtype=int64), array([1, 1], dtype=int64)),
     (array([1], dtype=int64), array([1], dtype=int64))]

In [2]: A[max_indices(A, 8)[0]][0]
Out[2]: array([ 0.89832036])


回答13:

The following is a very easy way to see the maximum elements and its positions. Here axis is the domain; axis = 0 means column wise maximum number and axis = 1 means row wise max number for the 2D case. And for higher dimensions it depends upon you.

M = np.random.random((3, 4))
print(M)
print(M.max(axis=1), M.argmax(axis=1))


回答14:

I found it most intuitive to use np.unique.

The idea is, that the unique method returns the indices of the input values. Then from the max unique value and the indicies, the position of the original values can be recreated.

multi_max = [1,1,2,2,4,0,0,4]
uniques, idx = np.unique(multi_max, return_inverse=True)
print np.squeeze(np.argwhere(idx == np.argmax(uniques)))
>> [4 7]


回答15:

I think the most time efficiency way is manually iterate through the array and keep a k-size min-heap, as other people have mentioned.

And I also come up with a brute force approach:

top_k_index_list = [ ]
for i in range(k):
    top_k_index_list.append(np.argmax(my_array))
    my_array[top_k_index_list[-1]] = -float(\'inf\')

Set the largest element to a large negative value after you use argmax to get its index. And then the next call of argmax will return the second largest element. And you can log the original value of these elements and recover them if you want.



标签: python numpy