indices of the k largest elements in an unsorted list

Posted 2020-05-18 16:35

Question:

I need to find the indices of the k largest elements of an unsorted, length n, array/vector in C++, with k < n. I have seen how to use nth_element() to find the k-th statistic, but I'm not sure whether it is the right choice for my problem, since it seems I would need to make k calls to nth_element(), which I guess would have complexity O(kn). Is that as good as it can get, or is there a way to do this in just O(n)?

Implementing it without nth_element() seems like it would require iterating over the whole array once, maintaining a list of the indices of the largest elements seen so far at each step.

Is there anything in the standard C++ library that makes this a one-liner, or any clever way to implement it myself in just a couple of lines? In my particular case, k = 3 and n = 6, so efficiency isn't a huge concern, but it would be nice to find a clean and efficient way to do this for arbitrary k and n.

It looks like Mark the top N elements of an unsorted array is probably the closest posting I can find on SO, but the answers there are in Python and PHP.

Answer 1:

Here is my implementation; it does what I want and I think it is reasonably efficient:

#include <iostream>
#include <queue>
#include <vector>
// maxindices.cc
// compile with:
// g++ -std=c++11 maxindices.cc -o maxindices
int main()
{
  std::vector<double> test = {0.2, 1.0, 0.01, 3.0, 0.002, -1.0, -20};
  // Max-heap of (value, index) pairs; the top always holds the largest remaining value.
  std::priority_queue<std::pair<double, int>> q;
  for (std::size_t i = 0; i < test.size(); ++i) {
    q.push(std::pair<double, int>(test[i], i));
  }
  int k = 3; // number of indices we need
  for (int i = 0; i < k; ++i) {
    int ki = q.top().second;
    std::cout << "index[" << i << "] = " << ki << std::endl;
    q.pop();
  }
}

which gives output:

index[0] = 3
index[1] = 1
index[2] = 0


Answer 2:

The question already contains part of the answer: std::nth_element places the "n-th statistic" in its sorted position, with the property that none of the elements preceding the nth one are greater than it, and none of the elements following it are less than it.

Therefore, just one call to std::nth_element is enough to get the k largest elements. The time complexity is O(n), which is theoretically optimal, since you have to visit each element at least once to find the largest (or, here, the k largest) elements. If you need these k elements in order, you then have to sort them, which costs O(k log(k)). In total, that is O(n + k log(k)).
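A minimal sketch of this approach, adapted to return indices rather than values (the index vector, the lambda comparators, and the final sort are my own additions, not part of this answer):

#include <algorithm>
#include <iostream>
#include <numeric>
#include <vector>

int main()
{
  std::vector<double> test = {0.2, 1.0, 0.01, 3.0, 0.002, -1.0, -20};
  const std::size_t k = 3; // number of indices we need

  // Work on indices 0..n-1 and compare them through the values they refer to.
  std::vector<std::size_t> idx(test.size());
  std::iota(idx.begin(), idx.end(), 0);

  // One O(n) call: afterwards the first k indices refer to the k largest values.
  std::nth_element(idx.begin(), idx.begin() + k, idx.end(),
                   [&](std::size_t a, std::size_t b) { return test[a] > test[b]; });

  // Optional O(k log k) step: order those k indices by descending value.
  std::sort(idx.begin(), idx.begin() + k,
            [&](std::size_t a, std::size_t b) { return test[a] > test[b]; });

  for (std::size_t i = 0; i < k; ++i)
    std::cout << "index[" << i << "] = " << idx[i] << std::endl;
}

With the sample vector from Answer 1, this prints index[0] = 3, index[1] = 1, index[2] = 0.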



Answer 3:

This is an improved version of @hazelnusse's answer; it runs in O(n log k) instead of O(n log n).

#include <functional>
#include <iostream>
#include <queue>
#include <vector>
// maxindices.cc
// compile with:
// g++ -std=c++11 maxindices.cc -o maxindices
int main()
{
  std::vector<double> test = {2, 8, 7, 5, 9, 3, 6, 1, 10, 4};
  // Min-heap of (value, index) pairs, capped at k entries, so the heap
  // always holds the k largest values seen so far.
  std::priority_queue<std::pair<double, int>,
                      std::vector<std::pair<double, int>>,
                      std::greater<std::pair<double, int>>> q;
  std::size_t k = 5; // number of indices we need
  for (std::size_t i = 0; i < test.size(); ++i) {
    if (q.size() < k) {
      q.push(std::pair<double, int>(test[i], i));
    } else if (q.top().first < test[i]) {
      q.pop();
      q.push(std::pair<double, int>(test[i], i));
    }
  }
  k = q.size();
  std::vector<int> res(k);
  for (std::size_t i = 0; i < k; ++i) {
    res[k - i - 1] = q.top().second;
    q.pop();
  }
  for (std::size_t i = 0; i < k; ++i) {
    std::cout << res[i] << std::endl;
  }
}

which gives output (the indices of the 5 largest values, in descending order of value):

8
4
1
2
6



Answer 4:

You can build on the partitioning step of the quicksort algorithm: instead of recursing into both partitions, you discard the one that cannot contain the entries in your desired range.

It's been referred to as "quick select" and here is a C++ implementation:

#include <iostream>
using std::cout;
using std::endl;

// Partition step used by quick_select: rearranges input[p..r] around the
// pivot input[r] and returns the split position.
int partition(int* input, int p, int r)
{
    int pivot = input[r];

    while ( p < r )
    {
        while ( input[p] < pivot )
            p++;

        while ( input[r] > pivot )
            r--;

        if ( input[p] == input[r] )
            p++;
        else if ( p < r ) {
            int tmp = input[p];
            input[p] = input[r];
            input[r] = tmp;
        }
    }

    return r;
}

// Returns the k-th smallest element of input[p..r] (k counts from 1).
int quick_select(int* input, int p, int r, int k)
{
    if ( p == r ) return input[p];
    int j = partition(input, p, r);
    int length = j - p + 1;
    if ( length == k ) return input[j];
    else if ( k < length ) return quick_select(input, p, j - 1, k);
    else return quick_select(input, j + 1, r, k - length);
}

int main()
{
    int A1[] = { 100, 400, 300, 500, 200 };
    cout << "1st order element " << quick_select(A1, 0, 4, 1) << endl;
    int A2[] = { 100, 400, 300, 500, 200 };
    cout << "2nd order element " << quick_select(A2, 0, 4, 2) << endl;
    int A3[] = { 100, 400, 300, 500, 200 };
    cout << "3rd order element " << quick_select(A3, 0, 4, 3) << endl;
    int A4[] = { 100, 400, 300, 500, 200 };
    cout << "4th order element " << quick_select(A4, 0, 4, 4) << endl;
    int A5[] = { 100, 400, 300, 500, 200 };
    cout << "5th order element " << quick_select(A5, 0, 4, 5) << endl;
}

OUTPUT:

1st order element 100
2nd order element 200
3rd order element 300
4th order element 400
5th order element 500

EDIT

That particular implementation has an O(n) average run time, but because of how the pivot is chosen it shares quicksort's O(n²) worst case. With a better pivot-selection strategy (for example, median of medians), the worst case also becomes O(n).
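As a usage note (my own addition, not part of the answer): quick_select returns the k-th smallest value, so the k-th largest of n elements is the (n - k + 1)-th order statistic. A minimal sketch, assuming the partition and quick_select definitions above are in scope:

#include <iostream>

int main()
{
    int A[] = { 100, 400, 300, 500, 200 };
    int n = 5, k = 2;
    // The k-th largest element is the (n - k + 1)-th smallest. quick_select
    // reorders A, so work on a copy if the original order still matters.
    int kth_largest = quick_select(A, 0, n - 1, n - k + 1);
    std::cout << "2nd largest element " << kth_largest << std::endl; // prints 400
    // One more O(n) pass over a copy of the original array can then collect the
    // indices of all elements >= kth_largest, handling ties as in other answers.
}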



Answer 5:

The standard library won't give you a list of indices (it is designed to avoid passing around redundant data). However, if you're interested in the n largest elements themselves, use some kind of partitioning (both std::partition and std::nth_element are O(n)):

#include <iostream>
#include <algorithm>
#include <vector>

struct Pred {
    Pred(int nth) : nth(nth) {};
    bool operator()(int k) { return k >= nth; }
    int nth;
};

int main() {

    int n = 4;
    std::vector<int> v = {5, 12, 27, 9, 4, 7, 2, 1, 8, 13, 1};

    // Places the n-th largest element at position v.end() - n;
    // no element after that position is smaller than it.
    std::nth_element(v.begin(), v.end() - n, v.end());

    // Reorders the range so that elements >= the n-th largest come first.
    std::partition(v.begin(), v.end(), Pred(*(v.end() - n)));

    for (auto it = v.begin(); it != v.end(); ++it)
        std::cout << *it << " ";
    std::cout << "\n";

    return 0;
}


Answer 6:

You can do this in O(n) time with a single order statistic calculation:

  • Let r be the k-th order statistic (here, the k-th largest element of the array)
  • Initialize two empty lists bigger and equal.
  • For each index i:
    • If array[i] > r, add i to bigger
    • If array[i] = r, add i to equal
  • Discard elements from equal until the sum of the lengths of the two lists is k
  • Return the concatenation of the two lists.

Naturally, you only need one list if all items are distinct. And if needed, you could do tricks to combine the two lists into one, although that would make the code more complicated.
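A minimal C++ sketch of this recipe (my own illustration; it uses std::nth_element on a copy as the O(n) order-statistic step, which is an assumption, not something this answer prescribes):

#include <algorithm>
#include <functional>
#include <iostream>
#include <vector>

// Indices of the k largest elements of 'a' in O(n) expected time.
std::vector<int> k_largest_indices(const std::vector<double>& a, int k)
{
    // r = the k-th largest value, obtained from a single selection call.
    std::vector<double> copy = a;
    std::nth_element(copy.begin(), copy.begin() + (k - 1), copy.end(),
                     std::greater<double>());
    double r = copy[k - 1];

    std::vector<int> bigger, equal;
    for (int i = 0; i < static_cast<int>(a.size()); ++i) {
        if (a[i] > r) bigger.push_back(i);
        else if (a[i] == r) equal.push_back(i);
    }
    // Discard surplus ties so the two lists together hold exactly k indices.
    equal.resize(k - static_cast<int>(bigger.size()));
    bigger.insert(bigger.end(), equal.begin(), equal.end());
    return bigger;
}

int main()
{
    std::vector<double> test = {0.2, 1.0, 0.01, 3.0, 0.002, -1.0, -20};
    for (int i : k_largest_indices(test, 3))
        std::cout << i << " ";      // prints 1 3 0 (indices of 1.0, 3.0, 0.2)
    std::cout << std::endl;
}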



Answer 7:

Even though the following code might not meet the desired complexity constraints (std::make_heap is O(n), but each extraction costs O(log n), so collecting k indices is O(n + k log n)), it might be an interesting alternative to the priority queue mentioned above.

#include <queue>
#include <vector>
#include <iostream>
#include <iterator>
#include <algorithm>

std::vector<int> largestIndices(const std::vector<double>& values, int k) {
    std::vector<int> ret;

    // Build (value, index) pairs and heapify them so the largest value is on top.
    std::vector<std::pair<double, int>> q;
    int index = -1;
    std::transform(values.begin(), values.end(), std::back_inserter(q),
                   [&](double val) { return std::make_pair(val, ++index); });
    auto functor = [](const std::pair<double, int>& a, const std::pair<double, int>& b) {
        return b.first > a.first;
    };
    std::make_heap(q.begin(), q.end(), functor);

    // Pop the top of the heap k times (or until the heap is empty), collecting indices.
    for (int i = 0; i < k && i < static_cast<int>(values.size()); i++) {
        std::pop_heap(q.begin(), q.end(), functor);
        ret.push_back(q.back().second);
        q.pop_back();
    }

    return ret;
}

int main()
{
    std::vector<double> values = { 7, 6, 3, 4, 5, 2, 1, 0 };
    auto ret = largestIndices(values, 4);
    std::copy(ret.begin(), ret.end(), std::ostream_iterator<int>(std::cout, "\n"));
}