I need to find the indices of the k largest elements of an unsorted, length-n array/vector in C++, with k < n. I have seen how to use nth_element() to find the k-th order statistic, but I'm not sure it is the right choice for my problem, since it seems I would need to make k calls to nth_element(), which I guess would have complexity O(kn). That may be as good as it can get? Or is there a way to do this in just O(n)?
Implementing it without nth_element() seems to mean iterating over the whole array once, maintaining a list of indices of the largest elements seen so far at each step.
Is there anything in the standard C++ library that makes this a one-liner or any clever way to implement this myself in just a couple lines? In my particular case, k = 3, and n = 6, so efficiency isn't a huge concern, but it would be nice to find a clean and efficient way to do this for arbitrary k and n.
It looks like "Mark the top N elements of an unsorted array" is probably the closest post I can find on SO, but the answers there are in Python and PHP.
The question already contains a partial answer: std::nth_element gives you the "n-th order statistic", with the property that none of the elements preceding it are greater than it and none of the elements following it are less. Therefore, just one call to std::nth_element is enough to get the k largest elements. The time complexity is O(n), which is theoretically the smallest possible, since you have to visit each element at least once. If you need these k elements to be ordered, you then have to sort them, which costs O(k log k). So, in total: O(n + k log k).
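A minimal sketch of this idea, assuming a std::vector<double> input and a descending comparison on values (the function and variable names here are illustrative): one std::nth_element call over a vector of indices, followed by a sort of just the first k indices.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <numeric>
#include <vector>

// Sketch: indices of the k largest values via one std::nth_element call (O(n))
// plus an optional sort of the first k indices (O(k log k)).
std::vector<std::size_t> top_k_indices(const std::vector<double>& v, std::size_t k)
{
    std::vector<std::size_t> idx(v.size());
    std::iota(idx.begin(), idx.end(), 0);                    // 0, 1, ..., n-1

    auto by_value_desc = [&v](std::size_t a, std::size_t b) { return v[a] > v[b]; };

    // Rearrange the indices so the first k refer to the k largest values.
    std::nth_element(idx.begin(), idx.begin() + k, idx.end(), by_value_desc);

    // Optional: order those k indices by decreasing value.
    std::sort(idx.begin(), idx.begin() + k, by_value_desc);

    idx.resize(k);
    return idx;
}

int main()
{
    std::vector<double> v{0.5, 10.0, 0.4, 3.0, 0.1, 2.5};    // n = 6
    for (std::size_t i : top_k_indices(v, 3))                // k = 3
        std::cout << i << ' ';                               // expected: 1 3 5
    std::cout << '\n';
}
```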
This should be an improved version of @hazelnusse's answer; it runs in O(n log k) instead of O(n log n).
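A minimal sketch of that bounded-heap idea, assuming a min-heap of (value, index) pairs that is never allowed to grow beyond k entries (names are illustrative):

```cpp
#include <cstddef>
#include <functional>
#include <iostream>
#include <queue>
#include <utility>
#include <vector>

// Sketch: keep a min-heap of (value, index) pairs of size at most k,
// so each of the n pushes/pops costs O(log k) -> O(n log k) overall.
std::vector<std::size_t> top_k_indices(const std::vector<double>& v, std::size_t k)
{
    using entry = std::pair<double, std::size_t>;            // (value, index)
    std::priority_queue<entry, std::vector<entry>, std::greater<entry>> heap; // min-heap

    for (std::size_t i = 0; i < v.size(); ++i) {
        heap.emplace(v[i], i);
        if (heap.size() > k)
            heap.pop();                                      // drop the current smallest
    }

    std::vector<std::size_t> idx;
    while (!heap.empty()) {                                  // smallest of the top-k comes out first
        idx.push_back(heap.top().second);
        heap.pop();
    }
    return idx;
}

int main()
{
    std::vector<double> v{0.5, 10.0, 0.4, 3.0, 0.1, 2.5};
    for (std::size_t i : top_k_indices(v, 3))
        std::cout << i << ' ';                               // e.g. 5 3 1 (ascending by value)
    std::cout << '\n';
}
```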
The standard library won't get you a list of indices (it has been designed to avoid passing around redundant data). However, if you're interested in the k largest elements, use some kind of partitioning; both std::partition and std::nth_element are O(n).

Here is my implementation that does what I want and I think is reasonably efficient:
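What follows is only a minimal sketch in that spirit, with illustrative names: pair each value with its index, sort the pairs in descending order by value, and take the first k indices (an O(n log n) approach).

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>

int main()
{
    std::vector<double> v{0.5, 10.0, 0.4, 3.0, 0.1, 2.5};
    const std::size_t k = 3;

    // Pair each value with its original index.
    std::vector<std::pair<double, std::size_t>> pairs;
    for (std::size_t i = 0; i < v.size(); ++i)
        pairs.emplace_back(v[i], i);

    // Sort the pairs by value, largest first.
    std::sort(pairs.begin(), pairs.end(),
              [](const auto& a, const auto& b) { return a.first > b.first; });

    // The first k pairs now carry the indices of the k largest values.
    for (std::size_t j = 0; j < k; ++j)
        std::cout << pairs[j].second << ' ';                 // expected: 1 3 5
    std::cout << '\n';
}
```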
You can do this in O(n) time with a single order statistic calculation:

- Let r be the k-th order statistic.
- Initialize two empty lists, bigger and equal.
- For each index i:
  - if array[i] > r, add i to bigger;
  - if array[i] = r, add i to equal.
- Discard elements from equal until the sum of the lengths of the two lists is k.
Naturally, you only need one list if all items are distinct. And if needed, you could do tricks to combine the two lists into one, although that would make the code more complicated.
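A minimal sketch of this procedure, assuming the order statistic r is obtained with std::nth_element (using a greater-than comparator) on a scratch copy of the array, and that combining the two lists at the end is acceptable:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iostream>
#include <vector>

// Sketch: single order statistic + one linear pass, as described above.
std::vector<std::size_t> top_k_indices(const std::vector<double>& array, std::size_t k)
{
    // r = k-th largest value, found on a scratch copy so `array` is untouched.
    std::vector<double> scratch(array);
    std::nth_element(scratch.begin(), scratch.begin() + (k - 1), scratch.end(),
                     std::greater<double>());
    const double r = scratch[k - 1];

    std::vector<std::size_t> bigger, equal;
    for (std::size_t i = 0; i < array.size(); ++i) {
        if (array[i] > r)       bigger.push_back(i);         // strictly above the threshold
        else if (array[i] == r) equal.push_back(i);          // ties with the threshold
    }

    // Discard surplus ties so exactly k indices remain.
    while (bigger.size() + equal.size() > k)
        equal.pop_back();

    bigger.insert(bigger.end(), equal.begin(), equal.end());
    return bigger;
}

int main()
{
    std::vector<double> v{0.5, 10.0, 0.4, 3.0, 0.1, 2.5};
    for (std::size_t i : top_k_indices(v, 3))
        std::cout << i << ' ';                               // expected: 1 3 5
    std::cout << '\n';
}
```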
Even though the following code might not fulfill the desired complexity constraints, it might be an interesting alternative to the previously mentioned priority queue.
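By way of illustration only (a sketch with assumed names, not the snippet originally posted): one such alternative keeps a std::multimap from value to index, which stays sorted and tolerates duplicate values, and then reads off the last k entries, for O(n log n) overall.

```cpp
#include <cstddef>
#include <iostream>
#include <map>
#include <vector>

int main()
{
    std::vector<double> v{0.5, 10.0, 0.4, 3.0, 0.1, 2.5};
    const std::size_t k = 3;

    // The multimap keeps entries sorted by value and allows duplicates.
    std::multimap<double, std::size_t> by_value;
    for (std::size_t i = 0; i < v.size(); ++i)
        by_value.emplace(v[i], i);

    // Walk backwards from the largest value and collect k indices.
    std::size_t taken = 0;
    for (auto it = by_value.rbegin(); it != by_value.rend() && taken < k; ++it, ++taken)
        std::cout << it->second << ' ';                      // expected: 1 3 5
    std::cout << '\n';
}
```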