Kth largest element in a max-heap

2019-02-03 09:07发布

问题:

I'm trying to come up with something to solve the following:

Given a max-heap represented as an array, return the kth largest element without modifying the heap. I was asked to do it in linear time, but was told it can be done in log time.

I thought of a solution:

Use a second max-heap and fill it with k or k+1 values into it (breadth first traversal into the original one) then pop k elements and get the desired one. I suppose this should be O(N+logN) = O(N)

Is there a better solution, perhaps in O(logN) time?

回答1:

The max-heap can have many ways, a better case is a complete sorted array, and in other extremely case, the heap can have a total asymmetric structure.

Here can see this:

In the first case, the kth lagest element is in the kth position, you can compute in O(1) with a array representation of heap. But, in generally, you'll need to check between (k, 2k) elements, and sort them (or partial sort with another heap). As far as I know, it's O(K·log(k))

And the algorithm:

Input:
    Integer kth <- 8
    Heap heap <- {19,18,10,17,14,9,4,16,15,13,12}

BEGIN
    Heap positionHeap <- Heap with comparation: ((n0,n1)->compare(heap[n1], heap[n0]))

    Integer childPosition
    Integer candidatePosition <- 0
    Integer count <- 0
    positionHeap.push(candidate)
    WHILE (count < kth) DO
        candidatePosition <- positionHeap.pop();
        childPosition <- candidatePosition * 2 + 1
        IF (childPosition < size(heap)) THEN
            positionHeap.push(childPosition)
            childPosition <- childPosition + 1
            IF (childPosition < size(heap)) THEN
                positionHeap.push(childPosition)
            END-IF
        END-IF
        count <- count + 1
    END-WHILE
    print heap[candidate]
END-BEGIN

EDITED

I found "Optimal Algorithm of Selection in a min-heap" by Frederickson here: ftp://paranoidbits.com/ebooks/An%20Optimal%20Algorithm%20for%20Selection%20in%20a%20Min-Heap.pdf



回答2:

No, there's no O(log n)-time algorithm, by a simple cell probe lower bound. Suppose that k is a power of two (without loss of generality) and that the heap looks like (min-heap incoming because it's easier to label, but there's no real difference)

      1
   2     3
  4 5   6 7
.............
permutation of [k, 2k).

In the worst case, we have to read the entire permutation, because there are no order relations imposed by the heap, and as long as k is not found, it could be in any location not yet examined. This takes time Omega(k), matching the (complicated!) algorithm posted by templatetypedef.



回答3:

To the best of my knowledge, there's no easy algorithm for solving this problem. The best algorithm I know of is due to Frederickson and it isn't easy. You can check out the paper here, but it might be behind a paywall. It runs in time O(k) and this is claimed to be the best possible time, so I suspect that a log-time solution doesn't exist.

If I find a better algorithm than this, I'll be sure to let you know.

Hope this helps!



回答4:

Max-heap in an array: element at i is larger than elements at 2*i+1 and 2*i+2 (i is 0-based)

You'll need another max heap (insert, pop, empty) with element pairs (value, index) sorted by value. Pseudocode (without boundary checks):

input: k
1. insert (at(0), 0)
2. (v, i) <- pop and k <- k - 1
3. if k == 0 return v
4. insert (at(2*i+1), 2*i+1) and insert (at(2*+2), 2*+2)
5. goto 2

Runtime evaluation

  • array access at(i): O(1)
  • insertion into heap: O(log n)
  • insert at 4. takes at most log(k) since the size of heap of pairs is at most k + 1
  • statement 3. is reached at most k times
  • total runtime: O(k log k)