I'm trying to come up with something to solve the following:
Given a max-heap represented as an array, return the kth largest element without modifying the heap. I was asked to do it in linear time, but was told it can be done in log time.
I thought of a solution:
Use a second max-heap and fill it with k or k+1 values into it (breadth first traversal into the original one) then pop k elements and get the desired one. I suppose this should be O(N+logN) = O(N)
Is there a better solution, perhaps in O(logN) time?
The max-heap can have many ways, a better case is a complete sorted array, and in other extremely case, the heap can have a total asymmetric structure.
Here can see this:
In the first case, the kth lagest element is in the kth position, you can compute in O(1) with a array representation of heap.
But, in generally, you'll need to check between (k, 2k) elements, and sort them (or partial sort with another heap). As far as I know, it's O(K·log(k))
And the algorithm:
Input:
Integer kth <- 8
Heap heap <- {19,18,10,17,14,9,4,16,15,13,12}
BEGIN
Heap positionHeap <- Heap with comparation: ((n0,n1)->compare(heap[n1], heap[n0]))
Integer childPosition
Integer candidatePosition <- 0
Integer count <- 0
positionHeap.push(candidate)
WHILE (count < kth) DO
candidatePosition <- positionHeap.pop();
childPosition <- candidatePosition * 2 + 1
IF (childPosition < size(heap)) THEN
positionHeap.push(childPosition)
childPosition <- childPosition + 1
IF (childPosition < size(heap)) THEN
positionHeap.push(childPosition)
END-IF
END-IF
count <- count + 1
END-WHILE
print heap[candidate]
END-BEGIN
EDITED
I found "Optimal Algorithm of Selection in a min-heap" by Frederickson here:
ftp://paranoidbits.com/ebooks/An%20Optimal%20Algorithm%20for%20Selection%20in%20a%20Min-Heap.pdf
No, there's no O(log n)-time algorithm, by a simple cell probe lower bound. Suppose that k is a power of two (without loss of generality) and that the heap looks like (min-heap incoming because it's easier to label, but there's no real difference)
1
2 3
4 5 6 7
.............
permutation of [k, 2k).
In the worst case, we have to read the entire permutation, because there are no order relations imposed by the heap, and as long as k is not found, it could be in any location not yet examined. This takes time Omega(k), matching the (complicated!) algorithm posted by templatetypedef.
To the best of my knowledge, there's no easy algorithm for solving this problem. The best algorithm I know of is due to Frederickson and it isn't easy. You can check out the paper here, but it might be behind a paywall. It runs in time O(k) and this is claimed to be the best possible time, so I suspect that a log-time solution doesn't exist.
If I find a better algorithm than this, I'll be sure to let you know.
Hope this helps!
Max-heap in an array: element at i is larger than elements at 2*i+1 and 2*i+2
(i
is 0-based)
You'll need another max heap (insert
, pop
, empty
) with element pairs (value, index)
sorted by value
. Pseudocode (without boundary checks):
input: k
1. insert (at(0), 0)
2. (v, i) <- pop and k <- k - 1
3. if k == 0 return v
4. insert (at(2*i+1), 2*i+1) and insert (at(2*+2), 2*+2)
5. goto 2
Runtime evaluation
- array access at(i): O(1)
- insertion into heap: O(log n)
- insert at 4. takes at most log(k) since the size of heap of pairs is at most k + 1
- statement 3. is reached at most k times
- total runtime: O(k log k)