Data structure to find median

2020-02-09 10:32发布

问题:

This is an interview question. Design a class, which stores integers and provides two operations:

void insert(int k)
int getMedian()

I guess I can use BST so that insert takes O(logN) and getMedian takes O(logN) (for getMedian I should add the number of of left/right children for each node).

Now I wonder if this is the most efficient solution and there is no better one.

回答1:

You can use 2 heaps, that we will call Left and Right.
Left is a Max-Heap.
Right is a Min-Heap.
Insertion is done like this:

  • If the new element x is smaller than the root of Left then we insert x to Left.
  • Else we insert x to Right.
  • If after insertion Left has count of elements that is greater than 1 from the count of elements of Right, then we call Extract-Max on Left and insert it to Right.
  • Else if after insertion Right has count of elements that is greater than the count of elements of Left, then we call Extract-Min on Right and insert it to Left.

The median is always the root of Left.

So insertion is done in O(lg n) time and getting the median is done in O(1) time.



回答2:

See this Stack Overflow question for a solution that involves two heaps.



回答3:

Would it beat an array of integers witch performs a sort at insertion time with a sort algorithm dedicated for integer (http://en.wikipedia.org/wiki/Sorting_algorithm) if you choose your candidate amongst O < O(log(n)) and using an array, then getMedian would be taking the index at half of the size would be O(1), no? It seems possible to me to do better than log(n) + log(n).

Plus by being a little more flexible you can improve your performance by changing your sort algorithm according to the properties of your input (are the input almost sorted or not ...).

I am pretty much autodidact in computer science, but that is the way I would do it: simpler is better.



回答4:

You could consider a self-balancing tree, too. If the tree is fully balanced, then the root node is your median. Say, the tree is one-level deeper on one end. Then, you just need to know how many nodes are there in the deeper-side to pick the correct median.