Suppose I have a balanced BST (binary search tree). Each tree node contains a special field count
, which counts all descendants of that node + the node itself. They call this data structure order statistics binary tree
.
This data structure supports two operations of O(logN):
rank(x)
-- number of elements that are less than x
findByRank(k)
-- find the node with rank
== k
Now I would like to add a new operation median()
to find the median. Can I assume this operation is O(1) if the tree is balanced?
If the tree is complete (i.e. all levels completely filled), yes you can.
Unless the tree is complete, the median might be a leaf node. So in general case the cost will be O(logN). I guess there is a data structure with requested properties and with a O(1) findMedian operation (Perhaps a skip list + a pointer to the median node; I'm not sure about findByRank and rank operations though) but a balanced BST is not one of them.
In a balanced order statistics tree, finding the median is O(log N). If it is important to find the median in O(1) time, you can augment the data structure by maintaining a pointer to the median. The catch, of course, is that you would need to update this pointer during each Insert or Delete operation. Updating the pointer would take O(log N) time, but since those operations already take O(log N) time, the extra work of updating the median pointer does not change their big-O cost.
As a practical matter, this only makes sense if you do a lot of "find median" operations compared to the number of insertions/deletions.
If desired, you could reduce the cost of updating the median pointer during Insert/Delete to O(1) by using a (doubly) threaded binary tree, but Insert/Delete would still be O(log N).