In-order traversal complexity in a binary search tree

Posted 2019-03-15 18:28

Question:

Related question: Time Complexity of InOrder Tree Traversal of Binary Tree O(N)?; however, it is based on a traversal via recursion (so O(log N) space for a balanced tree), while iterators consume only O(1) space.

In C++, there is normally a requirement that incrementing an iterator of a standard container be an O(1) operation. With most containers this is trivial to prove; with map and such, however, it seems a little more difficult.

  • If a map were implemented as a skip-list, then the result would be obvious
  • However they are often implemented as red-black trees (or at least as binary search trees)

So, during an in-order traversal there are moments where the "next" value is not so easily reached. For example, if you are pointing at the bottom-right leaf of the left subtree, then the next node to traverse is the root, which is depth steps away.

I have tried "proving" that the algorithmic complexity (in terms of "steps") was amortized O(1), which seems alright. However I don't have the demonstration down yet.

Here is a small diagram I traced for a tree with a depth of 4; the numbers (in the place of the nodes) represent the number of steps to go from that node to the next one during an in-order traversal:

       3
   2       2
 1   1   1   1
1 2 1 3 1 2 1 4

Note: the right-most leaf has a cost of 4 in case this would be a sub-tree of a larger tree.
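The counts in the diagram can be reproduced with a short sketch. It assumes a perfect tree whose nodes are numbered heap-style (root 1, children of i at 2i and 2i + 1), which is only a convenient stand-in for real links; `stepsToNext` and `totalSteps` are hypothetical names introduced here for illustration.

```cpp
#include <cassert>

// Nodes of a perfect binary tree are numbered heap-style:
// root = 1, children of i are 2*i and 2*i + 1, parent of i is i / 2.
// Returns the number of edges walked to go from node `i` to its in-order
// successor in a tree with `n` nodes (n = 2^depth - 1).
// For the right-most node the walk continues one step above the root,
// matching the note under the diagram.
int stepsToNext(int i, int n) {
    assert(i >= 1 && i <= n);
    int steps = 0;
    if (2 * i + 1 <= n) {
        // There is a right subtree: go right once, then left as far as possible.
        i = 2 * i + 1;
        ++steps;
        while (2 * i <= n) {
            i = 2 * i;
            ++steps;
        }
        return steps;
    }
    // No right subtree: climb while we are a right child (odd index),
    // then take one more step up.
    while (i > 1 && i % 2 == 1) {
        i /= 2;
        ++steps;
    }
    return steps + 1;  // final step to the parent (or above the root)
}

// Sum of the step counts over every node of a perfect tree with n nodes.
int totalSteps(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) total += stepsToNext(i, n);
    return total;
}
```

Calling `stepsToNext(i, 15)` for i = 1..15 gives the numbers in the diagram level by level, and `totalSteps(15)` stays below 2 * 15.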

The sum is 26, for a total number of nodes of 15: thus a cost of less than 2 per node, on average, which (if it holds up) would be a nice amortized cost. So:

  • During in-order traversal, is incrementing the iterator really amortized O(1) for a balanced (and full) binary search tree?
  • Can the result be extended to cover non-full binary search trees?

Answer 1:

Yes, the amortized cost is indeed O(1) per iteration, for any tree.

The proof is based on the number of times you "visit" each node.
Leaves are visited only once. Non-leaf nodes are visited at most 3 times:

  1. when going from the parent to the node itself.
  2. when coming back from the left subtree
  3. when coming back from the right subtree

There are no other visits to any node, so if we sum the number of visits over all nodes, we get a number that is at most 3n. The total number of visits is therefore O(n), which gives us amortized O(1) per step.
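The counting argument can be checked with a minimal sketch. It assumes a plain, unbalanced BST whose nodes carry a parent pointer; `Node`, `Tree`, `traverse`, `totalVisits` and `maxVisits` are hypothetical names for illustration, not the layout of any real standard library. The walk starts at the root so that the "from the parent" arrival is counted as in the proof.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical node type with a parent pointer.
struct Node {
    int key = 0;
    Node* parent = nullptr;
    Node* left = nullptr;
    Node* right = nullptr;
    int visits = 0;  // how many times the traversal arrives at this node
};

// Plain unbalanced BST; nodes are owned by `pool`.
struct Tree {
    std::vector<std::unique_ptr<Node>> pool;
    Node* root = nullptr;

    void insert(int key) {
        pool.push_back(std::make_unique<Node>());
        Node* n = pool.back().get();
        n->key = key;
        if (!root) { root = n; return; }
        Node* cur = root;
        for (;;) {
            Node*& child = (key < cur->key) ? cur->left : cur->right;
            if (!child) { child = n; n->parent = cur; return; }
            cur = child;
        }
    }
};

// Walks the whole tree with O(1) extra space (apart from the output vector),
// counting every arrival at a node. Returns the keys in in-order.
std::vector<int> traverse(Tree& t) {
    std::vector<int> out;
    Node* prev = nullptr;
    Node* cur = t.root;
    while (cur) {
        ++cur->visits;
        Node* next = nullptr;
        if (prev == cur->parent) {
            // Arrived from the parent: go left if possible, otherwise emit.
            if (cur->left) {
                next = cur->left;
            } else {
                out.push_back(cur->key);
                next = cur->right ? cur->right : cur->parent;
            }
        } else if (prev == cur->left) {
            // Back from the left subtree: emit, then go right or up.
            out.push_back(cur->key);
            next = cur->right ? cur->right : cur->parent;
        } else {
            // Back from the right subtree: go up.
            next = cur->parent;
        }
        prev = cur;
        cur = next;
    }
    return out;
}

long totalVisits(const Tree& t) {
    long sum = 0;
    for (const auto& n : t.pool) sum += n->visits;
    return sum;
}

int maxVisits(const Tree& t) {
    int best = 0;
    for (const auto& n : t.pool) if (n->visits > best) best = n->visits;
    return best;
}
```

In this sketch every node is visited once plus once per child, so after `traverse` no node has more than 3 visits and the total never exceeds 3n, whatever the shape of the tree (including a degenerate chain built by inserting sorted keys).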

(Note: since a full tree has about n/2 leaves, we get the roughly 2n you were encountering. I believe one can show that the sum of visits is smaller than 2n for any tree, but this "optimization" is out of scope here IMO.)
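One way to see the tighter bound, sketched under the assumption that the iterator moves along parent/child links and that a complete iteration runs from the left-most node to one step past the root: a tree with n nodes has n - 1 edges, and each edge is walked at most once downward and at most once upward, since a subtree is never re-entered after it has been left.

```latex
\begin{aligned}
\text{total steps}
  &\le \underbrace{2\,(n-1)}_{\text{each edge: once down, once up}}
     + \underbrace{1}_{\text{final step above the root}} \\
  &= 2n - 1 \;<\; 2n, \\[4pt]
\text{amortized cost per increment}
  &\le \frac{2n-1}{n} \;<\; 2 .
\end{aligned}
```

The bound does not depend on the tree being balanced or full; only the worst case of a single increment does.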


The worst case for a single step is O(h), where h is the height of the tree: O(log n) in a balanced tree, but possibly O(n) in a degenerate (unbalanced) one.


P.S. I have no idea how Red-Black trees are implemented in C++, but if your tree data structure stores a parent pointer in each node, it can replace the recursion stack and allow O(1) space consumption. (This is of course "cheating", because storing n such pointers is O(n) in itself.)