In researching complexity for any algorithm that traverses a binary search tree, I see two different ways to express the same thing:
Version #1: The traversal algorithm, in the worst case, makes one comparison per level of the tree; therefore its complexity is O(h).
Version #2: The traversal algorithm, in the worst case, makes one comparison per level of the tree; therefore its complexity is O(logN).
It seems to me that the same logic is at work, yet different authors use either logN or h. Can someone explain to me why this is the case?
If your binary tree is perfectly balanced, so that every internal node has exactly two children and all leaves sit on the same level, then a tree with h levels has exactly N = 2^h − 1 nodes, so the height is the logarithm of the number of elements: h = log2(N + 1). The same holds, with a different base, for any complete n-ary tree.
An arbitrary, unconstrained tree may look totally different, though; for instance, it could just have one node at every level, so N = h in that case. So the height is the more general measure, as it relates to actual comparisons, but under the additional assumption of balance you can express the height as the logarithm of the number of elements.
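To put numbers on the two extremes, here is a minimal sketch. The function names are mine, and it assumes height is counted in levels, so a perfect tree with h levels has 2^h − 1 nodes and a one-node-per-level chain has h nodes.

```python
def levels_in_perfect_tree(n_nodes):
    """Levels h of a perfect binary tree with n_nodes = 2**h - 1 nodes."""
    # n_nodes + 1 is a power of two, so bit_length() - 1 is its exact log2.
    h = (n_nodes + 1).bit_length() - 1
    if 2**h - 1 != n_nodes:
        raise ValueError("n_nodes must be of the form 2**h - 1")
    return h


def levels_in_chain(n_nodes):
    """Levels of a degenerate tree with exactly one node per level."""
    return n_nodes


for n in (7, 1023, 1_048_575):
    print(n, levels_in_perfect_tree(n), levels_in_chain(n))
# 7 3 7
# 1023 10 1023
# 1048575 20 1048575
```

Same N, very different h: that gap is why O(h) and O(log N) only coincide when the tree is balanced.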
The correct value for the worst-case time to search a tree is O(h), where h is the height of the tree. If you are using a balanced search tree (one where the height of the tree is O(log n)), then the lookup time is worst-case O(log n). That said, not all trees are balanced. For example, here's a tree with height n - 1:
1
 \
  2
   \
    3
     \
      ...
        \
         n
Here, h = n - 1 = Θ(n), so the worst-case lookup is O(n). It's correct to say that the lookup time is also O(h), but h ≠ O(log n) in this case, and it would be erroneous to claim that the lookup time was O(log n).
In short, O(h) is the correct bound. O(log n) is the correct bound in a balanced search tree when the height is at most O(log n), but not all trees have lookup time O(log n) because they may be imbalanced.
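If you want to see this for yourself, here is a small sketch that counts the nodes visited during a lookup. It assumes a plain BST with no rebalancing; `Node`, `insert`, `search_steps`, `build`, and `balanced_order` are illustrative names of my own, not from any library.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Plain BST insert with no rebalancing; returns the root."""
    new = Node(key)
    if root is None:
        return new
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right


def search_steps(root, key):
    """Number of nodes visited while looking for key."""
    steps = 0
    cur = root
    while cur is not None:
        steps += 1
        if key == cur.key:
            break
        cur = cur.left if key < cur.key else cur.right
    return steps


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


def balanced_order(lo, hi):
    """Yield lo..hi median-first, so plain inserts produce a balanced tree."""
    if lo > hi:
        return
    mid = (lo + hi) // 2
    yield mid
    yield from balanced_order(lo, mid - 1)
    yield from balanced_order(mid + 1, hi)


n = 1023
chain = build(range(1, n + 1))          # sorted input: one node per level
balanced = build(balanced_order(1, n))  # perfect tree with 10 levels

print(search_steps(chain, n))     # 1023
print(search_steps(balanced, n))  # 10
```

Both lookups cost O(h); only in the second tree is h also about log2(n).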
Hope this helps!
O(h) applies to any binary search tree, whether or not it is balanced.
O(log n) applies to a binary search tree that is also balanced.
It's sort of two ways of saying the same thing, because your average balanced binary tree of height 'h' will have around 2^h nodes.
Depending on the context, either height or #nodes may be more relevant, and so that's what you'll see referenced.
Because the (h)eight of a balanced tree varies as the log of the (N)umber of elements.