Recently, I've been surprised by the fact that some Java collections don't have constant time operation of method size().
While I learned that concurrent implementations of collections made some compromises as a tradeoff for gain in concurrency (size being O(n) in ConcurrentLinkedQueue, ConcurrentSkipListSet, LinkedTransferQueue, etc.) good news is that this is properly documented in API documentation.
What concerned me is the performance of method size on views returned by some collections' methods. For example, TreeSet.tailSet returns a view of the portion of backing set whose elements are greater than or equal to fromElement. What surprised me a lot is that calling size on returned SortedSet is linear in time, that is O(n). At least that is what I managed to dig up from the source code of OpenJDK: In TreeSet is implemented as wrapper over TreeMap, and within a TreeMap, there is EntrySetView class whose size method is as follows:
abstract class EntrySetView extends AbstractSet<Map.Entry<K,V>> {
private transient int size = -1, sizeModCount;
public int size() {
if (fromStart && toEnd)
return m.size();
if (size == -1 || sizeModCount != m.modCount) {
sizeModCount = m.modCount;
size = 0;
Iterator i = iterator();
while (i.hasNext()) {
size++;
i.next();
}
}
return size;
}
....
}
This means that first time size is called is O(n) and then it's cached as long as backing map is not modified. I was not able to find this fact in the API documentation. More efficient implementation would be O(log n) with memory tradeoff in caching of subtree sizes. Since such tradeoffs are being made for avoiding code duplication (TreeSet as wrapper over TreeMap), I don't see a reason why they should not be made for performance reasons.
Disregarding me being right or wrong with my (very brief) analysis of the OpenJDK implementation of TreeSet, I would like to know is there a detailed and complete documentation on performance of many such operations especially ones which are completely unexpected?