Unexpected complexity of common methods (size) in

2019-03-24 20:20发布

问题:

Recently, I've been surprised by the fact that some Java collections don't have constant time operation of method size().

While I learned that concurrent implementations of collections made some compromises as a tradeoff for gain in concurrency (size being O(n) in ConcurrentLinkedQueue, ConcurrentSkipListSet, LinkedTransferQueue, etc.) good news is that this is properly documented in API documentation.

What concerned me is the performance of method size on views returned by some collections' methods. For example, TreeSet.tailSet returns a view of the portion of backing set whose elements are greater than or equal to fromElement. What surprised me a lot is that calling size on returned SortedSet is linear in time, that is O(n). At least that is what I managed to dig up from the source code of OpenJDK: In TreeSet is implemented as wrapper over TreeMap, and within a TreeMap, there is EntrySetView class whose size method is as follows:

abstract class EntrySetView extends AbstractSet<Map.Entry<K,V>> {
    private transient int size = -1, sizeModCount;

    public int size() {
        if (fromStart && toEnd)
            return m.size();
        if (size == -1 || sizeModCount != m.modCount) {
            sizeModCount = m.modCount;
            size = 0;
            Iterator i = iterator();
            while (i.hasNext()) {
                size++;
                i.next();
            }
        }
        return size;
    }

    ....
}

This means that first time size is called is O(n) and then it's cached as long as backing map is not modified. I was not able to find this fact in the API documentation. More efficient implementation would be O(log n) with memory tradeoff in caching of subtree sizes. Since such tradeoffs are being made for avoiding code duplication (TreeSet as wrapper over TreeMap), I don't see a reason why they should not be made for performance reasons.

Disregarding me being right or wrong with my (very brief) analysis of the OpenJDK implementation of TreeSet, I would like to know is there a detailed and complete documentation on performance of many such operations especially ones which are completely unexpected?

回答1:

For example, TreeSet.tailSet returns a view of the portion of backing set whose elements are greater than or equal to fromElement. What surprised me a lot is that calling size on returned SortedSet is linear in time, that is O(n).

To me it is not surprising. Consider this sentence from the javadoc:

"The returned set is backed by this set, so changes in the returned set are reflected in this set, and vice-versa."

Since the tail set is a dynamic view of the backing set, it follows that its size has to be calculated dynamically in practice. The alternative would require that when a change was made to the backing set, it would have to adjust the sizes of all extant tailset (and headset) views. That would make updates to the backing set more expensive, AND it would present a storage management problem. (In order to update the view sizes, the backing set would need references to all existing view sets ... and that is a potential hidden memory leak.)

Now you do have a point regarding the documentation. But in fact, the javadocs says nothing about the complexity of the view collections. And, indeed, it doesn't even document that TreeSet.size() is O(1)! In fact, it only documents the complexity of the add, remove and contains operations.


I would like to know is there a detailed and complete documentation on performance of many such operations especially ones which are completely unexpected?

AFAIK, No. Certainly, not from Sun / Oracle ...