When and how does HashMap convert the bucket from

2020-01-29 01:17发布

问题:

I was going through java 8 features and found out that hashmaps use a red black tree instead of a linkedlist when the number of entry sets on the bucket increases.

However, doesn't this require the key to be Comparable or some ordering of the keys to exist and how does this work ? When does this conversion actually happens and how ?

回答1:

When there are at least 8 entries (TREEIFY_THRESHOLD) in a single bucket and the total number of buckets is more then 64 (MIN_TREEIFY_CAPACITY) then that single bucket will be transformed to a perfectly balanced red black tree node.

There is also the shrinkage that you should be aware of (if you want) that happens when you remove entries (UNTREEIFY_THRESHOLD == 6).

You are correct that keys should be Comparable - but that is not always required, it is good if they are (in case they have the same hashCode), but in case they are not, this is used:

 static int tieBreakOrder(Object a, Object b) {
        int d;
        if (a == null || b == null ||
            (d = a.getClass().getName().
             compareTo(b.getClass().getName())) == 0)
            d = (System.identityHashCode(a) <= System.identityHashCode(b) ?
                 -1 : 1);
        return d;
 }

So the className as a String is used for comparison and if that fails too, then System.identityHashCode is used (Marsaglia XOR-Shift algorithm) to decide the left and right.

To answer you question when this happens - when resize is called. When you have to resize your HashMap - there are some things happening; like the number of buckets increases by a factor of two (one more bit is taken into consideration where an entry will move or not) or a certain bucket is transformed to a tree. This process (again if you really care) is pretty slow, some people say that a Java HashMap is "sloooooooow, then is fast fast fast; then it's sloooooo, then it's fast fast fast" (I still see this as mocking, but there are PauselessHashMap implementations).

This brings two interesting points. First is choose the correct size of your Map initially (even a rough estimation will do), i.e.:

 new HashMap<>(256); // choosing the size

this will avoid some resizes.

The second one is why transforming to a Tree is important (think database indexes and why they are BTREE...). How many steps it would take you to find an entry in a perfect tree that has INTEGER.MAX_VALUE entries (theoretically). Only up to 32.