Hashing function in Hashtable vs. HashMap?

2020-06-23 07:46发布

I know the difference between Hashtable and HashMap. However, both these classes seemingly are using a hash function to get the job done. Is there a difference between the hash function used in Hashtable, and the hash function used in HashMap?

In particular, is there a difference between the hashing algorithm they use? What is the formula used to hash in these two classes?

In other words, is the way for calculating index (hash value) different?

2条回答
闹够了就滚
2楼-- · 2020-06-23 08:19

java.util.Hashtable<K,V> is like java.util.Vector<T>. It's a class added to the SDK very early in development which has been superseded by HashMap<K,V> (as ArrayList<T> superseded Vector<T>).

So you simply shouldn't use it unless you require implicit synchronization of all the operations, which comes by default with Hashtable, but you can stil use Collections.synchronizedMap for that purpose or a ConcurrentHashMap<K,V>.

As stated in Javadoc:

As of the Java 2 platform v1.2, this class was retrofitted to implement the Map interface, making it a member of the Java Collections Framework. Unlike the new collection implementations, Hashtable is synchronized. If a thread-safe implementation is not needed, it is recommended to use HashMap in place of Hashtable. If a thread-safe highly-concurrent implementation is desired, then it is recommended to use ConcurrentHashMap in place of Hashtable.

The hashing of the two classes should be the same, as they'll both use int Object::hashCode for their purpose.

查看更多
做自己的国王
3楼-- · 2020-06-23 08:20

In particular, is there a difference between the hashing algorithm they use? What is the formula used to hash in these two classes?

The primary hash function used when you use an object as a hash table key is the object's hashCode() method. It is up the to the key class to implement a decent hash function.

The Hashtable and HashMap classes take the key's hashcode value and convert it to an index in the primary hashtable array-of-chains. However, there are differences in how this happens between Hashtable and HashMap.

  • For Hashtable (Java 8) the code is this:

     hash = key.hashCode();
     index = (hash & 0x7FFFFFFF) % tab.length;
    
  • For HashMap (Java 8) the code is (effectively) this:

     // (I have restructured the code for ease of comparison.)
     int h;
     hash = (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
     index = (tab.length - 1) & hash;
    

As you can see, HashMap is scrambling the hashcode value returned by the key's hashcode function. This is explained in the source code as follows:

[This method] computes key.hashCode() and spreads (XORs) higher bits of hash to lower. Because the table uses power-of-two masking, sets of hashes that vary only in bits above the current mask will always collide. (Among known examples are sets of Float keys holding consecutive whole numbers in small tables.) So we apply a transform that spreads the impact of higher bits downward. There is a tradeoff between speed, utility, and quality of bit-spreading. Because many common sets of hashes are already reasonably distributed (so don't benefit from spreading), and because we use trees to handle large sets of collisions in bins, we just XOR some shifted bits in the cheapest possible way to reduce systematic lossage, as well as to incorporate impact of the highest bits that would otherwise never be used in index calculations because of table bounds.

Notes:

  1. The & versus % difference is because in Hashtable the hash array size is a prime number, but in HashMap (Java 8) the size is a power of 2.

  2. In Java 8 HashMap, the implementation will turn a long hash chain into a binary tree if the key class implements Comparable.

  3. HashMap handles null keys, but Hashtable doesn't.


However, all of this extra complexity in HashMap only comes into play if your key class has a poorly designed / implemented hashCode() method ... or if someone is deliberately trying to engineer hash collisions.

In other words, if your key class is well designed, the differences should not matter.

查看更多
登录 后发表回答