Rehashing process in hashmap or hashtable

2019-01-17 05:06发布

问题:

How is the rehashing process done in a hashmap or hashtable when the size exceeds the maxthreshold value?

Are all pairs just copied to a new array of buckets?

EDIT:

What happen to the elements in the same bucket (in linked list) after rehashing? I mean will they remain in same bucket after rehashing?

回答1:

The maximum threshold in the question is called the load factor.

It is advisable to have a load factor of around 0.75. Load factor is defined as (m/n) where n is the total size of the hash table and m is the preferred number of entries which can be inserted before a increment in size of the underlying data structure is required.

Rehashing can be done in two cases:

  1. When the present m'/n ratio increases beyond the load factor

  2. M'/n ratio falls to a very low value say 0.1

In both the cases m' is the current number of entries. Also, both the cases demand the shifting of the present entries into a bigger or a smaller hash table.

In the question's context rehashing is the process of applying a hash function to the entries to move them to another hash table. It is possible to use the hash function which was used earlier or use a new function altogether.

Please note: Rehashing is also done when a collision occurs. (It's a way of handling collisions too.)

To add some more context and a detailed discussion please visit my blog Hashing Basics



回答2:

Rehashing of a hash map is done when the number of elements in the map reaches the maximum threshold value.

Usually the load factor value is 0.75 and the default initial capacity value is 16. Once the number of elements reaches or crosses 0.75 times the capacity, then rehashing of map takes place. In this case when the number of elements are 12, then rehashing occurs. (0.75 * 16 = 12)

When rehashing occurs a new hash function or even the same hash function could be used but the buckets at which the values are present could change. Basically when rehashing occurs the number of buckets are approximately doubled and hence the new index at which the value has to be put changes.

While rehashing, the linked list for each bucket gets reversed in order. This happens because HashMap doesn't append the new element at the tail instead it appends the new element at the head. So when rehashing occurs, it reads each element and inserts it in the new bucket at the head and then keeps on adding next elements from the old map at the head of the new map resulting in reversal of linked list.

If there are multiple threads handling the same hash map it could result in infinite loop.

Detailed explanation stating how infinite loop occurs in the above case can be found here : http://mailinator.blogspot.hu/2009/06/beautiful-race-condition.html

If the elements inserted in the map has to be sorted wrt the keys then TreeMap can be used. But HashMap would be more efficient if order of keys doesn't matter.



回答3:

Hashing – ReHashing and Race condition

Basically while creating a hash map collection assign it a default size (of 10). Later stage when elements are added in the map and after certain stage when you come close to your initial defined capacity there is requirement of ReHashing to retain the performance.

There is LoadFactor defined for the collection (said to be good as .75) and this specifies the good index for time and space.

  • LARGER load factor => lower space consumption but higher lookups
  • SMALLER Load factor => Larger space consumption compare to the required no of elements.

Java specification suggests that Good loadfactor value is .75

Hence Suppose you have maximum requirement to store 10 elements in hash then considering the Good Loadfactor .75 = Rehashing would occur after adding 7 elements in the collection. In case if your requirement in this case would not accede to 7 then Rehashing would never occur.

If there are really large no of elements going to be stored in the hashmap then it is always good to create HashMap with sufficient capacity; this is more efficient than letting it to perform automatic rehashing.

RACE condition : While doing the rehasing internal elements which are stored in a linked list for given bucket. They get reverse in the order. Suppose there are two threads encounter the race condition in same time then there are chances of second therad can go in infinite loop while traversal since the order has been changed.