I have some confusion about hashing and rehashing. Below is my understanding; please correct me if I am wrong.
As per the picture, the bucket array is actually an array of Entry objects, and each bucket stores its elements in the form of a linked list. Each new key-value pair whose key has the same hash code as an existing bucket will be stored as an Entry object in the bucket holding that hash code's elements. If the key has a new hash code that is not currently present in the Entry array, a new bucket will be added for that hash code.
Now the question is: when does rehashing actually occur?
Case 1: suppose my Entry array has buckets for hash codes 1, 2, 3, and with the addition of new elements the Entry array reaches 12 buckets. That is OK, but as soon as a new element comes whose hash code is 13 (assuming I am adding elements in a series of hash codes 1, then 2, and so on), a new map / array of Entry buckets (please clarify which one) will be created, and the new capacity will be 32, so the Entry array can now hold 32 distinct buckets.
Case 2: the number of buckets doesn't matter. The default capacity of 16 means the HashMap can store 16 elements, whether in a single bucket or spread out anyhow. Because of the 0.75 load factor, as soon as the 13th element is added, a new bucket array of size 32 will be created, i.e. the total number of Entry nodes across all the linked lists can now be 32.
I think case 2 is correct. Please explain the rehashing process, preferably using this diagram.
Rehashing increases the number of available buckets as a function of the number of entries currently stored in the HashMap. It occurs when the HashMap implementation determines that the number of buckets should be increased in order to maintain the expected O(1) lookup and insertion performance.
You are correct regarding the 0.75 default load factor and how it will cause the HashMap to be rehashed when the 13th entry is added. However, it is not correct that "default capacity of HashMap 16 means it can store 16 elements in it". Any bucket can store multiple entries. But in order to preserve the desired performance, the average number of entries per bucket should be small. That's the reason we have the load factor, and the reason we should use a proper hashCode() that spreads the keys as evenly as possible across the buckets.
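To make the threshold arithmetic concrete, here is a small sketch (the constants mirror the documented defaults of java.util.HashMap; the class name ThresholdDemo is just for illustration):

```java
public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;          // default initial capacity of HashMap
        float loadFactor = 0.75f;   // default load factor
        int threshold = (int) (capacity * loadFactor);
        // Resize is triggered once the entry count exceeds the threshold,
        // i.e. by the 13th put, regardless of how entries spread across buckets.
        System.out.println("threshold = " + threshold);         // threshold = 12
        System.out.println("new capacity = " + capacity * 2);   // new capacity = 32
    }
}
```

This is why the 13th entry, not the 17th, triggers the resize: the threshold is capacity times load factor, not the capacity itself.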
Rehashing is the process of redistributing already-stored entries (key-value pairs) into a bigger backing array when the load factor threshold is reached. When the number of items in the map crosses the load factor limit, the HashMap doubles its capacity and recomputes the bucket index of every stored element (the cached hash code itself is reused; only the index into the larger table changes), for even distribution of key-value pairs across the new buckets.
Why is rehashing required?
After doubling the capacity, what should be done with the key-value pairs already present in the buckets?
If we keep the existing key-value pairs where they are, doubling the capacity may not help, because O(1) complexity is achieved only if items are evenly distributed across all buckets.
So for each existing key-value pair, the bucket index is calculated again using the increased capacity, which results in the item being placed either in the same bucket or in a different one.
Rehashing is done to distribute items across the new, larger table, so that the time complexity of get and put remains O(1).
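The redistribution step above can be sketched with a deliberately simplified map. This is NOT the real java.util.HashMap source; SimpleMap, indexFor, and the field names are made up for illustration, but the mechanics (power-of-two table, threshold check, index recomputation on resize) follow the same idea:

```java
import java.util.Objects;

// Minimal sketch of rehashing mechanics; not the real java.util.HashMap code.
public class SimpleMap<K, V> {
    static final class Entry<K, V> {
        final K key; V value; final int hash; Entry<K, V> next;
        Entry(K key, V value, int hash, Entry<K, V> next) {
            this.key = key; this.value = value; this.hash = hash; this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16]; // default capacity 16
    int size = 0;
    final float loadFactor = 0.75f;

    // Map a hash code to a bucket index for a power-of-two table length.
    static int indexFor(int hash, int length) { return hash & (length - 1); }

    public void put(K key, V value) {
        int hash = Objects.hashCode(key);
        int i = indexFor(hash, table.length);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) { e.value = value; return; }
        }
        table[i] = new Entry<>(key, value, hash, table[i]); // prepend to bucket's list
        if (++size > table.length * loadFactor) resize(table.length * 2);
    }

    @SuppressWarnings("unchecked")
    void resize(int newCapacity) {
        Entry<K, V>[] newTable = (Entry<K, V>[]) new Entry[newCapacity];
        // Rehash: recompute each entry's bucket index against the new length.
        // The cached hash code is reused; only the index changes, so each entry
        // lands either in its old bucket or in a different one.
        for (Entry<K, V> head : table) {
            for (Entry<K, V> e = head; e != null; ) {
                Entry<K, V> next = e.next;
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
        table = newTable;
    }

    public static void main(String[] args) {
        SimpleMap<Integer, String> map = new SimpleMap<>();
        for (int k = 1; k <= 13; k++) map.put(k, "v" + k);
        // 12 entries fit under the threshold; the 13th put doubles 16 -> 32.
        System.out.println("capacity = " + map.table.length); // capacity = 32
    }
}
```

Note the cost in resize: it walks all existing entries, which is why the put that triggers a resize is O(n) rather than O(1).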
NOTE:
HashMap maintains O(1) complexity while inserting data into and getting data from the map, but the put request for the 13th key-value pair will no longer be O(1). As soon as the map realizes that the 13th element has come in, 75% of the map is filled.
It will first double the bucket (array) capacity and then go for the rehash.
Rehashing requires recomputing the bucket index of the 12 already-placed key-value pairs and putting them at their new indexes, which takes time.
You should read this; it might help:
http://javabypatel.blogspot.co.il/2015/10/what-is-load-factor-and-rehashing-in-hashmap.html