As per my understanding I think:
- It is perfectly legal for two objects to have the same hashcode.
- If two objects are equal (using the equals() method) then they have the same hashcode.
- If two objects are not equal then they cannot have the same hashcode
Am I correct?
Now if am correct, I have the following question:
The HashMap
internally uses the hashcode of the object. So if two objects can have the same hashcode, then how can the HashMap
track which key it uses?
Can someone explain how the HashMap
internally uses the hashcode of the object?
Each Entry object represents key-value pair. Field next refers to other Entry object if a bucket has more than 1 Entry.
Sometimes it might happen that hashCodes for 2 different objects are the same. In this case 2 objects will be saved in one bucket and will be presented as LinkedList. The entry point is more recently added object. This object refers to other object with next field and so one. Last entry refers to null. When you create HashMap with default constructor
Array is gets created with size 16 and default 0.75 load balance.
(Source)
Here is a rough description of
HashMap
's mechanism, forJava 8
version, (it might be slightly different from Java 6).Data structures
Hash value is calculated via
hash()
on key, and it decide which bucket of the hashtable to use for a given key.When count of elements in a bucket is small, a singly linked list is used.
When count of elements in a bucket is large, a red-black tree is used.
Classes (internal)
Map.Entry
Represent a single entity in map, the key/value entity.
HashMap.Node
Linked list version of node.
It could represent:
Because it has a hash property.
HashMap.TreeNode
Tree version of node.
Fields (internal)
Node[] table
The bucket table, (head of the linked lists).
If a bucket don't contains elements, then it's null, thus only take space of a reference.
Set<Map.Entry> entrySet
Set of entities.int size
Number of entities.
float loadFactor
Indicate how full the hash table is allowed, before resizing.
int threshold
The next size at which to resize.
Formula:
threshold = capacity * loadFactor
Methods (internal)
int hash(key)
Calculate hash by key.
How to map hash to bucket?
Use following logic:
About capacity
In hash table, capacity means the bucket count, it could be get from
table.length
.Also could be calculated via
threshold
andloadFactor
, thus no need to be defined as a class field.Could get the effective capacity via:
capacity()
Operations
First find the bucket by hash value, then loop linked list or search sorted tree.
First find the bucket according to hash value of key.
Then try find the value:
When
threshold
reached, will double hashtable's capacity(table.length
), then perform a re-hash on all elements to rebuild the table.This could be an expensive operation.
Performance
Time complexity is
O(1)
, because:O(1)
.O(1)
.O(1)
, notO(log N)
.Your third assertion is incorrect.
It's perfectly legal for two unequal objects to have the same hash code. It's used by
HashMap
as a "first pass filter" so that the map can quickly find possible entries with the specified key. The keys with the same hash code are then tested for equality with the specified key.You wouldn't want a requirement that two unequal objects couldn't have the same hash code, as otherwise that would limit you to 232 possible objects. (It would also mean that different types couldn't even use an object's fields to generate hash codes, as other classes could generate the same hash.)
Java 8 update in HashMap-
you do this operation in your code -
so, suppose your hashcode returned for both keys
"old"
and"very-old"
is same. Then what will happen.myHashMap
is a HashMap, and suppose that initially you didn't specify its capacity. So default capacity as per java is 16. So now as soon as you initialised hashmap using the new keyword, it created 16 buckets. now when you executed first statement-then hashcode for
"old"
is calculated, and because the hashcode could be very large integer too, so, java internally did this - (hash is hashcode here and >>> is right shift)so to give as a bigger pictureit will return some index, which would be between 0 to 15. Now your key value pair
"old"
and"key-value-pair"
would be converted to Entry object's key and value instance variable. and then this entry object will be stored in the bucket, or you can say that at a particular index, this entry object would be stored.FYI- Entry is a class in Map interface- Map.Entry, with these signature/definition
now when you execute next statement -
and
"very-old"
gives same hashcode as"old"
, so this new key value pair is again sent to the same index or the same bucket. But since this bucket is not empty, then thenext
variable of the Entry object is used to store this new key value pair.and this will be stored as linked list for every object which have the same hashcode, but a TRIEFY_THRESHOLD is specified with value 6. so after this reaches, linked list is converted to the balanced tree(red-black tree) with first element as the root.
It gonna be a long answer , grab a drink and read on …
Hashing is all about storing a key-value pair in memory that can be read and written faster. It stores keys in an array and values in a LinkedList .
Lets Say I want to store 4 key value pairs -
So to store the keys we need an array of 4 element . Now how do I map one of these 4 keys to 4 array indexes (0,1,2,3)?
So java finds the hashCode of individual keys and map them to a particular array index . Hashcode Formulae is -
Hash and girl !! I know what you are thinking. Your fascination about that wild duet might made you miss an important thing .
Why java multiply it with 31 ?
Now how this hash code is mapped to an array index?
answer is ,
Hash Code % (Array length -1)
. So“girl”
is mapped to(3173020 % 3) = 1
in our case . which is second element of the array .and the value “ahhan” is stored in a LinkedList associated with array index 1 .
HashCollision - If you try to find
hasHCode
of the keys“misused”
and“horsemints”
using the formulae described above you’ll see both giving us same1069518484
. Whooaa !! lesson learnt -Now the hash map looks like –
Now if some body tries to find the value for the key
“horsemints”
, java quickly will find the hashCode of it , module it and start searching for it’s value in the LinkedList correspondingindex 1
. So this way we need not search all the 4 array indexes thus making data access faster.But , wait , one sec . there are 3 values in that linkedList corresponding Array index 1, how it finds out which one was was the value for key “horsemints” ?
Actually I lied , when I said HashMap just stores values in LinkedList .
It stores both key value pair as map entry. So actually Map looks like this .
Now you can see While traversing through the linkedList corresponding to ArrayIndex1 it actually compares key of each entry to of that LinkedList to “horsemints” and when it finds one it just returns the value of it .
Hope you had fun while reading it :)
Hash map works on the principle of hashing
HashMap get(Key k) method calls hashCode method on the key object and applies returned hashValue to its own static hash function to find a bucket location(backing array) where keys and values are stored in form of a nested class called Entry (Map.Entry) . So you have concluded that from the previous line that Both key and value is stored in the bucket as a form of Entry object . So thinking that Only value is stored in the bucket is not correct and will not give a good impression on the interviewer .
If key is null , then Null keys always map to hash 0, thus index 0.
If key is not null then , it will call hashfunction on the key object , see line 4 in above method i.e. key.hashCode() ,so after key.hashCode() returns hashValue , line 4 looks like
and now ,it applies returned hashValue into its own hashing function .
We might wonder why we are calculating the hashvalue again using hash(hashValue). Answer is It defends against poor quality hash functions.
Now final hashvalue is used to find the bucket location at which the Entry object is stored . Entry object stores in the bucket like this (hash,key,value,bucketindex)