As per my understanding I think:
- It is perfectly legal for two objects to have the same hashcode.
- If two objects are equal (using the equals() method) then they have the same hashcode.
- If two objects are not equal then they cannot have the same hashcode
Am I correct?
Now if am correct, I have the following question:
The HashMap
internally uses the hashcode of the object. So if two objects can have the same hashcode, then how can the HashMap
track which key it uses?
Can someone explain how the HashMap
internally uses the hashcode of the object?
You're mistaken on point three. Two entries can have the same hash code but not be equal. Take a look at the implementation of HashMap.get from the OpenJdk. You can see that it checks that the hashes are equal and the keys are equal. Were point three true, then it would be unnecessary to check that the keys are equal. The hash code is compared before the key because the former is a more efficient comparison.
If you're interested in learning a little more about this, take a look at the Wikipedia article on Open Addressing collision resolution, which I believe is the mechanism that the OpenJdk implementation uses. That mechanism is subtly different than the "bucket" approach one of the other answers mentions.
So here we see that if both the objects S1 and S2 have different content, then we are pretty sure that our overridden Hashcode method will generate different Hashcode(116232,11601) for both objects. NOW since there are different hash codes, so it won't even bother to call EQUALS method. Because a different Hashcode GUARANTEES DIFFERENT content in an object.
The hashcode determines which bucket for the hashmap to check. If there is more than one object in the bucket then a linear search is done to find which item in the bucket equals the desired item (using the
equals()
) method.In other words, if you have a perfect hashcode then hashmap access is constant, you will never have to iterate through a bucket (technically you would also have to have MAX_INT buckets, the Java implementation may share a few hash codes in the same bucket to cut down on space requirements). If you have the worst hashcode (always returns the same number) then your hashmap access becomes linear since you have to search through every item in the map (they're all in the same bucket) to get what you want.
Most of the time a well written hashcode isn't perfect but is unique enough to give you more or less constant access.
As it is said, a picture is worth 1000 words. I say: some code is better than 1000 words. Here's the source code of HashMap. Get method:
So it becomes clear that hash is used to find the "bucket" and the first element is always checked in that bucket. If not, then
equals
of the key is used to find the actual element in the linked list.Let's see the
put()
method:It's slightly more complicated, but it becomes clear that the new element is put in the tab at the position calculated based on hash:
i = (n - 1) & hash
herei
is the index where the new element will be put (or it is the "bucket").n
is the size of thetab
array (array of "buckets").First, it is tried to be put as the first element of in that "bucket". If there is already an element, then append a new node to the list.
You can find excellent information at http://javarevisited.blogspot.com/2011/02/how-hashmap-works-in-java.html
To Summarize:
HashMap works on the principle of hashing
put(key, value): HashMap stores both key and value object as Map.Entry. Hashmap applies hashcode(key) to get the bucket. if there is collision ,HashMap uses LinkedList to store object.
get(key): HashMap uses Key Object's hashcode to find out bucket location and then call keys.equals() method to identify correct node in LinkedList and return associated value object for that key in Java HashMap.
A hashmap works like this (this is a little bit simplified, but it illustrates the basic mechanism):
It has a number of "buckets" which it uses to store key-value pairs in. Each bucket has a unique number - that's what identifies the bucket. When you put a key-value pair into the map, the hashmap will look at the hash code of the key, and store the pair in the bucket of which the identifier is the hash code of the key. For example: The hash code of the key is 235 -> the pair is stored in bucket number 235. (Note that one bucket can store more then one key-value pair).
When you lookup a value in the hashmap, by giving it a key, it will first look at the hash code of the key that you gave. The hashmap will then look into the corresponding bucket, and then it will compare the key that you gave with the keys of all pairs in the bucket, by comparing them with
equals()
.Now you can see how this is very efficient for looking up key-value pairs in a map: by the hash code of the key the hashmap immediately knows in which bucket to look, so that it only has to test against what's in that bucket.
Looking at the above mechanism, you can also see what requirements are necessary on the
hashCode()
andequals()
methods of keys:If two keys are the same (
equals()
returnstrue
when you compare them), theirhashCode()
method must return the same number. If keys violate this, then keys that are equal might be stored in different buckets, and the hashmap would not be able to find key-value pairs (because it's going to look in the same bucket).If two keys are different, then it doesn't matter if their hash codes are the same or not. They will be stored in the same bucket if their hash codes are the same, and in this case, the hashmap will use
equals()
to tell them apart.