I am trying to figure something out about hashing in java. If i want to store some data in a hashmap for example, will it have some kind of underlying hashtable with the hashvalues? Or if someone could give a good and simple explanation of how hashing work, I would really appreciate it.
相关问题
- Delete Messages from a Topic in Apache Kafka
- Jackson Deserialization not calling deserialize on
- How to maintain order of key-value in DataFrame sa
- StackExchange API - Deserialize Date in JSON Respo
- Difference between Types.INTEGER and Types.NULL in
A
HashMap
is a form of hash table (andHashTable
is another). They work by using thehashCode()
andequals(Object)
methods provided by theHashMap
s key type. Depending on how you want you keys to behave, you can use thehashCode
/equals
methods implemented byjava.lang.Object
... or you can override them.I suggest you read the Wikipedia page on Hash Tables to understand how they work. (FWIW, the
HashMap
andHashTable
classes use "separate chaining with linked lists", and some other tweaks to optimize average performance.)A hash function works by turning an object (i.e. a "key") into an integer. How it does this is up to the implementor. But a common approach is to combine hashcodes of the object's fields something like this:
where
prime
is a smallish prime number like31
. The key is that you get a good spread of hashcode values for different keys. What you DON'T want is lots of keys all hashing to the same value. That causes "collisions" and is bad for performance.When you implement the
hashcode
andequals
methods, you need to do it in a way that satisfies the following constraints for the hash table to work correctly:It is also worth noting that the default
hashCode
andequals
methods provided byObject
are based on the target object's identity.The hash values are typically not stored. Rather they are calculated as required.
In the case of the
HashMap
class, the hashcode for each key is actually cached in the hash chains. But that is a performance optimization ... to make hash chain searching faster, and to avoid recalculating hashes if / when the hash table is resized. But if you want this level of understanding, you really need to read the source code rather than asking Questions.HashMap is basically implemented internally as an array of Entry[]. If you understand what is linkedList, this Entry type is nothing but a linkedlist implementation. This type actually stores both key and value.
To insert an element into the array, you need index. How do you calculate index? This is where hashing function(hashFunction) comes into picture. Here, you pass an integer to this hashfunction. Now to get this integer, java gives a call to hashCode method of the object which is being added as a key in the map. This concept is called preHashing.
Now once the index is known, you place the element on this index. This is basically called as BUCKET , so if element is inserted at Entry[0], you say that it falls under bucket 0.
Now assume that the hashFunction returns you same index say 0, for another object that you wanted to insert as a key in the map. This is where equals method is called and if even equals returns true, it simple means that there is a hashCollision. So under this case, since Entry is a linkedlist implmentation, on this index itself, on the already available entry at this index, you add one more node(Entry) to this linkedlist. So bottomline, on hashColission, there are more than one elements at a perticular index through linkedlist.
The same case is applied when you are talking about getting a key from map. Based on index returned by hashFunction, if there is only one entry, that entry is returned otherwise on linkedlist of entries, equals method is called.
Hope this helps with the internals of how it works :)
HashMap stores key-value pair in Map.Entry static nested class implementation. HashMap works on hashing algorithm and uses hashCode() and equals() method in put and get methods.
When we call put method by passing key-value pair, HashMap uses Key hashCode() with hashing to find out the index to store the key-value pair. The Entry is stored in the LinkedList, so if there are already existing entry, it uses equals() method to check if the passed key already exists, if yes it overwrites the value else it creates a new entry and store this key-value Entry.
When we call get method by passing Key, again it uses the hashCode() to find the index in the array and then use equals() method to find the correct Entry and return it’s value. Below image will explain these detail clearly.
The other important things to know about HashMap are capacity, load factor, threshold resizing. HashMap initial default capacity is 16 and load factor is 0.75. Threshold is capacity multiplied by load factor and whenever we try to add an entry, if map size is greater than threshold, HashMap rehashes the contents of map into a new array with a larger capacity. The capacity is always power of 2, so if you know that you need to store a large number of key-value pairs, for example in caching data from database, it’s good idea to initialize the HashMap with correct capacity and load factor.
This is the most fundamental contract in Java: the
.equals()
/.hashCode()
contract. The most important part of it here is that two objects which are considered.equals()
should return the same.hashCode()
.The reverse is not true: objects not considered equal may return the same hash code. But it should be as rare an occurrence as possible. Consider the following
.hashCode()
implementation, which, while perfectly legal, is as broken an implementation as can exist:While this implementation obeys the contract, it is pretty much useless... Hence the importance of a good hash function to begin with.
Now: the
Set
contract stipulates that aSet
should not contain duplicate elements; however, the strategy of aSet
implementation is left... Well, to the implementation. You will notice, if you look at the javadoc ofMap
, that its keys can be retrieved by a method called.keySet()
. Therefore,Map
andSet
are very closely related in this regard.If we take the case of a
HashSet
(and, ultimately,HashMap
), it relies on.equals()
and.hashCode()
: when adding an item, it first calculates this item's hash code, and according to this hash code, attemps to insert the item into a given bucket. In contrast, aTreeSet
(andTreeMap
) relies on the natural ordering of elements (see Comparable).However, if an object is to be inserted and the hash code of this object would trigger its insertion into a non empty hash bucket (see the legal, but broken,
.hashCode()
implementation above), then.equals()
is used to determine whether that object is really unique.Note that, internally, a
HashSet
is aHashMap
...Hash values in Java are provided by objects through the implementation of
public int hashCode()
which is declared inObject
class and it is implemented for all the basic data types. Once you implement that method in your custom data object then you don't need to worry about how these are used in miscellaneous data structures provided by Java.A note: implementing that method requires also to have
public boolean equals(Object o)
implemented in a consistent manner.Hashing is a way to assign a unique code for any variable/object after applying any function/algorithm on its properties.