I've heard in my degree classes that a HashTable
will place a new entry into the 'next available' bucket if the new Key entry collides with another.
How would the HashTable
still return the correct Value if this collision occurs when calling for one back with the collision key?
I'm assuming that the Keys
are String
type and the hashCode()
returns the default generated by say Java.
If I implement my own hashing function and use it as part of a look-up table (i.e. a HashMap
or Dictionary
), what strategies exist for dealing with collisions?
I've even seen notes relating to prime numbers! Information not so clear from Google search.
As there is some confusion about which algorithm Java's HashMap is using (in the Sun/Oracle/OpenJDK implementation), here the relevant source code snippets (from OpenJDK, 1.6.0_20, on Ubuntu):
This method (cite is from lines 355 to 371) is called when looking up an entry in the table, for example from
get()
,containsKey()
and some others. The for loop here goes through the linked list formed by the entry objects.Here the code for the entry objects (lines 691-705 + 759):
Right after this comes the
addEntry()
method:This adds the new Entry on the front of the bucket, with a link to the old first entry (or null, if no such one). Similarily, the
removeEntryForKey()
method goes through the list and takes care of deleting only one entry, letting the rest of the list intact.So, here is a linked entry list for each bucket, and I very doubt that this changed from
_20
to_22
, since it was like this from 1.2 on.(This code is (c) 1997-2007 Sun Microsystems, and available under GPL, but for copying better use the original file, contained in src.zip in each JDK from Sun/Oracle, and also in OpenJDK.)
This is actually not true, at least for the Oracle JDK (it is an implementation detail that could vary between different implementations of the API). Instead, each bucket contains a linked list of entries.
It uses the
equals()
to find the actually matching entry.There are various collision handling strategies with different advantages and disadvantages. Wikipedia's entry on hash tables gives a good overview.
I strongly suggest you to read this blog post which appeared on HackerNews recently: How HashMap works in Java
In short, the answer is
It will use the equals method to see if the key is present even and especially if there are more than one element in the same bucket.
here's a very simple hash table implementation in java. in only implements
put()
andget()
, but you can easily add whatever you like. it relies on java'shashCode()
method that is implemented by all objects. you could easily create your own interface,and force it to be implemented by the keys if you like.
There are multiple techniques available to handle collision. I will explain some of them
Chaining: In chaining we use array indexes to store the values. If hash code of second value also points to the same index then we replace that index value with an linked list and all values pointing to that index are stored in the linked list and actual array index points to the head of the the linked list. But if there is only one hash code pointing to an index of array then the value is directly stored in that index. Same logic is applied while retrieving the values. This is used in Java HashMap/Hashtable to avoid collisions.
Linear probing: This technique is used when we have more index in the table than the values to be stored. Linear probing technique works on the concept of keep incrementing until you find an empty slot. The pseudo code looks like this:
Double hashing technique: In this technique we use two hashing functions h1(k) and h2(k). If the slot at h1(k) is occupied then the second hashing function h2(k) used to increment the index. The pseudo-code looks like this:
Linear probing and double hashing techniques are part of open addressing technique and it can only be used if available slots are more than the number of items to be added. It takes less memory than chaining because there is no extra structure used here but its slow because of lot of movement happen until we find an empty slot. Also in open addressing technique when an item is removed from a slot we put an tombstone to indicate that the item is removed from here that is why its empty.
For more information see this site.