I'm trying to implement my own hash function in Java. I add up the ASCII values of the characters in each string, then find the bucket by taking the sum modulo the size of the hash table (sum % size). Is there a way to keep the same process but reduce collisions when searching for a string?
Thanks in advance.
Java's String.hashCode() makes a tradeoff between being a really good hash function and being as efficient as possible. Simply adding up the character values in a string is not a reliable hash function.
For example, consider the two strings "dog" and "god". Since both contain a 'd', a 'g', and an 'o', no method involving only addition will ever give them different hash codes: addition is commutative, so the order of the characters is lost.
Joshua Bloch, who implemented a good part of Java, discusses the String.hashCode() method in his book Effective Java and explains how, in versions of Java prior to 1.2, String.hashCode() examined at most 16 characters of a given String. This ran somewhat faster than the current implementation but resulted in shockingly poor performance in certain situations.
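To make the "dog"/"god" collision concrete, here is a minimal sketch of the additive scheme from the question, assuming the bucket is computed as sum % tableSize. The class name, method name and the table size of 101 are hypothetical, chosen only for illustration.

```java
final class AdditiveHashDemo {

    // Hypothetical helper mirroring the question: sum the character codes,
    // then reduce the sum to a bucket index with sum % tableSize.
    static int additiveHash(String s, int tableSize) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i); // addition is commutative, so character order is lost
        }
        return sum % tableSize;
    }

    public static void main(String[] args) {
        int tableSize = 101; // arbitrary table size for illustration
        System.out.println(additiveHash("dog", tableSize)); // prints 11
        System.out.println(additiveHash("god", tableSize)); // prints 11
    }
}
```

Both strings sum to 314 ('d' = 100, 'o' = 111, 'g' = 103), so every anagram lands in the same bucket no matter which table size is chosen.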
In general, if your specific data set is very well-defined and you could exploit some uniqueness in it, you could probably make a better hash function. For general purpose Strings, good luck.
I would look at the code for String and HashMap, as these have a low collision rate, don't use % and handle negative numbers.
From the source for String:
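The String source itself isn't reproduced here. As a hedged sketch, the String.hashCode() Javadoc specifies the polynomial s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] in int arithmetic, which can be computed with Horner's rule. The class and method names below are hypothetical, and Math.floorMod is used here (as an assumption about how you might index your own table) to keep the bucket non-negative when the hash overflows.

```java
final class PolynomialHashDemo {

    // Sketch of the polynomial hash specified for String.hashCode():
    // s[0]*31^(n-1) + ... + s[n-1], with int overflow wrapping around.
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // earlier characters get multiplied again, so order matters
        }
        return h;
    }

    // The hash can be negative after overflow, and h % tableSize would then be
    // negative as well; Math.floorMod always returns a value in [0, tableSize).
    static int bucketIndex(String s, int tableSize) {
        return Math.floorMod(polynomialHash(s), tableSize);
    }

    public static void main(String[] args) {
        System.out.println(polynomialHash("dog"));                    // 99644
        System.out.println(polynomialHash("god"));                    // 102527
        System.out.println(polynomialHash("dog") == "dog".hashCode()); // true
        System.out.println(bucketIndex("dog", 101) + " " + bucketIndex("god", 101)); // 58 12
    }
}
```

It is still the same "walk the characters and accumulate" process as the question, but multiplying by 31 before each addition makes the result depend on character positions, so "dog" and "god" no longer collide.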
From the source for HashMap
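The HashMap source quoted in the original answer isn't included here. As a hedged sketch, Java 8 and later HashMap.hash(Object) XORs the upper 16 bits of hashCode() into the lower 16 bits (older releases used a different bit-mixing function), because a power-of-two table only looks at the low bits when picking a bucket. The class name and the example hash values below are hypothetical.

```java
final class SpreadDemo {

    // Sketch modelled on the Java 8+ HashMap approach: fold the high 16 bits
    // into the low 16 bits so they influence the bucket index.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int tableLength = 16; // a power of two, as in HashMap
        // These two hashes differ only above bit 15, so without spreading
        // they fall into the same bucket of a 16-slot table.
        int a = 0x10000;
        int b = 0x20000;
        System.out.println((a & (tableLength - 1)) + " " + (b & (tableLength - 1)));                 // 0 0
        System.out.println((spread(a) & (tableLength - 1)) + " " + (spread(b) & (tableLength - 1))); // 1 2
    }
}
```

The extra XOR and shift are cheap, which fits the efficiency-versus-quality tradeoff described in the first answer.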
As a HashMap's table length is always a power of 2, you can compute the bucket index with a bitwise AND against (length - 1) instead of using %.
Using & is much faster than % and only returns non-negative numbers, as length is positive.
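A small sketch of the masking trick, assuming a table whose length is a power of two. The helper name indexFor and the sample hash of -7 are used here for illustration only. For a power-of-two length, h & (length - 1) gives the same result as Math.floorMod(h, length), including for negative h, whereas plain % keeps the sign of h.

```java
final class PowerOfTwoIndexDemo {

    // Only valid when length is a power of two: (length - 1) is then a mask of
    // the low bits, and it has no sign bit, so the result is in [0, length).
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        int h = -7; // a negative hash code, e.g. after int overflow
        System.out.println(h % length);               // -7: % keeps the sign of the dividend
        System.out.println(indexFor(h, length));      // 9
        System.out.println(Math.floorMod(h, length)); // 9: same bucket as the mask
    }
}
```

If your own table size is not a power of two (for example a prime), the mask does not apply, and Math.floorMod is the simpler way to get a non-negative index.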