I was wondering how to calculate the hash code for a given string by hand. I understand that in Java, you can do something like:
String me = "What you say what you say what?";
long whatever = me.hashCode();
That's all good and dandy, but I was wondering how to do it by hand. I know the given formula for calculating the hash code of a string is something like:
S0 X 31 ^ (n-1) + S1 X 31 ^ (n-2) + .... + S(n-2) X 31 + S(n-1)
Where S indicates the character in the string, and n is the length of the string. Using 16 bit unicode then, the first character from string me would be computed as:
87 X (31 ^ 34)
However, that creates an insanely large number. I can't imagine adding all the characters together like that. So, in order to calculate the lowest-order 32 bits result then, what would I do? Long whatever from above equals -957986661 and I'm not how to calculate that?
Take a look at the source code of
java.lang.String
.The following statements will find the string hashCode
Let's check how exactly its calculated
HashCode = s[0]*31(n-1) + s[1]*31(n-2) + .. s(n-2)
As we all know that the character at position 0 is H, Character at position 1 is i, and the string length is 2.
==> H*31(2-1) + i*31(2-2)
As we all know that, ASCII code of H is 72, and i is 105. It means,
==> 72 * 31 + 105 * 1 (Anything Power 0 is 1)
==> 2232 + 105 = 2337
Source: https://www.tutorialgateway.org/find-string-hashcode-in-java/
Most hash functions of this sort calculate the hash value modulo some large number (e.g. a large prime). This avoids overflows and keeps the range of values returned by the function within a specified range. But this also means an infinite range of input values will get a hash value from a finite set of possible values (i.e. [0,modulus)), hence the problem of hash collisions.
In this case, the code would look something like this:
Exercise for the reader:
See the code for the
hashCode
function for java.util.String. Can you see why it does not use a modulus explicitly?