I was looking through some of the .NET source yesterday and saw several implementations of GetHashCode along these lines:
(i1 << 5) + i1 ^ i2
I understand what the code is doing and why. What I want to know is why they used (i1 << 5) + i1 instead of (i1 << 5) - i1.
Most frameworks I've seen use -i1, because (i1 << 5) - i1 is equivalent to multiplying by 31, which is prime. The Microsoft way is equivalent to multiplying by 33, which factors as 3 × 11 and thus isn't prime.
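For reference, the equivalence between the shift-and-add forms and plain multiplication is easy to check. A minimal sketch (in Java here, but the identity holds in C# as well, overflow included):

```java
public class ShiftIdentity {
    public static void main(String[] args) {
        int h = 7; // any value works; wraparound behaves identically on both sides
        System.out.println((h << 5) + h == 33 * h); // (h * 32) + h = h * 33
        System.out.println((h << 5) - h == 31 * h); // (h * 32) - h = h * 31
    }
}
```

Both lines print true for every int, which is why the shift form was historically preferred over an explicit multiply.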
Is there a known justification for this? Any reasonable hypotheses?
I don't remember if 31 is one of them, but there are certain primes which get used as capacities by Dictionary<K,V>. And if you use one of those primes as the multiplier, the left field doesn't influence the chosen bucket anymore and the hash degenerates.

I asked the same question on math.stackexchange.com: Curious Properties of 33.
The conjecture among mathematicians, and the research I did on the topic, lead me to believe the answer is this: Dan Bernstein, the guy who came up with the constant 33, was never able to explain what property of 33 produced such a good distribution of hashes. Basically, in entropy and speed comparisons, the Bernstein hash does well enough and is quite snappy.
Several papers comparing hash functions have corroborated this finding without further explaining the benefit of using 33. Further, I couldn't find out why Java uses 31 instead. It appears to be a mathematical and programming mystery to date.
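For context, Bernstein's constant appears in his well-known string hash (commonly called djb2), while Java's String.hashCode uses the 31-based polynomial. Rough Java sketches of both, side by side:

```java
public class HashSketch {
    // Bernstein's djb2: h = h * 33 + c, seeded with 5381
    static int djb2(String s) {
        int h = 5381;
        for (int i = 0; i < s.length(); i++) {
            h = (h << 5) + h + s.charAt(i); // h * 33 + c
        }
        return h;
    }

    // The 31-based variant, as specified for java.lang.String.hashCode
    static int hash31(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(djb2("abc"));
        System.out.println(hash31("abc") == "abc".hashCode()); // true
    }
}
```

The two functions differ only in the seed and the multiplier, which is exactly why the 31-versus-33 question keeps coming up.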