This question follows on from Jon Skeet's answer to the question "What is the best algorithm for an overridden System.Object.GetHashCode?". To calculate the hash code, the following algorithm is used:
public override int GetHashCode()
{
    unchecked // Overflow is fine, just wrap
    {
        int hash = 17;
        // Suitable nullity checks etc, of course :)
        hash = hash * 23 + field1.GetHashCode();
        hash = hash * 23 + field2.GetHashCode();
        hash = hash * 23 + field3.GetHashCode();
        return hash;
    }
}
I don't understand why the numbers 17 and 23 are chosen. Why don't we pick 3 and 5? Those are prime numbers as well. Can somebody explain what the best prime numbers to pick are, and why?
The comments on the answer you link to already briefly try to explain why 17 and 23 are not good primes to use here.

A lot of .NET classes that make use of hash codes store elements in buckets. Suppose there are three buckets. Then all objects with hash code 0, 3, 6, 9, ... get stored in bucket 0. All objects with hash code 1, 4, 7, 10, ... get stored in bucket 1. All objects with hash code 2, 5, 8, 11, ... get stored in bucket 2.
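To make the bucket mechanics concrete, here is a minimal sketch, not .NET's actual implementation, of how a hash code is typically mapped to a bucket index; masking off the sign bit is a common convention so that negative hash codes still produce a valid index:

using System;

class BucketDemo
{
    static void Main()
    {
        const int bucketCount = 3; // hypothetical hash set with three buckets

        foreach (int hashCode in new[] { 0, 1, 2, 3, 4, 5, 9, 10, 11 })
        {
            // Clear the sign bit, then take the remainder to pick a bucket.
            int bucket = (hashCode & 0x7FFFFFFF) % bucketCount;
            Console.WriteLine($"hash code {hashCode} -> bucket {bucket}");
        }
    }
}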
Now suppose that your GetHashCode() uses

hash = hash * 3 + field3.GetHashCode();

This would mean that unless hash is large enough for the multiplication to wrap around, in a hash set with three buckets, which bucket an object ends up in depends only on field3: every earlier contribution has been multiplied by 3 at least once, so it vanishes modulo 3.

With an uneven distribution of objects across buckets, HashSet&lt;T&gt; cannot give good performance.

You want a factor that is co-prime to all possible numbers of buckets. The number of buckets will itself be prime, for the same reasons, so if your factor is prime, the only risk is that it's equal to the number of buckets.
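To see this concretely, here is a small self-contained sketch using 3 as the factor; the bucket index it prints is the same no matter what field1 and field2 are:

using System;

class FactorThreeDemo
{
    // Same shape as the algorithm in the question, but with 3 as the factor.
    static int Hash(int field1, int field2, int field3)
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 3 + field1;
            hash = hash * 3 + field2;
            hash = hash * 3 + field3;
            return hash;
        }
    }

    static void Main()
    {
        // Every contribution except field3's has been multiplied by 3 at
        // least once, so it vanishes modulo 3: with three buckets, the
        // bucket index depends only on field3 (barring overflow).
        foreach (var (f1, f2) in new[] { (0, 0), (1, 5), (42, 7) })
        {
            int hash = Hash(f1, f2, 10);
            Console.WriteLine($"({f1}, {f2}, 10) -> bucket {(hash & 0x7FFFFFFF) % 3}");
        }
    }
}

All three lines print bucket 1, even though field1 and field2 differ wildly.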
.NET uses a fixed list of allowed numbers of buckets; in the reference source this is the HashHelpers.primes array, which begins 3, 7, 11, 17, 23, 29, 37, 47, 59, 71, 89, 107, ... and continues with ever larger primes.
Your factor should be one that .NET doesn't use, and that other custom implementations are equally unlikely to use. This means 23 is a bad factor. 31 could be okay with .NET's own containers, but could be equally bad with custom implementations.

At the same time, the factor should not be so low that it gives lots of collisions for common uses. This is a risk with 3 and 5: suppose you have a custom Tuple&lt;int, int&gt; implementation with lots of small integers. Keep in mind that int.GetHashCode() just returns that int itself. Suppose your multiplication factor is 3. That means that (0, 9), (1, 6), (2, 3) and (3, 0) all give the same hash code: with the seed 17, each pair (a, b) hashes to 153 + 3a + b, which is 162 for all four pairs.

Both of these problems can be avoided by using sufficiently large primes, as pointed out in a comment that Jon Skeet has incorporated into his answer.
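As a quick check, here is a small sketch of a pair hash using the question's algorithm with 3 as the factor; it prints the same hash code, 162, for all four pairs:

using System;

class CollisionDemo
{
    // Pair hash with seed 17 and factor 3 (instead of 23).
    static int Hash(int a, int b)
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 3 + a; // int.GetHashCode() returns the int itself
            hash = hash * 3 + b;
            return hash;        // = 153 + 3*a + b for small inputs
        }
    }

    static void Main()
    {
        foreach (var (a, b) in new[] { (0, 9), (1, 6), (2, 3), (3, 0) })
            Console.WriteLine($"({a}, {b}) -> {Hash(a, b)}"); // 162 every time
    }
}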
Once upon a time, large primes for multiplication may have been bad because multiplication by large integers was sufficiently slow that the performance difference was noticeable. Multiplication by 31 would be good in that case because it can be implemented as x * 31 => x * 32 - x => (x &lt;&lt; 5) - x. Nowadays, though, the multiplication is far less likely to cause any performance problems, and then, generally speaking, the bigger the better.
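For illustration, a tiny snippet confirming the shift-and-subtract identity, including when the multiplication wraps around:

using System;

class ShiftDemo
{
    static void Main()
    {
        foreach (int x in new[] { 1, 7, 12345, -42, int.MaxValue })
        {
            unchecked
            {
                // x * 31 == x * 32 - x == (x << 5) - x,
                // and both sides wrap identically modulo 2^32.
                Console.WriteLine($"{x * 31} == {(x << 5) - x}");
            }
        }
    }
}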