I have a range of objects that have a long field whose value uniquely identifies a particular object across my entire system, much like a GUID. I have overridden Object.equals() to use this id for comparison, because I want it to work with copies of the object. Now I want to override Object.hashCode(), too, which basically means mapping my long to some int return value.
If I understood the purpose of hashCode correctly, it is mainly used in hash tables, so a uniform distribution would be desirable. This would mean that simply returning id % 2^32 would suffice. Is that all, or should I be aware of something else?
It's a bit of a minor thing if you're not using Guava already, but Guava can do this for you nicely, for example with Longs.hashCode(id). That gives you the equivalent of Long.valueOf(id).hashCode().
Additionally, if you were to have other values or objects that were part of the hashcode, you could just pass them all to Guava's Objects.hashCode(...). The long would be autoboxed into a Long so you'd get the correct hashcode for it as part of the overall hashcode.
You have understood the purpose of hashCode correctly. Yes, a uniform distribution is desirable (although not an actual requirement).
I would suggest ((id >> 32) ^ id).
The above expression:
Java 8 adds Long.hashCode(long) to the JDK.
The following code could yield higher performance. This code reduces the calculation to 32-bit int instead of computing with 64-bit long. This can make a difference on 32-bit and smaller architectures. 32-bit processes on x86 machines could optimize this into a single instruction which simply XORs 2 registers.
return (int)(value ^ (value >>> 32));
As noted in other answers, this does not have a good avalanche effect and hence could lead to collisions. One could use a cryptographic hash function to ensure a high avalanche effect. However, there are other algorithms, such as MurmurHash, which have a very good avalanche effect but don't consume as much CPU time.
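As a rough illustration of the cheaper-than-cryptographic option, the sketch below runs the long through the 64-bit finalizer step (fmix64) of MurmurHash3 before folding it down to an int. It is only the mixing step, not the full MurmurHash algorithm, and the names LongMixer, mix64, foldedHash and mixedHash are made up for this example.

```java
final class LongMixer {
    private LongMixer() {}

    /** 64-bit finalizer ("fmix64") from MurmurHash3; each step is invertible, so no two longs map to the same long. */
    static long mix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    /** Plain XOR fold of the two 32-bit halves. */
    static int foldedHash(long value) {
        return (int) (value ^ (value >>> 32));
    }

    /** Mix all 64 bits first, then fold down to 32 bits. */
    static int mixedHash(long value) {
        return foldedHash(mix64(value));
    }

    public static void main(String[] args) {
        long a = (1L << 32) | 2L; // halves (1, 2)
        long b = (2L << 32) | 1L; // halves (2, 1)
        // The plain fold cannot tell swapped halves apart: both fold to 3.
        System.out.println(foldedHash(a) == foldedHash(b)); // prints true
        System.out.println(Integer.toHexString(mixedHash(a)));
        System.out.println(Integer.toHexString(mixedHash(b)));
    }
}
```

The extra multiplications cost a few cycles per call, so this is only worth it when the ids are known to be structured in a way the plain fold handles badly.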
(l >> 32) ^ l is a good hashcode in most cases, particularly when the long has a uniform distribution. Since it was the accepted answer, I'm posting this to clarify some of my comments about when it's NOT a good hashcode for a long.
The example I gave was a Point class that stores both of its coordinates in a single long field named coords.
It may seem contrived, but occasionally you have multiple "fields" packed into a long.
So the coords field represents 32 bits of x and 32 bits of y. So why is this a problem? Well, it's not if each of x and y is evenly distributed over its respective 32 bits. But that's unlikely in practice. What is more likely is that X and Y are bounded by some number. Let's say 1024, since it's 2^10. This means that at most the lower 10 bits of each of X and Y are set.
There are 2^20 (1024*1024) possible combinations. But what is the operation hashCode is doing? It XORs the two 32-bit halves together, so the lower 10 bits of X land on top of the lower 10 bits of Y.
There are at most 2^10 (1024) possible hashCode values since only the lower 10 bits can ever be anything other than zero. The ratio of hash values to real values is 1024:(1024*1024) or 1:1024. So right off the bat there is a 1/1024 probability that two numbers have the same hash.
Now let's calculate the probability of a collision by applying math from the birthday problem. Let p(n) be the probability that with n values there will be at least one collision. We know that p(n) = 1 for n >= 1025 since there are only 1024 hash values.
This works out as follows. With just 38 items, a collision is more likely than not. With 148 items, there is a 99.999% chance of (at least one) collision. With 148 items, each item has a 7% chance of colliding with another item. With a proper hashing function that takes knowledge of the domain into account, these numbers could easily go down to 0.
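For anyone who wants to check these figures, here is a minimal sketch of the birthday-problem calculation. It assumes the hash values are spread uniformly over the 1024 possibilities, so that p(n) = 1 - (1024/1024)(1023/1024)...((1024-n+1)/1024). The class and method names are illustrative only.

```java
final class BirthdayCollision {
    private BirthdayCollision() {}

    /** Probability that n items, spread uniformly over `buckets` hash values, contain at least one collision. */
    static double collisionProbability(int n, int buckets) {
        if (n > buckets) {
            return 1.0; // pigeonhole: more items than distinct hash values
        }
        double allDistinct = 1.0;
        for (int k = 0; k < n; k++) {
            // The (k+1)-th item must avoid the k hash values already taken.
            allDistinct *= (double) (buckets - k) / buckets;
        }
        return 1.0 - allDistinct;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 37, 38, 148, 1025}) {
            System.out.printf("n=%d p=%.5f%n", n, collisionProbability(n, 1024));
        }
    }
}
```

Under that uniformity assumption, 38 is the first n for which the probability passes one half.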
In other words, knowing your domain and how things happen in practice are key to making a performant hash. Library functions try to do as good a job as possible knowing nothing about your domain, and to be performant typically rely on a distribution of data that won't occur in practice.
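To make the packed-coordinates case concrete, here is a sketch that counts how many distinct hash values each approach produces for all points with 0 <= x, y < 1024. The class layout (x in the upper 32 bits, y in the lower 32 bits), the name PackedPoint and the domainHash method are assumptions made for this illustration, not the original Point class.

```java
import java.util.BitSet;

final class PackedPoint {
    static final int BOUND = 1024; // assumed coordinate bound: 0 <= x, y < 2^10

    private final long coords; // assumption: x in the upper 32 bits, y in the lower 32 bits

    PackedPoint(int x, int y) {
        this.coords = ((long) x << 32) | (y & 0xFFFFFFFFL);
    }

    int x() { return (int) (coords >> 32); }
    int y() { return (int) coords; }

    /** Generic fold: XORs the two halves, so x and y overlap bit for bit (the result is x ^ y). */
    int foldedHash() { return (int) ((coords >> 32) ^ coords); }

    /** Domain-aware hash, valid only while 0 <= x, y < 1024: x and y get separate bits. */
    int domainHash() { return (x() << 10) | y(); }

    /** Counts the distinct hash values over the whole 1024 x 1024 grid. */
    static int distinctHashes(boolean domainAware) {
        BitSet seen = new BitSet(BOUND * BOUND);
        for (int x = 0; x < BOUND; x++) {
            for (int y = 0; y < BOUND; y++) {
                PackedPoint p = new PackedPoint(x, y);
                seen.set(domainAware ? p.domainHash() : p.foldedHash());
            }
        }
        return seen.cardinality();
    }

    public static void main(String[] args) {
        System.out.println(distinctHashes(false)); // prints 1024
        System.out.println(distinctHashes(true));  // prints 1048576
    }
}
```

The domain-aware version is collision-free only because it bakes in the 10-bit bound; if the bound changes, the shift has to change with it.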
Since Java 8 you can use Long.hashCode(id).
For older versions of Java you can use Long.valueOf(id).hashCode().
Note that this solution may allocate a new Long object, while the first doesn't (although it is likely that the JIT optimizes the object creation away).
Looking at the docs, both ways just use the same algorithm: (int)(value ^ (value >>> 32)).
These are decent solutions since they make use of the Java library. It is always better to leverage something that has been tested already.
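Put together with the equals() from the question, this could look roughly like the sketch below. The Entity class and its name field are hypothetical; only the id takes part in equals() and hashCode().

```java
import java.util.HashSet;
import java.util.Set;

final class Entity {
    private final long id;     // system-wide unique identifier
    private final String name; // hypothetical payload, deliberately ignored by equals/hashCode

    Entity(long id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Entity)) return false;
        return id == ((Entity) o).id;
    }

    @Override
    public int hashCode() {
        // Java 8+; on older versions Long.valueOf(id).hashCode() gives the same value.
        return Long.hashCode(id);
    }

    public static void main(String[] args) {
        Set<Entity> set = new HashSet<>();
        set.add(new Entity(42L, "original"));
        // A copy with the same id is found even though the payload differs.
        System.out.println(set.contains(new Entity(42L, "copy"))); // prints true
    }
}
```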
Folding the upper bits into the result, as in (int)(id ^ (id >>> 32)), will be better distributed than id % 2^32, because the modulo will not return a different value if only the upper bits of your long value have changed.
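A small sketch of that difference, using two made-up ids that differ only above bit 32. It also shows a Java-specific trap: ^ is XOR in Java, not exponentiation, so the literal expression id % 2^32 would be parsed as (id % 2) ^ 32. The names UpperBitsDemo, truncate and fold are illustrative.

```java
final class UpperBitsDemo {
    private UpperBitsDemo() {}

    /** Keeps only the low 32 bits, which is what "id mod 2^32" amounts to. */
    static int truncate(long id) {
        return (int) id;
    }

    /** XOR fold: the upper 32 bits also influence the result. */
    static int fold(long id) {
        return (int) (id ^ (id >>> 32));
    }

    public static void main(String[] args) {
        long a = (1L << 40) | 5L; // the two ids differ only in their upper bits
        long b = (2L << 40) | 5L;

        System.out.println(truncate(a) + " " + truncate(b)); // prints "5 5"
        System.out.println(fold(a) + " " + fold(b));         // prints "261 517"

        System.out.println(2 ^ 32);          // prints 34, because ^ is XOR
        System.out.println(a % (1L << 32));  // prints 5, an actual modulo by 2^32
    }
}
```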