I am currently working on a collection library for my custom programming language. I already have several data types (Collection, List, Map, Set) and implementations for them (mutable and immutable), but what I was missing so far was hashCode and equals. While these are no problem for Lists, as they are ordered collections, they play a special role for Sets and Maps. Two Sets are considered equal if they have the same size and the same elements, and the order in which the Sets maintain them should not make a difference in their equality. Because of the equals-hashCode contract, the hashCode implementation also has to reflect this behavior, meaning that two sets with the same elements but different ordering should have the same hash code. (The same applies to Maps, which are technically a Set of key-value pairs.)
Example (Pseudocode):
let set1: Set<String> = [ "a", "b", "c" ]
let set2: Set<String> = [ "b", "c", "a" ]
set1 == set2 // should return true
set1.hashCode == set2.hashCode // should also return true
How would I implement a reasonably good hash algorithm for which the hashCodes in the above example return the same value?
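The JDK itself proposes a solution to this problem. The contract of the java.util.Set interface states that the hash code of a set is defined to be the sum of the hash codes of the elements in the set, where the hash code of a null element is defined to be zero. This ensures that s1.equals(s2) implies s1.hashCode() == s2.hashCode() for any two sets s1 and s2, as required by the general contract of Object.hashCode().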
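The following minimal Java sketch makes the sum rule concrete. The class name SumHash and the method orderIndependentHash are hypothetical. The method adds up Objects.hashCode of every element, so a null element counts as 0. It then compares the result against java.util.HashSet, whose hashCode is inherited from AbstractSet and follows the same rule.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class SumHash {
    // Sums the element hash codes, as described in the java.util.Set contract.
    // Objects.hashCode(null) returns 0, so null elements contribute nothing.
    static int orderIndependentHash(Iterable<?> elements) {
        int h = 0;
        for (Object e : elements) {
            h += Objects.hashCode(e); // int overflow wraps around silently
        }
        return h;
    }

    public static void main(String[] args) {
        List<String> abc = List.of("a", "b", "c");
        List<String> bca = List.of("b", "c", "a");
        // Same elements in a different order give the same hash.
        System.out.println(orderIndependentHash(abc) == orderIndependentHash(bca)); // true
        // HashSet's hashCode (from AbstractSet) uses the same sum.
        Set<String> set = new HashSet<>(abc);
        System.out.println(orderIndependentHash(abc) == set.hashCode()); // true
    }
}
```

Java integer addition wraps on overflow. Wrapping addition is still commutative and associative, so the result does not depend on iteration order even for large hash values.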
An alternative to using the sum of the entries' hash codes would be to use, for example, the ^ (XOR) operator.

The Scala language uses an ordering-invariant version of the MurmurHash algorithm (cf. the private scala.util.hashing.MurmurHash3 class) to implement the hashCode (or ##) method of its immutable sets and similar collections.

Here's the pseudocode for a possible implementation:
The xor function should return a string that is as long as the longer of its two arguments. It XORs the bits of both arguments until it reaches the end of one of them, then appends the remaining bits of the longer string. As a result, the hash code of a set will be as long as the hash code of its longest element. Because XOR is commutative and associative, the final hash code is the same regardless of the order of the elements. However, as with any hashing implementation, collisions remain possible.
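Here is a minimal Java sketch of the xor function described above. It uses byte arrays in place of bit strings. The names XorHash, xor, elementHash and setHash are hypothetical. elementHash simply returns the element's UTF-8 bytes as a placeholder for a real variable-length element hash.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class XorHash {
    // XORs the overlapping prefix of a and b and copies the tail of the longer
    // array unchanged (equivalent to XOR with implicit zero padding).
    static byte[] xor(byte[] a, byte[] b) {
        byte[] longer = a.length >= b.length ? a : b;
        byte[] shorter = longer == a ? b : a;
        byte[] result = Arrays.copyOf(longer, longer.length);
        for (int i = 0; i < shorter.length; i++) {
            result[i] ^= shorter[i];
        }
        return result;
    }

    // Hypothetical per-element hash: the UTF-8 bytes stand in for a
    // variable-length element hash code.
    static byte[] elementHash(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Folds all element hashes together; the empty array is the identity.
    static byte[] setHash(Iterable<String> elements) {
        byte[] h = new byte[0];
        for (String e : elements) {
            h = xor(h, elementHash(e));
        }
        return h;
    }

    public static void main(String[] args) {
        byte[] h1 = setHash(List.of("a", "bb", "ccc"));
        byte[] h2 = setHash(List.of("ccc", "a", "bb"));
        System.out.println(Arrays.equals(h1, h2)); // true
        System.out.println(h1.length);             // 3, the longest element's length
    }
}
```

One caveat applies to this approach and to the plain ^ suggestion. If two distinct elements happen to have identical hash values, XOR cancels them out completely. With the sum used by java.util.Set, the same pair still contributes twice its hash value.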
You can calculate the hash by first sorting your collection in alphabetical order.
Here is a C# sample; I hope you can translate it into Java :)
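The sorting approach can be sketched in Java as follows. The class name SortedHash and the method canonicalHash are hypothetical. The sketch assumes the elements have a natural ordering. For String, that ordering is lexicographic by UTF-16 code unit rather than locale-aware alphabetical order. This does not matter here, because only a consistent canonical order is needed.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class SortedHash {
    // Copies the elements, sorts them into their natural order, and then
    // applies the ordinary order-dependent List hash (h = 31 * h + e).
    static <T extends Comparable<? super T>> int canonicalHash(Collection<T> elements) {
        List<T> sorted = new ArrayList<>(elements);
        Collections.sort(sorted);
        return sorted.hashCode();
    }

    public static void main(String[] args) {
        int h1 = canonicalHash(List.of("a", "b", "c"));
        int h2 = canonicalHash(List.of("b", "c", "a"));
        System.out.println(h1 == h2); // true
    }
}
```

Sorting costs O(n log n) on every hashCode call. It also requires a total order that is consistent with equals. Sorting the elements' int hash codes instead of the elements themselves would remove the Comparable requirement.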