I need to make a custom struct conform to Hashable
so that I can use it as a Dictionary key type. The challenge though, is that two of the struct's properties are interchangeable for the purpose of identifying a unique instance.
Here's a simplified example to illustrate the problem:
struct MultiplicationQuestion {
let leftOperand: Int
let rightOperand: Int
var answer: Int { return leftOperand * rightOperand }
}
The two important properties for identifying a unique MultiplicationQuestion
are leftOperand
and rightOperand
, but it doesn't matter what order they are in, because '1 x 2' is essentially the same question as '2 x 1'. (For reasons I won't go into here, they need to be kept as separate properties.)
I tried defining Hashable
conformance as follows, knowing that there is a conflict between equality as I've defined it for ==
and what the built-in Hasher is going to do:
extension MultiplicationQuestion: Hashable {
static func == (lhs: MultiplicationQuestion, rhs: MultiplicationQuestion) -> Bool {
return (lhs.leftOperand == rhs.leftOperand && lhs.rightOperand == rhs.rightOperand) || (lhs.leftOperand == rhs.rightOperand && lhs.rightOperand == rhs.leftOperand)
}
func hash(into hasher: inout Hasher) {
hasher.combine(leftOperand)
hasher.combine(rightOperand)
}
}
I tested this by creating two Sets of questions and performing various operations on them:
var oneTimesTables = Set<MultiplicationQuestion>()
var twoTimesTables = Set<MultiplicationQuestion>()
for i in 1...5 {
oneTimesTables.insert( MultiplicationQuestion(leftOperand: 1, rightOperand: i) )
twoTimesTables.insert( MultiplicationQuestion(leftOperand: 2, rightOperand: i) )
}
let commonQuestions = oneTimesTables.intersection(twoTimesTables)
let allQuestions = oneTimesTables.union(twoTimesTables)
The hoped-for result (wishful thinking) is that commonQuestions
contains one question (1 x 2), while allQuestions
contains nine questions, having removed the duplicate.
The actual result however is unpredictable. If I run the playground multiple times, I get different results. Most of the time, commonQuestions.count
is 0, but sometimes it is 1. And most of the time, allQuestions.count
is 10, but sometimes it is 9. (I'm not sure what I was expecting, but this inconsistency was certainly a surprise!)
How can I make the hash(into:)
method generate the same hash for two instances where the properties are the same but reversed?
This is how Hasher works
https://developer.apple.com/documentation/swift/hasher
However, the underlying hash algorithm is designed to exhibit
avalanche effects: slight changes to the seed or the input byte
sequence will typically produce drastic changes in the generated hash
value.
So the problem here in hash(into:) func
Since the sequence matters combine
operation is not commutative. You should find some other function to be a hash for this struct. In your case the best option is
func hash(into hasher: inout Hasher) {
hasher.combine(leftOperand & rightOperand)
}
As @Martin R pointed out to have less collisions it's better to use ^
func hash(into hasher: inout Hasher) {
hasher.combine(leftOperand ^ rightOperand)
}
Tiran Ut's answer (and comments) helped me a lot, and I've marked it as correct. Nonetheless, I thought it would be worth adding another answer to share some things I've learnt and to present another way of solving the problem.
Apple's hash(into:)
documentation says:
The components used for hashing must be the same as the components
compared in your type’s == operator implementation.
That's all well and good if its a simple one-to-one comparison of properties (like all the code examples show!) but what if your ==
method has conditional logic like mine? How do you translate that into a value (or values) to feed the hasher?
I was getting hung up on this detail, until Tiran suggested that feeding the hasher a constant value (like 2) would still work, since hash collisions are resolved by ==
anyway. You wouldn't do this in production of course, because you'd lose all the performance benefits of hash lookups, but the take-home for me was that if you can't make your hasher arguments exactly the same as your ==
operands, make the hasher equality logic more inclusive (not less).
The solutions in Tiran Ut's answer work because the bitwise operations don't care what order the operands are in, just like my ==
logic. Occasionally, two completely different pairs may generate the same value (resulting in a guaranteed hash collision), but the only real consequence in these cases is a small hit to performance.
In the end though, I realised that I could use the exact same logic in both cases, thereby avoiding hash collisions—well, aside from any caused by an imperfect hasher algorithm anyway. I added a new private constant to MultiplicationQuestion
and initialized it as follows:
uniqueOperands = Set([leftOperand, rightOperand])
A sorted array would have worked too, but a Set seemed like the more elegant choice. Since a Set has no ordering, my verbose conditional logic for ==
(using &&
and ||
) is already neatly encapsulated in the Set
type.
Now, I can just use the very same value to test for equality and to feed the hasher:
static func ==(lhs: MultiplicationQuestion, rhs: MultiplicationQuestion) -> Bool {
return lhs.uniqueOperands == rhs.uniqueOperands
}
func hash(into hasher: inout Hasher) {
hasher.combine(uniqueOperands)
}
I've tested performance and it's on par with the bitwise operations. Not only that, but my code has become more concise and readable in the process. Seems like a win-win.