Hashable struct with interchangeable properties?

2019-07-24 06:44发布

问题:

I need to make a custom struct conform to Hashable so that I can use it as a Dictionary key type. The challenge though, is that two of the struct's properties are interchangeable for the purpose of identifying a unique instance.

Here's a simplified example to illustrate the problem:

struct MultiplicationQuestion {
    let leftOperand: Int
    let rightOperand: Int
    var answer: Int { return leftOperand * rightOperand }
}

The two important properties for identifying a unique MultiplicationQuestion are leftOperand and rightOperand, but it doesn't matter what order they are in, because '1 x 2' is essentially the same question as '2 x 1'. (For reasons I won't go into here, they need to be kept as separate properties.)

I tried defining Hashable conformance as follows, knowing that there is a conflict between equality as I've defined it for == and what the built-in Hasher is going to do:

extension MultiplicationQuestion: Hashable {
    static func == (lhs: MultiplicationQuestion, rhs: MultiplicationQuestion) -> Bool {
        return (lhs.leftOperand == rhs.leftOperand && lhs.rightOperand == rhs.rightOperand) || (lhs.leftOperand == rhs.rightOperand && lhs.rightOperand == rhs.leftOperand)
    }
    func hash(into hasher: inout Hasher) {
        hasher.combine(leftOperand)
        hasher.combine(rightOperand)
    }
}

I tested this by creating two Sets of questions and performing various operations on them:

var oneTimesTables = Set<MultiplicationQuestion>()
var twoTimesTables = Set<MultiplicationQuestion>()
for i in 1...5 {
    oneTimesTables.insert( MultiplicationQuestion(leftOperand: 1, rightOperand: i) )
    twoTimesTables.insert( MultiplicationQuestion(leftOperand: 2, rightOperand: i) )
}

let commonQuestions = oneTimesTables.intersection(twoTimesTables)
let allQuestions = oneTimesTables.union(twoTimesTables)

The hoped-for result (wishful thinking) is that commonQuestions contains one question (1 x 2), while allQuestions contains nine questions, having removed the duplicate.

The actual result however is unpredictable. If I run the playground multiple times, I get different results. Most of the time, commonQuestions.count is 0, but sometimes it is 1. And most of the time, allQuestions.count is 10, but sometimes it is 9. (I'm not sure what I was expecting, but this inconsistency was certainly a surprise!)

How can I make the hash(into:) method generate the same hash for two instances where the properties are the same but reversed?

回答1:

This is how Hasher works

https://developer.apple.com/documentation/swift/hasher

However, the underlying hash algorithm is designed to exhibit avalanche effects: slight changes to the seed or the input byte sequence will typically produce drastic changes in the generated hash value.

So the problem here in hash(into:) func

Since the sequence matters combine operation is not commutative. You should find some other function to be a hash for this struct. In your case the best option is

    func hash(into hasher: inout Hasher) {
        hasher.combine(leftOperand & rightOperand)
    }

As @Martin R pointed out to have less collisions it's better to use ^

    func hash(into hasher: inout Hasher) {
        hasher.combine(leftOperand ^ rightOperand)
    }


回答2:

Tiran Ut's answer (and comments) helped me a lot, and I've marked it as correct. Nonetheless, I thought it would be worth adding another answer to share some things I've learnt and to present another way of solving the problem.

Apple's hash(into:) documentation says:

The components used for hashing must be the same as the components compared in your type’s == operator implementation.

That's all well and good if its a simple one-to-one comparison of properties (like all the code examples show!) but what if your == method has conditional logic like mine? How do you translate that into a value (or values) to feed the hasher?

I was getting hung up on this detail, until Tiran suggested that feeding the hasher a constant value (like 2) would still work, since hash collisions are resolved by == anyway. You wouldn't do this in production of course, because you'd lose all the performance benefits of hash lookups, but the take-home for me was that if you can't make your hasher arguments exactly the same as your == operands, make the hasher equality logic more inclusive (not less).

The solutions in Tiran Ut's answer work because the bitwise operations don't care what order the operands are in, just like my == logic. Occasionally, two completely different pairs may generate the same value (resulting in a guaranteed hash collision), but the only real consequence in these cases is a small hit to performance.

In the end though, I realised that I could use the exact same logic in both cases, thereby avoiding hash collisions—well, aside from any caused by an imperfect hasher algorithm anyway. I added a new private constant to MultiplicationQuestion and initialized it as follows:

uniqueOperands = Set([leftOperand, rightOperand])

A sorted array would have worked too, but a Set seemed like the more elegant choice. Since a Set has no ordering, my verbose conditional logic for == (using && and ||) is already neatly encapsulated in the Set type.

Now, I can just use the very same value to test for equality and to feed the hasher:

static func ==(lhs: MultiplicationQuestion, rhs: MultiplicationQuestion) -> Bool {
    return lhs.uniqueOperands == rhs.uniqueOperands
}

func hash(into hasher: inout Hasher) {
    hasher.combine(uniqueOperands)
}

I've tested performance and it's on par with the bitwise operations. Not only that, but my code has become more concise and readable in the process. Seems like a win-win.