Is Dictionary broken or should GetHashCode() only

2020-03-10 05:23发布

问题:

When an object is added to the .NET System.Collections.Generic.Dictionary class the hashcode of the key is stored internally and used for later comparisons. When the hashcode changes after its initial insertion into the dictionary it often becomes "inaccessible" and may surprise its users when an existence check, even using the same reference, returns false (sample code below).

The GetHashCode documentation says:

The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method.

So, according to the GetHashCode docs, the hashcode may change whenever equality-determining state is changed, yet the Dictionary implementation does not support this.

Is the current .NET dictionary implementation broken in that it incorrectly ignore the hashcode allowances? Should GetHashCode() only be based on immutable members? Or, is there something else to break a possible false dichotomy?

class Hashable
{
    public int PK { get; set; }

    public override int GetHashCode()
    {
        if (PK != 0) return PK.GetHashCode();
        return base.GetHashCode();
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Hashable);
    }

    public virtual bool Equals(Hashable other)
    {
        if (other == null) return false;
        else if (ReferenceEquals(this, other)) return true;
        else if (PK != 0 && other.PK != 0) return Equals(PK, other.PK);
        return false;
    }

    public override string ToString()
    {
        return string.Format("Hashable {0}", PK);
    }
}

class Test
{
    static void Main(string[] args)
    {
        var dict = new Dictionary<Hashable, bool>();
        var h = new Hashable();
        dict.Add(h, true);

        h.PK = 42;
        if (!dict.ContainsKey(h)) // returns false, despite same reference
            dict.Add(h, false);
    }
}

回答1:

No, you just shouldn't mutate a key (in a material way) after inserting it into a dictionary. This is by design, and the way that every hash table I've ever used works. The docs even specify this:

As long as an object is used as a key in the Dictionary<TKey, TValue>, it must not change in any way that affects its hash value. Every key in a Dictionary<TKey, TValue> must be unique according to the dictionary's equality comparer. A key cannot be null, but a value can be, if the value type TValue is a reference type.

So it's only going to surprise users who don't read documentation :)



回答2:

To add to Jon's answer, I would just add emphasis to a certain part of the docs that you quoted:

The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method.

Now, right there you've broken the rules. You changed PK, which doesn't affect the outcome of Equals (because you have that ReferenceEquals check in there), but the result of your GetHashCode does change. So that's the simple answer.

Taking the more conceptual approach, I think you can look at it like this: if you have overridden the Equals and GetHashCode behavior for your type, then you've taken ownership of the concept of what it means for one instance of this type to be equal to another. And in fact you have defined it in a way that a Hashable object can be changed into something completely different; i.e., something that is no longer usable in the same way it was previously (because its hash code has changed).

Considered from this perspective, after you do dict.Add(h, true), and then you change h.PK, the dictionary doesn't contain the object referenced by h anymore. It contains something else (which doesn't really exist anywhere). It's kind of like the type you've defined is a snake that has shed its skin.