Should the hash code of null always be zero, in .N

Posted 2019-01-17 04:59

Given that collections like System.Collections.Generic.HashSet<> accept null as a set member, one can ask what the hash code of null should be. It looks like the framework uses 0:

// nullable struct type
int? i = null;
i.GetHashCode();  // gives 0
EqualityComparer<int?>.Default.GetHashCode(i);  // gives 0

// class type
CultureInfo c = null;
EqualityComparer<CultureInfo>.Default.GetHashCode(c);  // gives 0

This can be (a little) problematic with nullable enums. If we define

enum Season
{
  Spring,
  Summer,
  Autumn,
  Winter,
}

then the Nullable<Season> (also called Season?) can take just five values, but two of them, namely null and Season.Spring, have the same hash code.
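The collision described above can be checked with a small program. This is only a sketch: the class name NullHashDemo and the helper DefaultHash are made up for illustration, and the printed values assume the default enum hashing (the underlying int value) that the question describes.

```csharp
using System;
using System.Collections.Generic;

enum Season { Spring, Summer, Autumn, Winter }

static class NullHashDemo
{
    // Hypothetical helper: the hash code the default comparer reports for a Season?.
    public static int DefaultHash(Season? value) =>
        EqualityComparer<Season?>.Default.GetHashCode(value);

    static void Main()
    {
        Season? none = null;
        Season? spring = Season.Spring;

        Console.WriteLine(DefaultHash(none));    // 0
        Console.WriteLine(DefaultHash(spring));  // 0, same hash code as null
        // The two values still compare as unequal, so this is only a collision.
        Console.WriteLine(EqualityComparer<Season?>.Default.Equals(none, spring));  // False
    }
}
```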

It is tempting to write a "better" equality comparer like this:

class NewNullEnumEqComp<T> : EqualityComparer<T?> where T : struct
{
  public override bool Equals(T? x, T? y)
  {
    return Default.Equals(x, y);
  }
  public override int GetHashCode(T? x)
  {
    return x.HasValue ? Default.GetHashCode(x) : -1;
  }
}

But is there any reason why the hash code of null should be 0?

EDIT/ADDITION:

Some people seem to think this is about overriding Object.GetHashCode(). It is not. (The authors of .NET did override GetHashCode() in the Nullable<> struct, which is relevant, though.) A user-written implementation of the parameterless GetHashCode() can never handle the situation where the object whose hash code we seek is null, because the method cannot be called on a null reference.

This is about implementing the abstract method EqualityComparer<T>.GetHashCode(T) or otherwise implementing the interface method IEqualityComparer<T>.GetHashCode(T). Now, while creating these links to MSDN, I see that it says there that these methods throw an ArgumentNullException if their sole argument is null. This must certainly be a mistake on MSDN? None of .NET's own implementations throw exceptions. Throwing in that case would effectively break any attempt to add null to a HashSet<>. Unless HashSet<> does something extraordinary when dealing with a null item (I will have to test that).

NEW EDIT/ADDITION:

Now I have tried debugging. With HashSet<>, I can confirm that with the default equality comparer, the values Season.Spring and null end up in the same bucket. This can be determined by carefully inspecting the private array members m_buckets and m_slots. Note that the indices are always, by design, offset by one.

The code I gave above does not, however, fix this. As it turns out, HashSet<> will never even ask the equality comparer when the value is null. This is from the source code of HashSet<>:

    // Workaround Comparers that throw ArgumentNullException for GetHashCode(null).
    private int InternalGetHashCode(T item) {
        if (item == null) { 
            return 0;
        } 
        return m_comparer.GetHashCode(item) & Lower31BitMask; 
    }

This means that, at least for HashSet<>, it is not even possible to change the hash of null. Instead, a solution is to change the hash of all the other values, like this:

class NewerNullEnumEqComp<T> : EqualityComparer<T?> where T : struct
{
  public override bool Equals(T? x, T? y)
  {
    return Default.Equals(x, y);
  }
  public override int GetHashCode(T? x)
  {
    return x.HasValue ? 1 + Default.GetHashCode(x) : /* not seen by HashSet: */ 0;
  }
}
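One way to observe this without a debugger is to count how often the comparer is asked to hash null. The following is a sketch only: CountingNullComparer, NullHashRequests and FillSet are hypothetical names introduced here, and the counter staying at zero is an expectation based on the HashSet<> source quoted above, not a documented guarantee.

```csharp
using System;
using System.Collections.Generic;

enum Season { Spring, Summer, Autumn, Winter }

// Hypothetical variant of the comparer above: it shifts every non-null hash
// by one and counts how often it is asked to hash null.
class CountingNullComparer<T> : EqualityComparer<T?> where T : struct
{
    public int NullHashRequests { get; private set; }

    public override bool Equals(T? x, T? y) => Default.Equals(x, y);

    public override int GetHashCode(T? x)
    {
        if (!x.HasValue)
        {
            NullHashRequests++;
            return 0;
        }
        // unchecked: wrapping on overflow is fine for a hash code.
        return unchecked(1 + Default.GetHashCode(x));
    }
}

static class HashSetNullDemo
{
    // Builds a set containing null and two seasons, using the counting comparer.
    public static CountingNullComparer<Season> FillSet(out HashSet<Season?> set)
    {
        var comparer = new CountingNullComparer<Season>();
        set = new HashSet<Season?>(comparer) { null, Season.Spring, Season.Summer };
        return comparer;
    }

    static void Main()
    {
        var comparer = FillSet(out var set);
        Console.WriteLine(set.Count);           // 3
        Console.WriteLine(set.Contains(null));  // True
        // If HashSet<> short-circuits null as in the quoted source, this stays at 0.
        Console.WriteLine(comparer.NullHashRequests);
    }
}
```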

Tags: c# .net hash null
8 answers
爱情/是我丢掉的垃圾
Reply #2 · 2019-01-17 06:02

Good question.

I just tried to code this:

enum Season
{
  Spring,
  Summer,
  Autumn,
  Winter,
}

and ran it like this:

Season? v = null;
Console.WriteLine(v);

It prints an empty line, because the value is null.

If I instead assign a normal value:

Season? v = Season.Spring;
Console.WriteLine((int)v);

it prints 0, as expected, or simply Spring if we omit the cast to int.

So if you do the following, the comparison is never true:

Season? v = Season.Spring;
Season? vnull = null;
bool same = (vnull == v); // always false

EDIT

From MSDN

If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two objects do not have to return different values.

In other words: if two objects have the same hash code, that does not mean they are equal, because real equality is determined by Equals.

From MSDN again:

The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.

【Aperson】
Reply #3 · 2019-01-17 06:05

It is 0 for the sake of simplicity. There is no such hard requirement. You only need to ensure the general requirements of hash coding.

For example, you need to make sure that if two objects are equal, their hash codes are always equal too. Therefore, different hash codes must always represent different objects. The converse does not hold: two different objects may have the same hash code, although if this happens often, the hash function is of poor quality because it does not have good collision resistance.

Of course, I have restricted my answer to requirements of a mathematical nature. There are also .NET-specific technical conditions, which you can read here. Returning 0 for a null value is not among them.
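The general requirement (equal values must have equal hash codes) can be checked mechanically for any comparer. A minimal sketch follows; HashContract and Holds are hypothetical names, not framework APIs.

```csharp
using System;
using System.Collections.Generic;

static class HashContract
{
    // Returns true when every pair the comparer treats as equal
    // also gets equal hash codes from that comparer.
    public static bool Holds<T>(IEqualityComparer<T> comparer, IReadOnlyList<T> values)
    {
        for (int i = 0; i < values.Count; i++)
            for (int j = 0; j < values.Count; j++)
                if (comparer.Equals(values[i], values[j]) &&
                    comparer.GetHashCode(values[i]) != comparer.GetHashCode(values[j]))
                    return false;
        return true;
    }

    static void Main()
    {
        // null and 0 share a hash code, which is allowed: they are not equal.
        var values = new int?[] { null, 0, 1, 1, null };
        Console.WriteLine(Holds<int?>(EqualityComparer<int?>.Default, values));  // True
    }
}
```

Any constant could stand in for the hash of null here and the check would still pass, which is the point of the answer: 0 is a convention, not a requirement.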
