Why are these hashcodes equals?

2019-05-14 23:32发布

问题:

This test is failing :

var hashCode = new 
{
    CustomerId = 3354,
    ServiceId = 3,
    CmsThematicId = (int?)605,
    StartDate = (DateTime?)new DateTime(2013, 1, 5),
    EndDate = (DateTime?)new DateTime(2013, 1, 6)
}.GetHashCode();
var hashCode2 = new
{
    CustomerId = 1210,
    ServiceId = 3,
    CmsThematicId = (int?)591,
    StartDate = (DateTime?)new DateTime(2013, 3, 31),
    EndDate = (DateTime?)new DateTime(2013, 4, 1)
}.GetHashCode();
Assert.AreNotEqual(hashCode, hashCode2);

Can you tell me why ?

回答1:

It's kinda amazing you found this coincidence.

Anonymous classes have a generated GetHashCode() method that generates a hash code by combining the hash codes of all properties.

The calculation is basically this:

  public override int GetHashCode()
  {
    return        -1521134295 * 
                ( -1521134295 * 
                ( -1521134295 * 
                ( -1521134295 * 
                ( -1521134295 * 
                   1170354300 + 
                  CustomerId.GetHashCode()) +
                  ServiceId.GetHashCode()) + 
                  CmsThematicId.GetHashCode()) + 
                  StartDate.GetHashCode()) + 
                  EndDate.GetHashCode();
  }

If you change any of the values of any of the fields, the hash code does change. The fact that you found two different sets of values that happen to get the same hash codes is a coincidence.

Note that hash codes are not necessarily unique. It's impossible to say hash codes would always be unique since there can be more objects than hash codes (although that is a lot of objects). Good hash codes provide a random distribution of values.

NOTE: The above is from .NET 4. Different versions of .NET may different and Mono differs.

If you want to actually compare two objects for equality then use .Equals(). For anonymous objects it compares each field. An even better option is to use an NUnit constraint that compares each field and reports back which field differs. I posted a constraint here:

https://stackoverflow.com/a/2046566/118703



回答2:

Did you run into this when processing a fairly large amount of data?

Welcome to the wonderful world of hash codes. A hash code is not a "unique identifier." It can't be. There is an essentially infinite number of possible different instances of that anonymous type, but only 2^32 possible hash codes. So it's guaranteed that if you create enough of those objects, you're going to see some duplicates. In fact, if you generate 70,000 of those objects randomly, the odds are better than 50% that two of them will have the same hash code.

See Birthdays, Random Numbers, and Hash Codes, and the linked Wikipedia article for more info.

As for why some people didn't see a duplicate and others did, it's likely that they ran the program on different versions of .NET. The algorithm for generating hash codes is not guaranteed to remain the same across versions or platforms:

The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.



回答3:

Your test is not valid.

Because hash codes are not guaranteed to be unique (see other answers for a good explanation), you should not test for uniqueness of hash codes.

When writing your own GetHashCode() method, it is a good idea to test for even distribution of random input, just not for uniqueness. Just make sure that you use enough random input to get a good test.

The MSDN spec on GetHashCode specifically states:

For the best performance, a hash function must generate a random distribution for all input.

This is all relative, of course. A GetHashCode() method that is being used to put 100 objects in a dictionary doesn't need to be nearly as random as a GetHashCode() that puts 10,000,000 objects in a dictionary.



回答4:

Jim suggested me (in the chat room) to store my parameters so when i display my parameters, select the not used ones, then when someone registers I flag it as used. But it's a big PITA to generate all the parameters.

So my solution is to build a int64 hashcode like this

const long i = -1521134295;    
return -i * (-i * (-i * (-i * -117147284 + customerId.GetHashCode()) + serviceId.GetHashCode()) + cmsThematicId.GetHashCode()) + startDate.GetHashCode();

I removed the end date because Its value was depending on serviceId and startDate so I shouldn't have add this to the hashcode in the firstplace. I copy/pasted it from a decompilation of the generated class. I got not colision if I do a test with 300 000 differents combinations.