Why is String.GetHashCode() implemented differently in the 32-bit and 64-bit CLR?

Posted 2019-02-08 05:49

Question:

What are the technical reasons behind the difference between the 32-bit and 64-bit versions of string.GetHashCode()?

More importantly, why does the 64-bit version seem to terminate its algorithm when it encounters the NUL character? For example, the following expressions all return true when run under the 64-bit CLR.

"\0123456789".GetHashCode() == "\0987654321".GetHashCode()
"\0AAAAAAAAA".GetHashCode() == "\0BBBBBBBBB".GetHashCode()
"\0The".GetHashCode() == "\0Game".GetHashCode()

This behavior (bug?) manifested as a performance issue when we used such strings as keys in a Dictionary.

Answer 1:

This looks like a known issue that Microsoft decided not to fix. From their response:

As you have mentioned this would be a breaking change for some programs (even though they shouldn't really be relying on this), the risk of this was deemed too high to fix this in the current release.

I agree that the rate of collisions that this will cause in the default Dictionary<String, Object> will be inflated by this. If this is adversely effecting your applications performance, I would suggest trying to work around it by using one of the Dictionary constructors that takes an IEqualityComparer so you can provide a more appropriate GetHashCode implementation. I know this isn't ideal and would like to get this fixed in a future version of the .NET Framework.

Source: Microsoft Connect - String.GetHashCode ignores any characters in the string beyond the first null byte in x64 runtime
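
The workaround suggested in the quoted reply could look roughly like the sketch below. The class name FullStringComparer is hypothetical, and the hash is an FNV-1a variant applied to UTF-16 code units rather than bytes. It is chosen here only because it is short and reads every character, including '\0'. It is not the algorithm the CLR uses, and it makes no attempt to resist deliberately crafted collisions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer (not part of .NET): ordinal equality plus a hash
// that covers the whole string, so characters after '\0' still contribute.
public sealed class FullStringComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        return string.Equals(x, y, StringComparison.Ordinal);
    }

    public int GetHashCode(string s)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));

        unchecked
        {
            uint hash = 2166136261;      // FNV-1a 32-bit offset basis
            foreach (char c in s)
            {
                hash ^= c;               // mix in one UTF-16 code unit
                hash *= 16777619;        // FNV 32-bit prime
            }
            return (int)hash;
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        // Pass the comparer to the Dictionary constructor, as the reply suggests.
        var map = new Dictionary<string, object>(new FullStringComparer());
        map["\0The"] = 1;
        map["\0Game"] = 2;
        Console.WriteLine(map.Count);
    }
}
```

Because the comparer is supplied once at construction time, the rest of the code that uses the dictionary does not need to change.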



Answer 2:

Eric Lippert has a wonderful blog post about this curious property of String:

Curious property Revealed



Tags: c# hash clr