string.GetHashCode() returns different values in d

Posted 2019-01-26 13:31

Question:

To my surprise, the following statement produces a different result in debug and release builds:

int result = "test".GetHashCode();

Is there any way to avoid this?

I need a reliable way to hash a string and I need the value to be consistent in debug and release mode. I would like to avoid writing my own hashing function if possible.

Why does this happen?

FYI, reflector gives me:

[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail), SecuritySafeCritical]
public override unsafe int GetHashCode()
{
    fixed (char* str = ((char*) this))
    {
        char* chPtr = str;
        int num = 0x15051505;
        int num2 = num;
        int* numPtr = (int*) chPtr;
        for (int i = this.Length; i > 0; i -= 4)
        {
            num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0];
            if (i <= 2)
            {
                break;
            }
            num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[1];
            numPtr += 2;
        }
        return (num + (num2 * 0x5d588b65));
    }
}

Answer 1:

GetHashCode() is almost never what you should use when you need a stable hash of a string. Without knowing what you're doing, I recommend that you use an actual hash algorithm, like SHA-1:

using (System.Security.Cryptography.SHA1Managed hp = new System.Security.Cryptography.SHA1Managed())
{
    // Pick one encoding (UTF-8 here; ASCII, Unicode or UTF32 would also work) and use it everywhere:
    // different encodings produce different bytes and therefore different hashes.
    byte[] hash = hp.ComputeHash(System.Text.Encoding.UTF8.GetBytes(theString));
}

Update: If speed matters, there's also SHA1Cng, which is significantly faster than SHA1Managed.
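
If what you need is an int that stays the same across builds, the digest can be folded down to 32 bits. Here is a minimal sketch, not a definitive implementation: the class name StableHash and the method Sha1ToInt32 are hypothetical, and the choices of UTF-8 and of taking the first four digest bytes in big-endian order are assumptions made for illustration.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class StableHash
{
    // Hypothetical helper: derives a 32-bit value from the SHA-1 digest
    // of the string's UTF-8 bytes.
    public static int Sha1ToInt32(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] digest = sha1.ComputeHash(Encoding.UTF8.GetBytes(text));

            // Combine the first four bytes in a fixed (big-endian) order so the
            // result does not depend on the machine's byte order.
            return (digest[0] << 24) | (digest[1] << 16) | (digest[2] << 8) | digest[3];
        }
    }
}
```

Calling StableHash.Sha1ToInt32("test") should then give the same value in debug and release, on 32-bit and 64-bit, because it depends only on the bytes of the string. Truncating to 32 bits throws away most of the digest, so collisions become about as likely as with any other 32-bit hash.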



Answer 2:

Here's a better approach that is much faster than SHA, and you can replace the modified GetHashCode with it: C# fast hash murmur2

There are several implementations with different levels of "unmanaged" code, so if you need fully managed it's there and if you can use unsafe it's there too.
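
For reference, a fully managed version can be quite short. The following is a sketch of the 32-bit MurmurHash2 algorithm, not the code from the linked implementations: the names Murmur2Sketch, Hash and HashString are hypothetical, and hashing the UTF-8 bytes of the string with a default seed of 0 is an assumption.

```csharp
using System;
using System.Text;

public static class Murmur2Sketch
{
    // 32-bit MurmurHash2 over a byte array; 4-byte blocks are read little-endian.
    public static uint Hash(byte[] data, uint seed)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));

        const uint m = 0x5bd1e995;
        const int r = 24;

        unchecked // the arithmetic is meant to wrap around
        {
            int length = data.Length;
            uint h = seed ^ (uint)length;
            int i = 0;

            // Mix four bytes at a time into the hash.
            while (length >= 4)
            {
                uint k = (uint)(data[i] | (data[i + 1] << 8) | (data[i + 2] << 16) | (data[i + 3] << 24));
                k *= m;
                k ^= k >> r;
                k *= m;
                h *= m;
                h ^= k;
                i += 4;
                length -= 4;
            }

            // Handle the last one to three bytes.
            switch (length)
            {
                case 3: h ^= (uint)data[i + 2] << 16; goto case 2;
                case 2: h ^= (uint)data[i + 1] << 8; goto case 1;
                case 1: h ^= data[i]; h *= m; break;
            }

            // Final avalanche.
            h ^= h >> 13;
            h *= m;
            h ^= h >> 15;
            return h;
        }
    }

    // Hypothetical convenience overload: hashes the UTF-8 bytes of the string.
    public static uint HashString(string text, uint seed = 0)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        return Hash(Encoding.UTF8.GetBytes(text), seed);
    }
}
```

Because it only reads bytes and uses explicit unchecked arithmetic, the result should not depend on build configuration or platform bitness.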



Answer 3:

    /// <summary>
    /// The default implementation of string.GetHashCode is not consistent across platforms (x86/x64, which is our case) and frameworks.
    /// FNV-1a - (Fowler/Noll/Vo) is a fast, consistent, non-cryptographic hash algorithm with good dispersion. (see http://isthe.com/chongo/tech/comp/fnv/#FNV-1a)
    /// </summary>
    private static int GetFNV1aHashCode(string str)
    {
        if (str == null)
            return 0;
        var length = str.Length;
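        // The original FNV-1a uses the 32-bit offset_basis 2166136261, but seeding with the length gives slightly better dispersion (about 2%) in our case, where all the strings have equal length, for example: "3EC0FFFF01ECD9C4001B01E2A707"
        int hash = length;
        // unchecked: the multiplication is meant to wrap around, even if the project is compiled with overflow checking
        unchecked
        {
            for (int i = 0; i != length; ++i)
                hash = (hash ^ str[i]) * 16777619;
        }
        return hash;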
    }

I guess this implementation is slower than the unsafe one posted here, but it's much simpler and safe. It works well when top speed is not needed.
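
For comparison, here is a sketch of the unmodified 32-bit FNV-1a with the standard offset basis mentioned in the comment above. The names Fnv1aSketch and Hash32 are hypothetical. Unlike the method above, this version hashes the UTF-8 bytes of the string rather than its chars, returns a uint, and throws on null; those are choices made for illustration, so that the output can be checked against published FNV test vectors.

```csharp
using System;
using System.Text;

public static class Fnv1aSketch
{
    // Standard 32-bit FNV-1a over the UTF-8 bytes of the string.
    public static uint Hash32(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        const uint offsetBasis = 2166136261; // 0x811C9DC5
        const uint prime = 16777619;         // 0x01000193

        uint hash = offsetBasis;
        unchecked // the multiplication is meant to wrap around
        {
            foreach (byte b in Encoding.UTF8.GetBytes(text))
            {
                hash ^= b;     // FNV-1a: xor first...
                hash *= prime; // ...then multiply
            }
        }
        return hash;
    }
}
```

With these choices, Fnv1aSketch.Hash32("") returns the offset basis 0x811C9DC5 and Fnv1aSketch.Hash32("a") returns 0xE40C292C. The values will differ from GetFNV1aHashCode above, since that one seeds with the string length and mixes in whole chars.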