C# Why can equal decimals produce unequal hash val

Posted 2019-01-16 23:30

闹够了就滚

Question:

We ran into a magic decimal number that broke our hashtable. I boiled it down to the following minimal case:

decimal d0 = 295.50000000000000000000000000m;
decimal d1 = 295.5m;

Console.WriteLine("{0} == {1} : {2}", d0, d1, (d0 == d1));
Console.WriteLine("0x{0:X8} == 0x{1:X8} : {2}", d0.GetHashCode(), d1.GetHashCode()
                  , (d0.GetHashCode() == d1.GetHashCode()));

Giving the following output:

295.50000000000000000000000000 == 295.5 : True
0xBF8D880F == 0x40727800 : False

What is really peculiar: change, add or remove any of the digits in d0 and the problem goes away. Even adding or removing one of the trailing zeros! The sign doesn't seem to matter though.

Our fix is to divide the value to get rid of the trailing zeroes, like so:

decimal d0 = 295.50000000000000000000000000m / 1.000000000000000000000000000000000m;

But my question is, how is C# doing this wrong?

Answer 1:

To start with, C# isn't doing anything wrong at all. This is a framework bug.

It does indeed look like a bug though - basically whatever normalization is involved in comparing for equality ought to be used in the same way for hash code computation. I've checked and can reproduce it too (using .NET 4) including checking the Equals(decimal) and Equals(object) methods as well as the == operator.
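The three equality paths mentioned above can be exercised side by side. This is only a minimal sketch using the two literals from the question; the class name `DecimalEqualityCheck` is made up for illustration, and the last line is expected to print False only on affected framework versions (later runtimes may print True).

```csharp
using System;

public static class DecimalEqualityCheck
{
    // The two literals from the question: same numeric value, different stored scale.
    public static readonly decimal D0 = 295.50000000000000000000000000m;
    public static readonly decimal D1 = 295.5m;

    public static void Main()
    {
        // All three equality paths compare by value, not by representation.
        Console.WriteLine("==              : {0}", D0 == D1);
        Console.WriteLine("Equals(decimal) : {0}", D0.Equals(D1));
        Console.WriteLine("Equals(object)  : {0}", D0.Equals((object)D1));

        // The hash codes are what may disagree, depending on the runtime.
        Console.WriteLine("hash codes equal: {0}", D0.GetHashCode() == D1.GetHashCode());
    }
}
```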

It definitely looks like it's the d0 value which is the problem, as adding trailing 0s to d1 doesn't change the results (until it's the same as d0 of course). I suspect there's some corner case tripped by the exact bit representation there.

I'm surprised the same normalization isn't applied to the hash code (and as you say, it works most of the time), but you should report the bug on Connect.

Answer 2:

Another bug (?) results in a different byte representation for the same decimal on different compilers. Try compiling the following code on VS 2005 and then on VS 2010, or look at my article on Code Project.

using System;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        decimal one = 1m;

        PrintBytes(one);
        PrintBytes(one + 0.0m); // compare this on different compilers!
        PrintBytes(1m + 0.0m);

        Console.ReadKey();
    }

    // Serializes the decimal with BinaryWriter and prints its 16 raw bytes.
    public static void PrintBytes(decimal d)
    {
        using (MemoryStream memoryStream = new MemoryStream())
        using (BinaryWriter binaryWriter = new BinaryWriter(memoryStream))
        {
            binaryWriter.Write(d);
            binaryWriter.Flush();

            byte[] decimalBytes = memoryStream.ToArray();

            Console.WriteLine(BitConverter.ToString(decimalBytes) + " (" + d + ")");
        }
    }
}

Some people use the normalization code d=d+0.0000m, which does not work properly on VS 2010. Your normalization code (d=d/1.000000000000000000000000000000000m) looks good. I use the same one to get the same byte array for the same decimals.

Answer 3:

Ran into this bug too ... :-(

Tests (see below) indicate that this depends on the maximum precision available for the value. The wrong hash codes only occur near the maximum precision for the given value. As the tests show, the error seems to depend on the digits to the left of the decimal point. Sometimes only the hash code for maxDecimalDigits - 1 is wrong, sometimes the hash code for maxDecimalDigits is wrong.

var data = new decimal[] {
//    123456789012345678901234567890
    1.0m,
    1.00m,
    1.000m,
    1.0000m,
    1.00000m,
    1.000000m,
    1.0000000m,
    1.00000000m,
    1.000000000m,
    1.0000000000m,
    1.00000000000m,
    1.000000000000m,
    1.0000000000000m,
    1.00000000000000m,
    1.000000000000000m,
    1.0000000000000000m,
    1.00000000000000000m,
    1.000000000000000000m,
    1.0000000000000000000m,
    1.00000000000000000000m,
    1.000000000000000000000m,
    1.0000000000000000000000m,
    1.00000000000000000000000m,
    1.000000000000000000000000m,
    1.0000000000000000000000000m,
    1.00000000000000000000000000m,
    1.000000000000000000000000000m,
    1.0000000000000000000000000000m,
    1.00000000000000000000000000000m,
    1.000000000000000000000000000000m,
    1.0000000000000000000000000000000m,
    1.00000000000000000000000000000000m,
    1.000000000000000000000000000000000m,
    1.0000000000000000000000000000000000m,
};

for (int i = 0; i < 1000; ++i)
{
    var d0 = i * data[0];
    var d0Hash = d0.GetHashCode();
    foreach (var d in data)
    {
        var value = i * d;
        var hash = value.GetHashCode();
        Console.WriteLine("{0};{1};{2};{3};{4};{5}", d0, value, (d0 == value), d0Hash, hash, d0Hash == hash);
    }
}

Answer 4:

This is a decimal rounding error.

Setting d0 with all of those trailing zeros requires too much precision. As a consequence, the algorithm responsible for it makes a mistake and ends up giving a different result. It could be classified as a bug in this example, although note that the decimal type is supposed to have a precision of 28 digits, and here you are actually requiring a precision of 29 digits for d0.

This can be tested by asking for the full raw hexadecimal representation of d0 and d1.
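One way to get at that raw representation is `decimal.GetBits`, which returns the 96-bit integer as three 32-bit words plus a flags word. The sketch below is only an illustration; the class and helper names (`DecimalBits`, `ToHex`, `Scale`) are invented here.

```csharp
using System;

public static class DecimalBits
{
    // Formats the four words returned by decimal.GetBits as
    // "hi mid lo - flags". The flags word holds the sign in bit 31
    // and the scale (power of ten) in bits 16-23.
    public static string ToHex(decimal d)
    {
        int[] bits = decimal.GetBits(d); // order: lo, mid, hi, flags
        return string.Format("{0:X8} {1:X8} {2:X8} - {3:X8}",
                             bits[2], bits[1], bits[0], bits[3]);
    }

    // Extracts the scale, i.e. the number of digits after the decimal point.
    public static int Scale(decimal d)
    {
        return (decimal.GetBits(d)[3] >> 16) & 0xFF;
    }

    public static void Main()
    {
        decimal d0 = 295.50000000000000000000000000m;
        decimal d1 = 295.5m;

        Console.WriteLine("d0: {0} (scale {1})", ToHex(d0), Scale(d0));
        Console.WriteLine("d1: {0} (scale {1})", ToHex(d1), Scale(d1));
    }
}
```

Two decimals that compare equal can still print different words here, because the scale is stored rather than normalized away.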

Answer 5:

I tested this in VB.NET (v3.5) and got the same thing.

The interesting thing about the hash codes:

A) 0x40727800 = 1081243648

B) 0xBF8D880F = -1081243633 (as a signed 32-bit integer, i.e. -1081243648 + 15)
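The signed values can be checked by reinterpreting the hex digits as 32-bit integers. A minimal sketch, with an invented class name:

```csharp
using System;

public static class HashCodeSigns
{
    public static void Main()
    {
        int a = 0x40727800;                 // hash code reported for d1
        int b = unchecked((int)0xBF8D880F); // hash code reported for d0, as a signed int

        Console.WriteLine("A = {0}", a);         // 1081243648
        Console.WriteLine("B = {0}", b);         // -1081243633
        Console.WriteLine("A + B = {0}", a + b); // 15
    }
}
```

The sum comes out as 15 rather than 0, so the two codes are close to, but not exactly, negatives of each other.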

Using Decimal.GetBits() I found

Format: mantissa (hhhhhhhh hhhhhhhh hhhhhhhh), exponent (seee0000), where h is a hex digit of the value, s is the sign, e is the exponent, and 0 must be zero.

d1 ==> 00000000 00000000 00000B8B - 00010000 = (2955 / 10 ^ 1) = 295.5

d0 ==> 5F7B2FE5 D8EACD6E 2E000000 - 001A0000

...which converts to 29550000000000000000000000000 / 10^26 = 295.5000000...etc

Edit: OK, I wrote a 128-bit hex-decimal calculator and the above is exactly correct.

It definitely looks like an internal conversion bug of some sort. Microsoft explicitly states that it does not guarantee the default implementation of GetHashCode. If you are using it for anything important, it probably makes sense to write your own GetHashCode for the decimal type. Formatting the value to a fixed-decimal, fixed-width string and hashing that seems to work, for example (> 29 decimal places, > 58 width, which fits all possible decimals).
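As an illustration of supplying your own hash code, here is a sketch of an `IEqualityComparer<decimal>` for use with `Dictionary`. Note that it does not use the string formatting idea described above. Instead it reuses the division workaround from the question to strip trailing zeros before hashing. The name `NormalizedDecimalComparer` is hypothetical and not part of the framework.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: hashes a normalized copy of the value so that
// equal decimals with different scales land in the same bucket.
public sealed class NormalizedDecimalComparer : IEqualityComparer<decimal>
{
    public bool Equals(decimal x, decimal y)
    {
        return x == y; // value equality already ignores trailing zeros
    }

    public int GetHashCode(decimal d)
    {
        // Assumption: dividing by 1 written with the maximum scale
        // removes trailing zeros, as in the question's workaround.
        decimal normalized = d / 1.000000000000000000000000000000000m;
        return normalized.GetHashCode();
    }
}

public static class ComparerDemo
{
    public static void Main()
    {
        var table = new Dictionary<decimal, string>(new NormalizedDecimalComparer());
        table[295.50000000000000000000000000m] = "stored under d0";

        // Look the entry up with the short representation of the same value.
        Console.WriteLine(table.ContainsKey(295.5m));
    }
}
```

Passing the comparer to the dictionary constructor keeps the workaround in one place, so the keys themselves do not have to be normalized at every call site.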

Edit: I don't know about this anymore. It still must be a conversion error somewhere, since the stored precision fundamentally changes the real value in memory. That the hash codes end up as almost exact signed negatives of each other (as 32-bit integers, 0x40727800 + 0xBF8D880F = 0xF) is a big clue. I would need to look further into the default hash code implementation to find more.

28 or 29 digits shouldn't matter unless there is dependent code that does not evaluate the outer extents properly. The largest 96-bit integer available is:

79228162514264337593543950335

so you can have 29 digits so long as the whole thing (without decimal point) is less than this value. I can't help but think that this is something much more subtle in the hash code calculation somewhere.
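The 96-bit limit can be checked with `System.Numerics.BigInteger` (this needs a reference to System.Numerics on older framework versions). The sketch below is an illustration only; `DecimalRange` and `Fits` are invented names.

```csharp
using System;
using System.Globalization;
using System.Numerics;

public static class DecimalRange
{
    // Returns true when the digit string parses as a decimal without overflow.
    public static bool Fits(string digits)
    {
        decimal parsed;
        return decimal.TryParse(digits, NumberStyles.Number,
                                CultureInfo.InvariantCulture, out parsed);
    }

    public static void Main()
    {
        // The integer part of a decimal tops out at 2^96 - 1.
        BigInteger max96 = BigInteger.Pow(2, 96) - 1;
        Console.WriteLine(max96);
        Console.WriteLine(new BigInteger(decimal.MaxValue) == max96);

        // Both strings have 29 digits; only the one below the limit fits.
        Console.WriteLine(Fits("29550000000000000000000000000"));
        Console.WriteLine(Fits("80000000000000000000000000000"));
    }
}
```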

Answer 6:

The documentation suggests that because GetHashCode() is unpredictable, you should create your own. It's considered unpredictable because each type has its own implementation, and since we don't know its internals we should create our own according to how we evaluate uniqueness.

However, I think the answer is that GetHashCode() is not using the mathematical decimal value to create the hash code.

Mathematically we see 295.50000000 and 295.5 as being the same. When you look at the decimal objects in the IDE this is true too. However, if you call ToString() on both decimals you will see that they are represented differently, i.e. the first one still prints as 295.50000000 with its trailing zeros. GetHashCode() is evidently not using the mathematical value of the decimal when creating the hash code.

Your fix is simply creating a new decimal without all the trailing zeros which is why it works.
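The ToString() observation and the effect of the fix can be seen together in a few lines. This is a sketch using the literals from the question; the class name `TrailingZeros` is made up, and the invariant culture is used only so the decimal separator does not depend on the machine's settings.

```csharp
using System;
using System.Globalization;

public static class TrailingZeros
{
    public static string Show(decimal d)
    {
        return d.ToString(CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        decimal d0 = 295.50000000000000000000000000m;
        decimal d1 = 295.5m;
        decimal d0Fixed = d0 / 1.000000000000000000000000000000000m;

        Console.WriteLine(Show(d0));      // trailing zeros are preserved
        Console.WriteLine(Show(d1));
        Console.WriteLine(Show(d0Fixed)); // same text as d1 after the division
    }
}
```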