I am not sure whether this non-standard way of stating a Stack Overflow question is good or bad, but here goes:
What is the best (mathematical or otherwise technical) explanation for why the code:
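// Requires: using System; using System.Collections.Generic; using System.Globalization; using System.Linq;
static void Main()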
{
decimal[] arr =
{
42m,
42.0m,
42.00m,
42.000m,
42.0000m,
42.00000m,
42.000000m,
42.0000000m,
42.00000000m,
42.000000000m,
42.0000000000m,
42.00000000000m,
42.000000000000m,
42.0000000000000m,
42.00000000000000m,
42.000000000000000m,
42.0000000000000000m,
42.00000000000000000m,
42.000000000000000000m,
42.0000000000000000000m,
42.00000000000000000000m,
42.000000000000000000000m,
42.0000000000000000000000m,
42.00000000000000000000000m,
42.000000000000000000000000m,
42.0000000000000000000000000m,
42.00000000000000000000000000m,
42.000000000000000000000000000m,
};
foreach (var m in arr)
{
Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
"{0,-32}{1,-20:R}{2:X8}", m, (double)m, m.GetHashCode()
));
}
Console.WriteLine("Funny consequences:");
var h1 = new HashSet<decimal>(arr);
Console.WriteLine(h1.Count);
var h2 = new HashSet<double>(arr.Select(m => (double)m));
Console.WriteLine(h2.Count);
}
gives the following "funny" (apparently incorrect) output:
42 42 40450000
42.0 42 40450000
42.00 42 40450000
42.000 42 40450000
42.0000 42 40450000
42.00000 42 40450000
42.000000 42 40450000
42.0000000 42 40450000
42.00000000 42 40450000
42.000000000 42 40450000
42.0000000000 42 40450000
42.00000000000 42 40450000
42.000000000000 42 40450000
42.0000000000000 42 40450000
42.00000000000000 42 40450000
42.000000000000000 42 40450000
42.0000000000000000 42 40450000
42.00000000000000000 42 40450000
42.000000000000000000 42 40450000
42.0000000000000000000 42 40450000
42.00000000000000000000 42 40450000
42.000000000000000000000 41.999999999999993 BFBB000F
42.0000000000000000000000 42 40450000
42.00000000000000000000000 42.000000000000007 40450000
42.000000000000000000000000 42 40450000
42.0000000000000000000000000 42 40450000
42.00000000000000000000000000 42 40450000
42.000000000000000000000000000 42 40450000
Funny consequences:
2
3
Tried this under .NET 4.5.2.
In Decimal.cs, we can see that GetHashCode() is implemented as native code. Furthermore, we can see that the cast to double is implemented as a call to ToDouble(), which in turn is implemented as native code. So from there, we can't see a logical explanation for the behaviour.
In the old Shared Source CLI, we can find old implementations of these methods that hopefully shed some light, if they haven't changed too much. We can find in comdecimal.cpp:
FCIMPL1(INT32, COMDecimal::GetHashCode, DECIMAL *d)
{
WRAPPER_CONTRACT;
STATIC_CONTRACT_SO_TOLERANT;
ENSURE_OLEAUT32_LOADED();
_ASSERTE(d != NULL);
double dbl;
VarR8FromDec(d, &dbl);
if (dbl == 0.0) {
// Ensure 0 and -0 have the same hash code
return 0;
}
return ((int *)&dbl)[0] ^ ((int *)&dbl)[1];
}
FCIMPLEND
and
FCIMPL1(double, COMDecimal::ToDouble, DECIMAL d)
{
WRAPPER_CONTRACT;
STATIC_CONTRACT_SO_TOLERANT;
ENSURE_OLEAUT32_LOADED();
double result;
VarR8FromDec(&d, &result);
return result;
}
FCIMPLEND
We can see that the GetHashCode() implementation is based on the conversion to double: the hash code is computed from the bytes that result from converting the decimal to double. It relies on the assumption that equal decimal values convert to equal double values.
So let's test the VarR8FromDec system call outside of .NET:
In Delphi (I'm actually using FreePascal), here's a short program that calls the system functions directly to test their behaviour:
{$MODE Delphi}
program Test;
uses
Windows,
SysUtils,
Variants;
type
Decimal = TVarData;
function VarDecFromStr(const strIn: WideString; lcid: LCID; dwFlags: ULONG): Decimal; safecall; external 'oleaut32.dll';
function VarDecAdd(const decLeft, decRight: Decimal): Decimal; safecall; external 'oleaut32.dll';
function VarDecSub(const decLeft, decRight: Decimal): Decimal; safecall; external 'oleaut32.dll';
function VarDecDiv(const decLeft, decRight: Decimal): Decimal; safecall; external 'oleaut32.dll';
function VarBstrFromDec(const decIn: Decimal; lcid: LCID; dwFlags: ULONG): WideString; safecall; external 'oleaut32.dll';
function VarR8FromDec(const decIn: Decimal): Double; safecall; external 'oleaut32.dll';
var
Zero, One, Ten, FortyTwo, Fraction: Decimal;
I: Integer;
begin
try
Zero := VarDecFromStr('0', 0, 0);
One := VarDecFromStr('1', 0, 0);
Ten := VarDecFromStr('10', 0, 0);
FortyTwo := VarDecFromStr('42', 0, 0);
Fraction := One;
for I := 1 to 40 do
begin
FortyTwo := VarDecSub(VarDecAdd(FortyTwo, Fraction), Fraction);
Fraction := VarDecDiv(Fraction, Ten);
Write(I: 2, ': ');
if VarR8FromDec(FortyTwo) = 42 then WriteLn('ok') else WriteLn('not ok');
end;
except on E: Exception do
WriteLn(E.Message);
end;
end.
Note that since Delphi and FreePascal have no language support for any floating-point decimal type, I'm calling system functions to perform the calculations. I'm setting FortyTwo first to 42. I then add 1 and subtract 1. I then add 0.1 and subtract 0.1. Et cetera. This causes the precision of the decimal to be extended the same way as in .NET.
And here's (part of) the output:
...
20: ok
21: ok
22: not ok
23: ok
24: not ok
25: ok
26: ok
...
This shows that this is indeed a long-standing problem in Windows that merely happens to be exposed by .NET. It's the system functions that are giving different results for equal decimal values, and either they should be fixed, or .NET should be changed to not use defective functions.
Now, in the new .NET Core, we can see code in its decimal.cpp that works around the problem:
FCIMPL1(INT32, COMDecimal::GetHashCode, DECIMAL *d)
{
FCALL_CONTRACT;
ENSURE_OLEAUT32_LOADED();
_ASSERTE(d != NULL);
double dbl;
VarR8FromDec(d, &dbl);
if (dbl == 0.0) {
// Ensure 0 and -0 have the same hash code
return 0;
}
// conversion to double is lossy and produces rounding errors so we mask off the lowest 4 bits
//
// For example these two numerically equal decimals with different internal representations produce
// slightly different results when converted to double:
//
// decimal a = new decimal(new int[] { 0x76969696, 0x2fdd49fa, 0x409783ff, 0x00160000 });
// => (decimal)1999021.176470588235294117647000000000 => (double)1999021.176470588
// decimal b = new decimal(new int[] { 0x3f0f0f0f, 0x1e62edcc, 0x06758d33, 0x00150000 });
// => (decimal)1999021.176470588235294117647000000000 => (double)1999021.1764705882
//
return ((((int *)&dbl)[0]) & 0xFFFFFFF0) ^ ((int *)&dbl)[1];
}
FCIMPLEND
This appears to be implemented in the current .NET Framework too, based on the fact that one of the wrong double values (42.000000000000007) does give the same hash code as 42, but it's not enough to completely fix the problem: 41.999999999999993 still hashes differently.
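To make the masking argument concrete, here is a small Python sketch of both hashing schemes applied to the three double values that show up in the question's output. It only mimics the bit manipulation from the two C++ snippets quoted above; the function names are mine, and it does not call the real runtime.

```python
import struct

def halves(x):
    """Return the (low, high) 32-bit halves of the IEEE 754 bit pattern of x."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits & 0xFFFFFFFF, bits >> 32

def hash_unmasked(x):
    # Old Shared Source CLI scheme: XOR the two halves of the double.
    if x == 0.0:
        return 0
    low, high = halves(x)
    return low ^ high

def hash_masked(x):
    # .NET Core scheme: clear the lowest 4 bits of the low half first.
    if x == 0.0:
        return 0
    low, high = halves(x)
    return (low & 0xFFFFFFF0) ^ high

# The three distinct doubles printed in the question's output.
for x in (41.999999999999993, 42.0, 42.000000000000007):
    print(f"{x:<22.17g}{hash_unmasked(x):08X}  {hash_masked(x):08X}")
```

The masked column reproduces the hash codes from the question (BFBB000F for the value just below 42, 40450000 for the other two). Masking absorbs an error of one unit in the last place upwards, but the value just below 42 differs from 42 in every low bit and in the high half as well, so no small mask can merge them.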
As for the difference in hashes, it does indeed seem to be wrong (same value, different hash), but that has already been answered by LukeH in his comment.
As for the cast to double, though, I see it this way:
42000000000000000000000 has a different (and less 'precise') binary representation than 420000000000000000000000, and therefore you pay a higher price when it has to be rounded.
Why does it matter? Apparently decimal keeps track of its 'precision'. So for example it stores 1m as 1*10^0 but its equivalent 1.000m as 1000*10^-3, most likely to be able to print it later as "1.000". Therefore when converting your decimal to double, it's not 42 that you need to represent but, for example, 420000000000000000, and this is far from optimal (mantissa and exponent are converted separately).
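Here is a rough Python model of that two-step conversion. It is an assumption about how the native routine might work (round the integer mantissa to a double, then divide by the power of ten), not the actual VarR8FromDec source; decimal_to_double is a name I made up.

```python
def decimal_to_double(mantissa, scale):
    """Hypothetical model: value = mantissa / 10**scale, computed in two rounded steps."""
    as_double = float(mantissa)            # first rounding: integer -> 53-bit significand
    return as_double / float(10 ** scale)  # second rounding: the division

for scale in (0, 3, 20, 21, 22):
    mantissa = 42 * 10 ** scale            # 42 stored with `scale` trailing zeros
    result = decimal_to_double(mantissa, scale)
    print(f"scale={scale:2d}  mantissa bits={mantissa.bit_length():2d}  ->  {result:.17g}")
```

Under this model, scale 21 comes out as 41.999999999999993 while scales 20 and 22 give exactly 42, which is at least consistent with the output in the question. I have not checked whether it matches the real function at every scale.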
According to a simulator I have found (a JavaScript one for Java, so not exactly what we have in C#, and the results may differ a bit, but they are still meaningful):
42000000000000000000 ~ 1.1384122371673584 * 2^65 ~ 4.1999998e+19
420000000000000000000 = 1.4230153560638428 * 2^68 = 4.2e+20 (nice one)
4200000000000000000000 ~ 1.7787691354751587 * 2^71 ~ 4.1999999e+21
42000000000000000000000 ~ 1.111730694770813 * 2^75 ~ 4.1999998e+22
As you can see, the value for 4.2E19 is less precise than the one for 4.2E20 and may end up being rounded to 4.19. If this is how the conversion to double happens, then the result is not surprising. And since each multiplication by 10 will usually produce a number that is not well represented in binary, we should expect such issues often.
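The simulator figures above appear to be single-precision results. For the 64-bit double that C# actually uses, the same kind of check can be sketched in Python (whose float is an IEEE 754 double). Since 42*10^k = 21 * 5^k * 2^(k+1), the value fits a 53-bit significand only while 21 * 5^k < 2^53. The helper name is mine.

```python
def exactly_representable(n):
    """True if the integer n survives a round trip through a 64-bit double."""
    return int(float(n)) == n

for k in range(18, 24):
    n = 42 * 10 ** k
    odd_part_bits = (21 * 5 ** k).bit_length()   # bits the significand must hold
    print(f"42e{k}: significand bits needed={odd_part_bits}, exact={exactly_representable(n)}")
```

In double precision the mantissa 42*10^k stays exact up to k = 20 and becomes inexact from k = 21 on, which is also the first scale at which the question's output goes wrong.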
Now to my mind it's all the price of keeping track of significant digits in decimal. If that were not important, we could always, e.g., normalize 4200*10^-2 to 4.2*10^1 (as double does) and the conversion to double wouldn't be that error-prone in the context of hash codes. Is it worth it? That's not for me to judge.
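As a sketch of that normalization idea in Python: strip the trailing decimal zeros from the mantissa before converting, so that every representation of the same value goes through the same conversion. Both helpers are hypothetical, and to_double is just the simple "round the integer, then divide" model, not the real native routine.

```python
def normalize(mantissa, scale):
    """Strip trailing decimal zeros, e.g. (4200, 2) -> (42, 0)."""
    while scale > 0 and mantissa % 10 == 0:
        mantissa //= 10
        scale -= 1
    return mantissa, scale

def to_double(mantissa, scale):
    # Hypothetical conversion model: integer -> double, then divide by 10**scale.
    return float(mantissa) / float(10 ** scale)

# All 28 representations of 42 from the question: 42, 42.0, ..., 42.000...0 (27 zeros).
results = set()
for scale in range(28):
    m, s = normalize(42 * 10 ** scale, scale)
    results.add(to_double(m, s))

print(results)  # {42.0}
```

Every representation collapses to the pair (42, 0) before conversion, so only one double (and hence one hash code) comes out. The original scale would still have to be kept somewhere if you want to print "42.000" later.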
BTW: these two links provide nice reading about decimal's binary representation:
https://msdn.microsoft.com/en-us/library/system.decimal.getbits.aspx
https://msdn.microsoft.com/en-us/library/system.decimal.aspx