We are converting a C++ math library to C#. The library mixes floats and doubles (sometimes casting between them), and we are trying to do the same in order to get the exact same results in C# that we had in C++, but it is proving very difficult, if not impossible.
I think the problem is one or more of the following, but I am not an expert:

1. Converting floats to doubles and doubles to floats is causing unpredictable results, and is done differently in C++ and C#.
2. C++ and C# handle float precision differently, and they can't mimic each other.
3. There is a setting somewhere in .NET to make it perform like C++, but I can't find it (both are 32-bit).
Can somebody explain the possible problems to me, and maybe link me to some authoritative documentation from Microsoft that I can use to explain the situation and the reason for the differences?
EDIT
We are using VC6 and .NET 4.0.
I can't give examples of the calculations because of an NDA, but I can show some of the differing numbers... probably not very useful by themselves:
8.085004000000000 (C#) vs. 8.084980000000000 (C++)
8.848165000000000 (C#) vs. 8.848170000000000 (C++)
0.015263214111328 (C#) vs. 0.015263900756836 (C++)
It should be noted that these numbers reflect compounded problems: they are the results of whole calculations, not of single operations.
C++ allows the program to retain a higher precision for temporary results than the type of the subexpressions would imply. One thing that can happen is that intermediate expressions (or an unspecified subset of them) are computed as extended 80-bit floats.
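To make this concrete, here is a minimal sketch (the values are illustrative; the actual behavior depends on whether the compiler generates x87 or SSE code, and VC6 generates x87 code by default):

```cpp
#include <cstdio>

int main() {
    float a = 1.0e8f;
    float b = 1.0f;
    // With strict 32-bit evaluation, a + b rounds back to 1.0e8f
    // (the ULP of a float near 1e8 is 8), so r is 0.0f.
    // With 80-bit x87 intermediates, a + b is held exactly as
    // 100000001, and r comes out as 1.0f.
    float r = (a + b) - a;
    std::printf("%f\n", r);
    return 0;
}
```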
I would be surprised, on the other hand, if this applied to C#, but even if it does, the C# compiler doesn't have to choose the same subset of expressions to compute as 80-bit extended floats. EDIT: See Eric's comment below.
More details
Another instance of the same intermediate-precision problem arises when the compiler uses an `fmadd` instruction for what is a multiplication followed by an addition in the source code (if the target architecture has one; PowerPC does, for instance). The `fmadd` instruction computes the product exactly and rounds only once, after the addition, whereas separate multiplication and addition instructions round the product before adding it.
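You can observe the difference without PowerPC hardware through `std::fma` from `<cmath>` (C++11, so not available under VC6; this is only to illustrate the rounding behavior):

```cpp
#include <cmath>
#include <cstdio>

int main() {
    double e = std::ldexp(1.0, -27);      // 2^-27
    double a = 1.0 + e, b = 1.0 - e;      // a * b is exactly 1 - 2^-54
    double separate = a * b - 1.0;        // product rounds to 1.0 -> 0.0
    double fused = std::fma(a, b, -1.0);  // exact product kept -> -2^-54
    std::printf("separate = %g\nfused = %g\n", separate, fused);
    return 0;
}
```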
To prevent the C++ compiler from doing this, you should only need to rewrite the floating-point computations as three-address code, using volatile variables for the intermediate results. If this transformation changes the result of the C++ program, it means the above problem was at play. But then you have changed the C++-side results, and there is probably no way to get the exact same old C++ results in C# without reading the generated assembly.
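Here is a sketch of that transformation on a hypothetical expression (a, b, c, d stand in for whatever your real code computes):

```cpp
// As written: the compiler may keep a*b and c*d at extended
// precision, or contract the whole expression into a fused
// multiply-add where the hardware has one.
float asWritten(float a, float b, float c, float d) {
    return a * b + c * d;
}

// Three-address form: each volatile store forces the intermediate
// result to be rounded to an actual 32-bit float before reuse.
float threeAddress(float a, float b, float c, float d) {
    volatile float t1 = a * b;
    volatile float t2 = c * d;
    volatile float r  = t1 + t2;
    return r;
}
```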
If your C++ compiler is a little old, it may also optimize floating-point computations as if they were associative, when they are not. There is not much you can do about that; it is simply incorrect. The three-address-code transformation would again prevent the compiler from applying it, but again there is no simple way to get the C# compiler to reproduce the old C++ results.
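For reference, a small demonstration of why floating-point addition is not associative (assuming strict single-precision evaluation, e.g. SSE code generation; with x87 extended intermediates the two results can coincide):

```cpp
#include <cstdio>

int main() {
    float big = 16777216.0f;  // 2^24: above this, the float ULP is 2
    float one = 1.0f;
    float asWritten = (big + one) + one;  // each 1.0f is lost to rounding
    float regrouped = big + (one + one);  // regrouping preserves the 2.0
    std::printf("%.1f vs %.1f\n", asWritten, regrouped);
    return 0;
}
```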