The below code works on Visual Studio 2008 with and without optimization. But it only works on g++ without optimization (O0).
#include <cstdlib>
#include <iostream>
#include <cmath>
double round(double v, double digit)
{
double pow = std::pow(10.0, digit);
double t = v * pow;
//std::cout << "t:" << t << std::endl;
double r = std::floor(t + 0.5);
//std::cout << "r:" << r << std::endl;
return r / pow;
}
int main(int argc, char *argv[])
{
std::cout << round(4.45, 1) << std::endl;
std::cout << round(4.55, 1) << std::endl;
}
The output should be:
4.5
4.6
But g++ with optimization (O1
- O3
) will output:
4.5
4.5
If I add the volatile
keyword before t, it works, so might there be some kind of optimization bug?
Test on g++ 4.1.2, and 4.4.4.
Here is the result on ideone: http://ideone.com/Rz937
And the option I test on g++ is simple:
g++ -O2 round.cpp
The more interesting result, even I turn on /fp:fast
option on Visual Studio 2008, the result still is correct.
Further question:
I was wondering, should I always turn on the -ffloat-store
option?
Because the g++ version I tested is shipped with CentOS/Red Hat Linux 5 and CentOS/Redhat 6.
I compiled many of my programs under these platforms, and I am worried it will cause unexpected bugs inside my programs. It seems a little difficult to investigate all my C++ code and used libraries whether they have such problems. Any suggestion?
Is anyone interested in why even /fp:fast
turned on, Visual Studio 2008 still works? It seems like Visual Studio 2008 is more reliable at this problem than g++?
As Maxim Yegorushkin already noted in his answer, part of the problem is that internally your computer is using an 80 bit floating point representation. This is just part of the problem, though. The basis of the problem is that any number of the form n.nn5 does not have an exact binary floating representation. Those corner cases are always inexact numbers.
If you really want your rounding to be able to reliably round these corner cases, you need a rounding algorithm that addresses the fact that n.n5, n.nn5, or n.nnn5, etc. (but not n.5) is always inexact. Find the corner case that determines whether some input value rounds up or down and return the rounded-up or rounded-down value based on a comparison to this corner case. And you do need to take care that a optimizing compiler will not put that found corner case in an extended precision register.
See How does Excel successfully Rounds Floating numbers even though they are imprecise? for such an algorithm.
Or you can just live with the fact that the corner cases will sometimes round erroneously.
This implies that the problem is related to the debug statements. And it looks like there's a rounding error caused by loading the values into registers during the output statements, which is why others found that you can fix this with
-ffloat-store
To be flippant, there must be a reason that some programmers don't turn on
-ffloat-store
, otherwise the option wouldn't exist (likewise, there must be a reason that some programmers do turn on-ffloat-store
). I wouldn't recommend always turning it on or always turning it off. Turning it on prevents some optimizations, but turning it off allows for the kind of behavior you're getting.But, generally, there is some mismatch between binary floating point numbers (like the computer uses) and decimal floating point numbers (that people are familiar with), and that mismatch can cause similar behavior to what your getting (to be clear, the behavior you're getting is not caused by this mismatch, but similar behavior can be). The thing is, since you already have some vagueness when dealing with floating point, I can't say that
-ffloat-store
makes it any better or any worse.Instead, you may want to look into other solutions to the problem you're trying to solve (unfortunately, Koenig doesn't point to the actual paper, and I can't really find an obvious "canonical" place for it, so I'll have to send you to Google).
If you're not rounding for output purposes, I would probably look at
std::modf()
(incmath
) andstd::numeric_limits<double>::epsilon()
(inlimits
). Thinking over the originalround()
function, I believe it would be cleaner to replace the call tostd::floor(d + .5)
with a call to this function:I think that suggests the following improvement:
A simple note:
std::numeric_limits<T>::epsilon()
is defined as "the smallest number added to 1 that creates a number not equal to 1." You usually need to use a relative epsilon (i.e., scale epsilon somehow to account for the fact that you're working with numbers other than "1"). The sum ofd
,.5
andstd::numeric_limits<double>::epsilon()
should be near 1, so grouping that addition means thatstd::numeric_limits<double>::epsilon()
will be about the right size for what we're doing. If anything,std::numeric_limits<double>::epsilon()
will be too large (when the sum of all three is less than one) and may cause us to round some numbers up when we shouldn't.Nowadays, you should consider
std::nearbyint()
.Intel x86 processors use 80-bit extended precision internally, whereas
double
is normally 64-bit wide. Different optimization levels affect how often floating point values from CPU get saved into memory and thus rounded from 80-bit precision to 64-bit precision.Use the
-ffloat-store
gcc option to get the same floating point results with different optimization levels.Alternatively, use the
long double
type, which is normally 80-bit wide on gcc to avoid rounding from 80-bit to 64-bit precision.man gcc
says it all:The accepted answer is correct if you are compiling to an x86 target that doesn't include SSE2. All modern x86 processors support SSE2, so if you can take advantage of it, you should:
Let's break this down.
-mfpmath=sse -msse2
. This performs rounding by using SSE2 registers, which is much faster than storing every intermediate result to memory. Note that this is already the default on GCC for x86-64. From the GCC wiki:-ffp-contract=off
. Controlling rounding isn't enough for an exact match, however. FMA (fused multiply-add) instructions can change the rounding behavior versus its non-fused counterparts, so we need to disable it. This is the default on Clang, not GCC. As explained by this answer:By disabling FMA, we get results that exactly match on debug and release, at the cost of some performance (and accuracy). We can still take advantage of other performance benefits of SSE and AVX.
Personally, I have hit the same problem going the other way - from gcc to VS. In most instances I think it is better to avoid optimisation. The only time it is worthwhile is when you're dealing with numerical methods involving large arrays of floating point data. Even after disassembling I'm often underwhelmed by the compilers choices. Very often it's just easier to use compiler intrinsics or just write the assembly yourself.
I digged more into this problem and I can bring more precisions. First, the exact representations of 4.45 and 4.55 according to gcc on x84_64 are the following (with libquadmath to print the last precision):
As Maxim said above, the problem is due to the 80 bits size of the FPU registers.
But why is the problem never occuring on Windows? on IA-32, the x87 FPU was configured to use an internal precision for the mantissa of 53 bits (equivalent to a total size of 64 bits:
double
). For Linux and Mac OS, the default precision of 64 bits was used (equivalent to a total size of 80 bits:long double
). So the problem should be possible, or not, on these different platforms by changing the control word of the FPU (assuming the sequence of instructions would trigger the bug). The issue was reported to gcc as bug 323 (read at least the comment 92! ).To show the mantissa precision on Windows, you can compile this in 32 bits with VC++:
and on Linux/Cygwin:
Note that with gcc you can set the FPU precision with
-mpc32/64/80
, though it is ignored in Cygwin. But keep in mind that it will modify the size of the mantissa, but not the exponent one, letting the door open to other kinds of different behavior.On x86_64 architecture, SSE is used as said by tmandry, so the problem will not occur unless you force the old x87 FPU for FP computing with
-mfpmath=387
, or unless you compile in 32 bits mode with-m32
(you will need multilib package). I could reproduce the problem on Linux with different combinations of flags and versions of gcc:I tried a few combinations on Windows or Cygwin with VC++/gcc/tcc but the bug never showed up. I suppose the sequence of instruction generated is not the same.
Finally, note that an exotic way to prevent this problem with 4.45 or 4.55 would be to use
_Decimal32/64/128
, but support is really scarce... I spent a lot of time just to be able to do a printf withlibdfp
!