gcc precision bug?

Published 2020-03-15 03:57

I can only assume this is a bug. The first assert passes while the second fails:

double sum_1 =  4.0 + 6.3;
assert(sum_1 == 4.0 + 6.3);

double t1 = 4.0, t2 = 6.3;

double sum_2 =  t1 + t2;
assert(sum_2 == t1 + t2);

If not a bug, why?

Tags: gcc, precision
7 Answers
太酷不给撩
Answer 2 · 2020-03-15 04:35

You are comparing floating point numbers. Don't do that: floating point numbers have inherent precision errors in some circumstances. Instead, take the absolute value of the difference of the two values and assert that it is less than some small number (epsilon).

void CompareFloats( double d1, double d2, double epsilon )
{
    assert( fabs( d1 - d2 ) < epsilon );  /* fabs from <math.h>; plain abs() truncates to int */
} 

This has nothing to do with the compiler and everything to do with the way floating point numbers are implemented. Here is the IEEE 754 spec:

http://www.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF

我命由我不由天
Answer 3 · 2020-03-15 04:35

When comparing floating point numbers for closeness you usually want to measure their relative difference, which is defined as

if (abs(x) != 0 || abs(y) != 0) 
   rel_diff(x, y) = abs((x - y) / max(abs(x), abs(y)))
else
   rel_diff(x, y) = max(abs(x), abs(y))

For example,

rel_diff(1.12345, 1.12367)   = 0.000195787019
rel_diff(112345.0, 112367.0) = 0.000195787019
rel_diff(112345E100, 112367E100) = 0.000195787019

The idea is to measure the number of leading significant digits the numbers have in common; if you take the -log10 of 0.000195787019 you get 3.70821611, which is about the number of leading base 10 digits all the examples have in common.

If you need to determine if two floating point numbers are equal you should do something like

if (rel_diff(x,y) < error_factor * machine_epsilon()) then
  print "equal\n";

where machine epsilon is the gap between 1.0 and the next larger representable floating-point number, which bounds the relative rounding error of a single operation. Most computer languages have a function call or constant for this value (DBL_EPSILON in C). error_factor should be based on the number of significant digits you think will be consumed by rounding errors (and others) in the calculations of the numbers x and y. For example, if I knew that x and y were the result of about 1000 summations and did not know any bounds on the numbers being summed, I would set error_factor to about 100.

Tried to add these as links but couldn't since this is my first post:

  • en.wikipedia.org/wiki/Relative_difference
  • en.wikipedia.org/wiki/Machine_epsilon
  • en.wikipedia.org/wiki/Significand (mantissa)
  • en.wikipedia.org/wiki/Rounding_error
一夜七次
Answer 4 · 2020-03-15 04:38

Comparisons of double precision numbers are inherently inaccurate. For instance, you can often find 0.1 + 0.2 == 0.3 returning false. This is due to the way the FPU stores and rounds numbers.

Wikipedia says:

Testing for equality is problematic. Two computational sequences that are mathematically equal may well produce different floating-point values.

You will need to use a delta to give a tolerance for your comparisons, rather than an exact value.

Lonely孤独者°
Answer 5 · 2020-03-15 04:39

It may be that in one of the cases, you end up comparing a 64-bit double to an 80-bit internal register. It may be enlightening to look at the assembly instructions GCC emits for the two cases...

我命由我不由天
Answer 6 · 2020-03-15 04:45

This "problem" can be "fixed" by using these options:

-msse2 -mfpmath=sse

as explained on this page:

http://www.network-theory.co.uk/docs/gccintro/gccintro_70.html

Once I used these options, both asserts passed.

【Aperson】
Answer 7 · 2020-03-15 04:48

I've duplicated your problem on my Intel Core 2 Duo, and I looked at the assembly code. Here's what's happening: when your compiler evaluates t1 + t2, it does

  • load t1 into an 80-bit register
  • load t2 into an 80-bit register
  • compute the 80-bit sum

When it stores into sum_2 it does

  • round the 80-bit sum to a 64-bit number and store it

Then the == comparison compares the 80-bit sum to a 64-bit sum, and they're different, primarily because the fractional part 0.3 cannot be represented exactly using a binary floating-point number, so you are comparing a 'repeating decimal' (actually repeating binary) that has been truncated to two different lengths.

What's really irritating is that if you compile with gcc -O1 or gcc -O2, gcc does the arithmetic at compile time, folding it to a 64-bit constant, and the problem goes away. Maybe this is OK according to the standard, but it's just one more reason that gcc is not my favorite compiler.


P.S. When I say that == compares an 80-bit sum with a 64-bit sum, of course I really mean it compares the extended version of the 64-bit sum. You might do well to think

sum_2 == t1 + t2

resolves to

extend(sum_2) == extend(t1) + extend(t2)

and

sum_2 = t1 + t2

resolves to

sum_2 = round(extend(t1) + extend(t2))

Welcome to the wonderful world of floating point!
