Why does adding 0 to the end of float literal chan

2019-01-11 17:26发布

问题:

I discovered on my x86 VM (32 bit) that the following program:

#include <stdio.h>
void foo (long double x) {
    int y = x;
    printf("(int)%Lf = %d\n", x, y);
}
int main () {
    foo(.9999999999999999999728949456878623891498136799780L);
    foo(.999999999999999999972894945687862389149813679978L);
    return 0;
}

Produces the following output:

(int)1.000000 = 1
(int)1.000000 = 0

Ideone also produces this behavior.

What is the compiler doing to allow this to happen?

I found this constant as I was tracking down why the following program didn't produce 0 as I expected (using 19 9s produced the 0 I expected):

int main () {
    long double x = .99999999999999999999L; /* 20 9's */
    int y = x;
    printf("%d\n", y);
    return 0;
}

As I tried to compute the value at which the result switches from expected to unexpected, I arrived at the constant this question is about.

回答1:

Your problem is that long double on your platform has insufficient precision to store the exact value 0.99999999999999999999. This means that the value of that must be converted to a representable value (this conversion happens during translation of your program, not at runtime).

This conversion can generate either the nearest representable value, or the next greater or smaller representable value. The choice is implementation-defined, so your implementation should document which it is using. It seems that your implementation uses x87-style 80bit long double, and is rounding to the nearest value, resulting in a value of 1.0 stored in x.


With the assumed format for long double (with 64 mantissa bits), the highest representable number less than 1.0 is, in hexadecimal:

0x0.ffffffffffffffff

The number exactly halfway between this value and the next higher representable number (1.0) is:

0x0.ffffffffffffffff8

Your very long constant 0.9999999999999999999728949456878623891498136799780 is equal to:

0x0.ffffffffffffffff7fffffffffffffffffffffffa1eb2f0b64cf31c113a8ec...

which should obviously be rounded down if rounding to nearest, but you appear to have reached some limit of the floating point representation your compiler is using, or a rounding bug.



回答2:

Compiler uses binary numbers. Most compilers do the same thing.

According to wolframalpha, binary representation of

0.99999999999999999999

looks like this:

0.11111111111111111111111111111111111111111111111111111111111111111101000011000110101111011110011011011011011110111011100101000101010111011100001011010001001110001101011001010000110000101001111011111001111110000101010111111110100110000010001001101011001101010110110010010101101111101001110001100111101100000000100110110001100110000011000100100011000011110100001000000100001000101000111011010111111101011010010000010110011111110100100110001011001110100011100001111101011110101001000000111110010000101101001001010110010011001110111111100111101111100000111010001101101011000100110001010010001000100010110000101110100101010101001010100010001001100111111111001001101100000000010010001011110100101011101001001101001111001001000101011101001100111101110111111001101110100111000001111101101101101101110100100111101000000000111101101101001000111101100010101110011101110001110010110110111101000011110110100011000110101100011111111110111000010010001111000000000101100101000100101110100001001101000010110101000100011100000110010001110101...

That's 932 bits, and that STILL isn't enough to precisely represent your number (see dots at the end).

Which means that as long as your underlying platform uses base of 2 to store numbers, you will not be able to store exactly 0.99999999999999999999.

Because number cannot be stored precisely, it'll be rounded up or down. With 20 9s it ends up being rounded up, and with 19 9s it ends up being rounded down.

To avoid this problem, instead of doubles you'll need to use some kind of 3rd party mathematics/bignum library that stores numbers internally using decimal base (i.e. two decimal digits per byte or something) or uses fractions (ratios) instead of floating point numbers. That would solve your problem.



回答3:

Double values, when there is not enough precision to represent a value, rounds up or down to the closest one. In your implementation it is rounding up to 1.



回答4:

There are two conversions involved here. First, and in some ways most important, is the conversion of the literal .99999999999999999999L to long double. As others have said, this conversion rounds to the nearest representable value, which seems to be 1.0L. The second conversion is from the long double value that resulted from the first conversion to an integer value. That conversion rounds toward 0, which is why a quick examination suggests that the value of y should be 0. But because the first conversion produced 1 and not a value slightly less than 1, this conversion also produces 1.