How do you print the EXACT value of a floating poi

2019-01-06 13:08发布

First of all, this is not a floating point newbie question. I know results of floating point arithmetic (not to mention transcendental functions) usually cannot be represented exactly, and that most terminating decimals cannot be represented exactly as binary floating point numbers.

That said, each possible floating point value corresponds exactly to a diadic rational (a rational number p/q where q is a power of 2), which in turn has an exact decimal representation.

My question is: How do you find this exact decimal representation efficiently? sprintf and similar functions are usually only specified up to a number of significant digits to uniquely determine the original floating point value; they don't necessarily print the exact decimal representation. I know one algorithm I've used, but it's very slow, O(e^2) where e is the exponent. Here's an outline:

  1. Convert the mantissa to a decimal integer. You can either do this by pulling apart the bits to read the mantissa directly, or you can write a messy floating point loop that first multiplies the value by a power of two to put it in the range 1<=x<10, then pulls off a digit at a time by casting to int, subtracting, and multiplying by 10.
  2. Apply the exponent by repeatedly multiplying or dividing by 2. This is an operation on the string of decimal digits you generated. Every ~3 multiplications will add an extra digit to the left. Every single dividion will add an extra digit to the right.

Is this really the best possible? I doubt it, but I'm not a floating-point expert and I can't find a way to do the base-10 computations on the floating point representation of the number without running into a possibility of inexact results (multiplying or dividing by anything but a power of 2 is a lossy operation on floating point numbers unless you know you have free bits to work with).

10条回答
霸刀☆藐视天下
2楼-- · 2019-01-06 13:47

There's been a lot of work on printing floating-point numbers. The gold standard is to print out a decimal equivalent of minimal length such that when the decimal equivalent is read back in, you get the same floating-point number that you started with, no matter what the rounding mode is during readback. You can read about the algorithm in the excellent paper by Burger and Dybvig.

查看更多
家丑人穷心不美
3楼-- · 2019-01-06 13:47

There are 3 ways

  1. printing numbers in bin or hex

    This is the most precise way. I prefer hex because it is more like base 10 for reading/feel like for example F.8h = 15.5 no precision loss here.

  2. printing in dec but only the relevant digits

    With this I mean only digits which can have 1 in your number represented as close as possible.

    num of integer digits are easy and precise (no precision loss):

    // n10 - base 10 integer digits
    // n2  - base 2 integer digits
    n10=log10(2^n2)
    n10=log2(2^n2)/log2(10)
    n10=n2/log2(10)
    n10=ceil(n2*0.30102999566398119521373889472449)
    // if fist digit is 0 and n10 > 1 then n10--
    

    num of fractional digits are more tricky and empirically I found this:

    // n10 - base 10 fract. digits
    // n2  - base 2 fract. digits >= 8
    n10=0; if (n02==8) n10=1;
    else if (n02==9) n10=2;
    else if (n02> 9)
        {
        n10=((n02-9)%10);
             if (n10>=6) n10=2;
        else if (n10>=1) n10=1;
        n10+=2+(((n02-9)/10)*3);
        }
    

    if you make a dependency table n02 <-> n10 then you see that constant 0.30102999566398119521373889472449 is still present, but at start from 8 bits because less cannot represent 0.1 with satisfactory precision (0.85 - 1.15). because of negative exponents of base 2 the dependency is not linear, instead it patterns. This code works for small bit count (<=52) but at larger bit counts there can be error because used pattern do not fit log10(2) or 1/log2(10) exactly.

    for larger bit counts I use this:

    n10=7.810+(9.6366363636363636363636*((n02>>5)-1.0));
    

    but that formula is 32bit aligned !!! and also bigger bit count ads error to it

    P.S. further analysis of binary representation of decadic numbers

    0.1
    0.01
    0.001
    0.0001
    ...
    

    may reveal the exact pattern repetition which would lead to exact number of relevant digits for any bit count.

    for clarity:

    8 bin digits -> 1 dec digits
    9 bin digits -> 2 dec digits
    10 bin digits -> 3 dec digits
    11 bin digits -> 3 dec digits
    12 bin digits -> 3 dec digits
    13 bin digits -> 3 dec digits
    14 bin digits -> 3 dec digits
    15 bin digits -> 4 dec digits
    16 bin digits -> 4 dec digits
    17 bin digits -> 4 dec digits
    18 bin digits -> 4 dec digits
    19 bin digits -> 5 dec digits
    20 bin digits -> 6 dec digits
    21 bin digits -> 6 dec digits
    22 bin digits -> 6 dec digits
    23 bin digits -> 6 dec digits
    24 bin digits -> 6 dec digits
    25 bin digits -> 7 dec digits
    26 bin digits -> 7 dec digits
    27 bin digits -> 7 dec digits
    28 bin digits -> 7 dec digits
    29 bin digits -> 8 dec digits
    30 bin digits -> 9 dec digits
    31 bin digits -> 9 dec digits
    32 bin digits -> 9 dec digits
    33 bin digits -> 9 dec digits
    34 bin digits -> 9 dec digits
    35 bin digits -> 10 dec digits
    36 bin digits -> 10 dec digits
    37 bin digits -> 10 dec digits
    38 bin digits -> 10 dec digits
    39 bin digits -> 11 dec digits
    40 bin digits -> 12 dec digits
    41 bin digits -> 12 dec digits
    42 bin digits -> 12 dec digits
    43 bin digits -> 12 dec digits
    44 bin digits -> 12 dec digits
    45 bin digits -> 13 dec digits
    46 bin digits -> 13 dec digits
    47 bin digits -> 13 dec digits
    48 bin digits -> 13 dec digits
    49 bin digits -> 14 dec digits
    50 bin digits -> 15 dec digits
    51 bin digits -> 15 dec digits
    52 bin digits -> 15 dec digits
    53 bin digits -> 15 dec digits
    54 bin digits -> 15 dec digits
    55 bin digits -> 16 dec digits
    56 bin digits -> 16 dec digits
    57 bin digits -> 16 dec digits
    58 bin digits -> 16 dec digits
    59 bin digits -> 17 dec digits
    60 bin digits -> 18 dec digits
    61 bin digits -> 18 dec digits
    62 bin digits -> 18 dec digits
    63 bin digits -> 18 dec digits
    64 bin digits -> 18 dec digits
    

    And at last do not forget to round the cut off digits !!! That means if digit after the last relevant digit is >=5 in dec than last relevant digit should be +1 ... and if it is already 9 than you must go to previous digit and so on...

  3. print exact value

    To print exact value of fractional binary number just print fractional n digits where n is the number of fractional bits because the value represented is sum of this values so the number of fractional decimals cannot be bigger than the num of fractional digits of LSB. Stuff above (bullet #2) is relevant for storing decimal number to float (or printing just relevant decimals).

    negative powers of two exact values...

    2^- 1 = 0.5
    2^- 2 = 0.25
    2^- 3 = 0.125
    2^- 4 = 0.0625
    2^- 5 = 0.03125
    2^- 6 = 0.015625
    2^- 7 = 0.0078125
    2^- 8 = 0.00390625
    2^- 9 = 0.001953125
    2^-10 = 0.0009765625
    2^-11 = 0.00048828125
    2^-12 = 0.000244140625
    2^-13 = 0.0001220703125
    2^-14 = 0.00006103515625
    2^-15 = 0.000030517578125
    2^-16 = 0.0000152587890625
    2^-17 = 0.00000762939453125
    2^-18 = 0.000003814697265625
    2^-19 = 0.0000019073486328125
    2^-20 = 0.00000095367431640625
    

    now negative powers of 10 printed by exact value style for 64 bit doubles:

    10^+ -1 = 0.1000000000000000055511151231257827021181583404541015625
            = 0.0001100110011001100110011001100110011001100110011001101b
    10^+ -2 = 0.01000000000000000020816681711721685132943093776702880859375
            = 0.00000010100011110101110000101000111101011100001010001111011b
    10^+ -3 = 0.001000000000000000020816681711721685132943093776702880859375
            = 0.000000000100000110001001001101110100101111000110101001111111b
    10^+ -4 = 0.000100000000000000004792173602385929598312941379845142364501953125
            = 0.000000000000011010001101101110001011101011000111000100001100101101b
    10^+ -5 = 0.000010000000000000000818030539140313095458623138256371021270751953125
            = 0.000000000000000010100111110001011010110001000111000110110100011110001b
    10^+ -6 = 0.000000999999999999999954748111825886258685613938723690807819366455078125
            = 0.000000000000000000010000110001101111011110100000101101011110110110001101b
    10^+ -7 = 0.0000000999999999999999954748111825886258685613938723690807819366455078125
            = 0.0000000000000000000000011010110101111111001010011010101111001010111101001b
    10^+ -8 = 0.000000010000000000000000209225608301284726753266340892878361046314239501953125
            = 0.000000000000000000000000001010101111001100011101110001000110000100011000011101b
    10^+ -9 = 0.0000000010000000000000000622815914577798564188970686927859787829220294952392578125
            = 0.0000000000000000000000000000010001001011100000101111101000001001101101011010010101b
    10^+-10 = 0.00000000010000000000000000364321973154977415791655470655996396089904010295867919921875
            = 0.00000000000000000000000000000000011011011111001101111111011001110101111011110110111011b
    10^+-11 = 0.00000000000999999999999999939496969281939810930172340963650867706746794283390045166015625
            = 0.00000000000000000000000000000000000010101111111010111111111100001011110010110010010010101b
    10^+-12 = 0.00000000000099999999999999997988664762925561536725284350612952266601496376097202301025390625
            = 0.00000000000000000000000000000000000000010001100101111001100110000001001011011110101000010001b
    10^+-13 = 0.00000000000010000000000000000303737455634003709136034716842278413651001756079494953155517578125
            = 0.00000000000000000000000000000000000000000001110000100101110000100110100001001001011101101000001b
    10^+-14 = 0.000000000000009999999999999999988193093545598986971343290729163921781719182035885751247406005859375
            = 0.000000000000000000000000000000000000000000000010110100001001001101110000110101000010010101110011011b
    10^+-15 = 0.00000000000000100000000000000007770539987666107923830718560119501514549256171449087560176849365234375
            = 0.00000000000000000000000000000000000000000000000001001000000011101011111001111011100111010101100001011b
    10^+-16 = 0.00000000000000009999999999999999790977867240346035618411149408467364363417573258630000054836273193359375
            = 0.00000000000000000000000000000000000000000000000000000111001101001010110010100101111101100010001001101111b
    10^+-17 = 0.0000000000000000100000000000000007154242405462192450852805618492324772617063644020163337700068950653076171875
            = 0.0000000000000000000000000000000000000000000000000000000010111000011101111010101000110010001101101010010010111b
    10^+-18 = 0.00000000000000000100000000000000007154242405462192450852805618492324772617063644020163337700068950653076171875
            = 0.00000000000000000000000000000000000000000000000000000000000100100111001001011101110100011101001001000011101011b
    10^+-19 = 0.000000000000000000099999999999999997524592683526013185572915905567688179926555402943222361500374972820281982421875
            = 0.000000000000000000000000000000000000000000000000000000000000000111011000001111001001010011111011011011010010101011b
    10^+-20 = 0.00000000000000000000999999999999999945153271454209571651729503702787392447107715776066783064379706047475337982177734375
            = 0.00000000000000000000000000000000000000000000000000000000000000000010111100111001010000100001100100100100100001000100011b
    

    now negative powers of 10 printed by relevant decimal digits only style (i am more used to this) for 64bit doubles:

    10^+ -1 = 0.1
    10^+ -2 = 0.01
    10^+ -3 = 0.001
    10^+ -4 = 0.0001
    10^+ -5 = 0.00001
    10^+ -6 = 0.000001
    10^+ -7 = 0.0000001
    10^+ -8 = 0.00000001
    10^+ -9 = 0.000000001
    10^+-10 = 0.0000000001
    10^+-11 = 0.00000000001
    10^+-12 = 0.000000000001
    10^+-13 = 0.0000000000001
    10^+-14 = 0.00000000000001
    10^+-15 = 0.000000000000001
    10^+-16 = 0.0000000000000001
    10^+-17 = 0.00000000000000001
    10^+-18 = 0.000000000000000001
    10^+-19 = 0.0000000000000000001
    10^+-20 = 0.00000000000000000001
    

    hope it helps :)

查看更多
老娘就宠你
4楼-- · 2019-01-06 13:48

You don't. The closest you can come to that is dumping the bytes.

查看更多
\"骚年 ilove
5楼-- · 2019-01-06 13:49

If you want more exact results, why not use fixed point math instead? Conversions are quick. Error is known and can be worked around. Not an exact answer to your question, but a different idea for you.

查看更多
啃猪蹄的小仙女
6楼-- · 2019-01-06 13:49

Off the top of my head, why not break the exponent down into a sum of binary exponents first, then all your operations are loss-less.

I.e.

10^2 = 2^6 + 2^5 + 2^2

Then sum:

mantissa<<6 + mantissa<<5 + mantissa<<2

I'm thinking that breaking it down would be on the O(n) on the the number of digits, the shifting is O(1), and the summing is O(n) digits...

You would have to have an integer class big enough to store the results, of course...

Let me know - I'm curious about this, it really made me think. :-)

查看更多
相关推荐>>
7楼-- · 2019-01-06 13:51

Well being no floating point expert myself, I'd defer to using a well tested open source library.

The GNU MPFR is a good one.

The MPFR library is a C library for multiple-precision floating-point computations with correct rounding. The main goal of MPFR is to provide a library for multiple-precision floating-point computation which is both efficient and has a well-defined semantics.

查看更多
登录 后发表回答