Range of integers that can be expressed precisely

Posted 2019-04-29 10:08

Question:

This question already has an answer here:

  • What's the first double that deviates from its corresponding long by delta? 4 answers
  • Which is the first integer that an IEEE 754 float is incapable of representing exactly? 2 answers

What is the exact range of (contiguous) integers that can be expressed as a double (resp. float)? I ask because, for questions such as this one, I am curious when a loss of accuracy will occur.

That is

  1. What is the least positive integer m such that m+1 cannot be precisely expressed as a double (resp. float)?
  2. What is the greatest negative integer -n such that -n-1 cannot be precisely expressed as a double (resp. float)? (n may be the same as m above.)

This means that every integer between -n and m has an exact floating-point representation. I'm basically looking for the range [-n, m] for both floats and doubles.

Let's limit the scope to the standard IEEE 754 32-bit and 64-bit floating point representations. I know that the float has 24 bits of precision and the double has 53 bits (both with a hidden leading bit), but due to the intricacies of the floating point representation I'm looking for an authoritative answer for this. Please don't wave your hands!

(Ideal answer would prove that all the integers from 0 to m are expressible, and that m+1 is not.)

Answer 1:

Since you're asking about IEEE floating-point types, the language does not matter.

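#include <iostream>
using namespace std;

int main(){

    // float: 24-bit significand
    float f0 = 16777215.; // 2^24 - 1
    float f1 = 16777216.; // 2^24
    float f2 = 16777217.; // 2^24 + 1, not representable: rounds to 2^24

    cout << (f0 == f1) << endl; // 0: 2^24 - 1 and 2^24 are distinct floats
    cout << (f1 == f2) << endl; // 1: 2^24 + 1 was rounded to 2^24

    // double: 53-bit significand
    double d0 = 9007199254740991.; // 2^53 - 1
    double d1 = 9007199254740992.; // 2^53
    double d2 = 9007199254740993.; // 2^53 + 1, not representable: rounds to 2^53

    cout << (d0 == d1) << endl; // 0: 2^53 - 1 and 2^53 are distinct doubles
    cout << (d1 == d2) << endl; // 1: 2^53 + 1 was rounded to 2^53
}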

Output:

0
1
0
1

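So the limit for float is 2^24 = 16777216, and the limit for double is 2^53 = 9007199254740992: every integer in [-2^24, 2^24] (resp. [-2^53, 2^53]) is exactly representable, and 2^24 + 1 (resp. 2^53 + 1) is the first that is not. The reason is that a p-bit significand (p = 24 or 53) holds any integer of at most p binary digits exactly, and 2^p itself is a power of two; 2^p + 1 needs p + 1 significant bits, so it gets rounded. Negatives are the same since the only difference is the sign bit.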