I'm trying to determine the range of the various floating-point types. I came across this code:
#include <stdio.h>

int main(void)
{
    float fl, fltest, last;
    double dbl, dbltest, dblast;

    /* Probe float: grow fltest until it overflows to infinity. */
    fl = 0.0;
    fltest = 0.0;
    while (fl == 0.0) {
        last = fltest;
        fltest = fltest + 1111e28;
        fl = (fl + fltest) - fltest;
    }
    printf("Maximum range of float variable: %e\n", last);

    /* Same probe for double, with a larger increment. */
    dbl = 0.0;
    dbltest = 0.0;
    while (dbl == 0.0) {
        dblast = dbltest;
        dbltest = dbltest + 1111e297;
        dbl = (dbl + dbltest) - dbltest;
    }
    printf("Maximum range of double variable: %e\n", dblast);
    return 0;
}
I don't understand why the author adds `1111e28` to the `fltest` variable.
OP: ... why author added `1111e28` at `fltest` variable?

A: [Edit] For the code to work using `float` and `1111e28` (or `1.111e31`), this delta value needs careful selection. It should be big enough that, if `fltest` were `FLT_MAX`, the sum `fltest + delta` would overflow and become `float` infinity. With round-to-nearest mode, this minimum is `FLT_MAX*FLT_EPSILON/4`. On my machine, min_delta is 1.014120601e+31.

`delta` also needs to be small enough that, if `fltest` is the second-largest finite number, adding `delta` does not jump straight to infinity and skip `FLT_MAX`. This upper limit is 3x min_delta.

So `1.014120601e+31 <= 1111e28 < 3.042361441e+31`.

@david.pfx Yes. 1111e28 is a cute number and it is in range.
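As a rough check of these bounds, the sketch below derives them from the spacing of `float` values just below `FLT_MAX` instead of hard-coding them. It assumes IEEE 754 binary32 with round-to-nearest; the helper names (`add_float`, `top_gap`, `min_delta`, `max_delta`) are made up for illustration, and the printed lower bound may differ in its last digits from the figure quoted above, depending on how it is computed.

```c
#include <float.h>
#include <math.h>
#include <stdio.h>

/* Hypothetical helper: add two floats and force the sum to float precision,
   so excess intermediate precision cannot hide the overflow. */
float add_float(float a, float b)
{
    volatile float sum = a + b;
    return sum;
}

/* Spacing of floats just below FLT_MAX:
   FLT_RADIX raised to (FLT_MAX_EXP - FLT_MANT_DIG). */
float top_gap(void)
{
    float gap = 1.0f;
    for (int i = 0; i < FLT_MAX_EXP - FLT_MANT_DIG; i++)
        gap *= FLT_RADIX;
    return gap;
}

/* Smallest addend that pushes FLT_MAX to infinity
   (assumption: round-to-nearest, ties overflow). */
float min_delta(void)
{
    return top_gap() / 2.0f;
}

/* Addends at or above this value jump from the second-largest float
   straight to infinity, skipping FLT_MAX. */
float max_delta(void)
{
    return 3.0f * min_delta();
}

void print_delta_bounds(void)
{
    printf("min_delta = %.9e\n", min_delta());
    printf("1111e28   = %.9e\n", 1111e28f);
    printf("max_delta = %.9e\n", max_delta());
}
```

Calling `print_delta_bounds()` from `main` prints the three values side by side, which makes it easy to see whether `1111e28` falls inside the window on a given platform.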
Note: Complications occur when the math and its intermediate values, even though the variables are `float`, are calculated at higher precision such as `double`. This is allowed in C and is controlled by `FLT_EVAL_METHOD` or by very careful coding.

`1111e28` is a curious value that makes sense if the author already knew the general range of `FLT_MAX`.
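On the point about intermediate precision, the sketch below reports `FLT_EVAL_METHOD` and performs the loop's update with every intermediate stored in a `volatile float`, which is one way to force rounding to `float`. The function names are made up for illustration, and `volatile` is an assumed workaround, not something the original code does.

```c
#include <float.h>
#include <math.h>
#include <stdio.h>

/* 0: evaluate in the operand type; 1: float is evaluated as double;
   2: everything is evaluated as long double; -1: indeterminable. */
int eval_method(void)
{
    return FLT_EVAL_METHOD;
}

/* Hypothetical helper: the loop's update (fl + fltest) - fltest,
   with each intermediate forced through a float object. */
float update_fl(float fl, float fltest)
{
    volatile float sum = fl + fltest;
    volatile float diff = sum - fltest;
    return diff;
}

void report_eval_method(void)
{
    printf("FLT_EVAL_METHOD = %d\n", eval_method());
    printf("update with finite fltest:   %e\n", update_fl(0.0f, 1111e28f));
    printf("update with infinite fltest: %e\n", update_fl(0.0f, INFINITY));
}
```

If `FLT_EVAL_METHOD` is nonzero, the unmodified loop may compare and add in a wider type than `float`, so the value at which it stops is not guaranteed to match `FLT_MAX`.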
The `float` loop is expected to iterate many times (24946069 on one test platform). Hopefully, the value `fltest` eventually becomes "infinite". Then `fl` will become NaN as the difference of Infinity - Infinity. Then the while loop ends as NaN != 0.0. @ecatmur

The looping, if done in small enough increments, will arrive at a precise answer. Prior knowledge of `FLT_MAX` and `FLT_EPSILON` is needed to ensure this.

The problem with this is that C does not define the range of `FLT_MAX` and `DBL_MAX` other than that they must be at least `1E+37`. So if the maximum value were quite large, the increment value of 1111e28 or 1111e297 would have no effect. Example: `dbltest = dbltest + 1111e297;` with `dbltest = 1e400` would certainly not increase 1e400 unless `dbltest` had a hundred decimal digits of precision.

If `DBL_MAX` were smaller than 1111e297, the method fails too. (Note: On simple platforms in 2014, it is not surprising to find `double` and `float` to be the same 4-byte IEEE binary32.) The first time through the loop, `dbltest` becomes infinity and the loop stops, reporting "Maximum range of double variable: 0.000000e+00".

There are many ways to efficiently derive the maximum floating-point value. A sample follows that uses a random initial value to help show its resilience to potential variant `FLT_MAX`.
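One such approach can be sketched as follows. This is an illustrative sketch, not necessarily the sample the answer had in mind: a caller-supplied positive starting value stands in for the random one, the function name `find_float_max` is hypothetical, and the code assumes the platform has infinities and rounds to nearest.

```c
#include <float.h>
#include <math.h>
#include <stdio.h>

/* Hypothetical helper: search for the largest finite float, starting from
   any positive finite value. No prior knowledge of FLT_MAX is used. */
float find_float_max(float start)
{
    volatile float next;   /* volatile store forces rounding to float */
    float x = start;
    float cand;

    /* Phase 1: double until one more doubling would overflow. */
    for (;;) {
        next = x * 2.0f;
        cand = next;
        if (isinf(cand))
            break;
        x = cand;
    }

    /* Phase 2: keep adding delta while it helps; halve it when the sum
       overflows or no longer changes x. Ends when delta underflows to 0. */
    float delta = x;
    while (delta > 0.0f) {
        next = x + delta;
        cand = next;
        if (isinf(cand) || cand == x)
            delta = delta / 2.0f;
        else
            x = cand;
    }
    return x;
}

void print_float_max(float start)
{
    printf("start %e -> largest finite float %e\n", start, find_float_max(start));
}
```

The search needs a few hundred iterations at most instead of millions, and the result can be compared against `FLT_MAX` from `<float.h>` as a sanity check.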
`isinf()` is a new-ish (C99) C function. It is simple enough to roll your own if needed.

In re: @didierc comment
[Edit]
The precision of a `float` and `double` is implied with "epsilon": "the difference between 1 and the least value greater than 1 that is representable in the given floating point type ...". The maximum values follow: `FLT_MAX` and `DBL_MAX` must each be at least 1E+37.

Per @Pascal Cuoq comment, "... 1111e28 being chosen larger than FLT_MAX*FLT_EPSILON.": 1111e28 needs to be on the order of `FLT_MAX*FLT_EPSILON` (at least `FLT_MAX*FLT_EPSILON/4` with round-to-nearest, as noted above) to impact the loop's addition, yet small enough to precisely reach the number before infinity. Again, prior knowledge of `FLT_MAX` and `FLT_EPSILON` is needed to make this determination. If these values are known ahead of time, then the code could simply have printed `FLT_MAX` directly.

The largest value representable in a `float` is 3.40282e+38. The constant 1111e28 is chosen such that adding that constant to a number in the range of 10^38 still produces a different floating-point value, so that the value of `fltest` will continue to increase as the function runs. It needs to be large enough that it will still be significant at the 10^38 range, and small enough that the result will be accurate.

The loop terminates when `fltest` reaches `+Inf`, as at that point `fl = (fl + fltest) - fltest` becomes `NaN`, which is unequal to `0.0`. `last` contains a value which when added to `1111e28` produces `+Inf` and so is close to the upper limit of `float`.

`1111e28` is chosen to reach `+Inf` reasonably quickly; it also needs to be large enough that when added to large values the loop continues to progress, i.e. it is at least half the gap between the largest and second-largest finite `float` values (with round-to-nearest, a smaller addend would be rounded away).
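The termination behaviour described above can be observed with a small reproduction of the `float` loop. This is a sketch under assumptions: the function name `probe_float_max` and the iteration counter are additions for illustration, the sums are stored in `volatile float` objects so that they are rounded to `float`, and the platform is assumed to use IEEE 754 binary32 with round-to-nearest.

```c
#include <float.h>
#include <math.h>
#include <stdio.h>

/* Hypothetical helper: rerun the float loop with a chosen increment and
   count its iterations. The increment must be large enough for fltest to
   reach infinity; with a smaller one the loop would never end. */
float probe_float_max(float delta, unsigned long *iterations)
{
    volatile float fl = 0.0f;
    volatile float fltest = 0.0f;
    volatile float sum;
    float last = 0.0f;
    unsigned long count = 0;

    while (fl == 0.0f) {
        last = fltest;
        fltest = fltest + delta;
        sum = fl + fltest;
        fl = sum - fltest;   /* Inf - Inf is NaN, and NaN == 0.0f is false */
        count++;
    }
    *iterations = count;
    return last;
}

void print_probe(void)
{
    unsigned long n = 0;
    float last = probe_float_max(1111e28f, &n);
    printf("last = %e after %lu iterations (FLT_MAX = %e)\n", last, n, FLT_MAX);
}
```

Printing `last` next to `FLT_MAX` shows how close the probe gets, and the counter gives a feel for how many additions the method needs.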