While investigating floating-point exception status flags, I came across the curious case of a status flag FE_UNDERFLOW
set when not expected.
This is similar to When does underflow occur? yet goes into a corner case that may be a C specification issue or FP hardware defect.
// pseudo code
// s bias_expo implied "mantissa"
w = smallest_normal; // 0 000...001 (1) 000...000
x = w * 2; // 0 000...010 (1) 000...000
y = next_smaller(x); // 0 000...001 (1) 111...111
round_mode(FE_TONEAREST);
clear_status_flags();
z = y/2; // 0 000...001 (1) 000...000
FE_UNDERFLOW is set!?
I did not expect FE_UNDERFLOW
to be set as z
above is normal, not a sub-normal.
I expect FE_UNDERFLOW
when the result of an earlier floating-point operation was subnormal with a loss of precision. In this case there is a loss of precision.
I tried this with my float
and long double
and had the same result.
After much investigation, I noted that __STDC_IEC_559__
is not defined.
Questions
If
__STDC_IEC_559__
is defined, what is the correct state of underflow in this case?With lack of a defined
__STDC_IEC_559__
am I stuck with "Implementations that do not define__STDC_IEC_559__
are not required to conform to these specifications." C11 or is there some C specification that indicates this result is incorrect?Since this is certainly a result of my hardware (processor), your result may differ and that would be interesting to know.
Follows is some test code that demos this. At first I suspected it is because FLT_EVAL_METHOD = 2
on my machine, yet then I tried similar code with long double
and the same result.
// These 2 includes missing in original post, yet in my true test code
#include <float.h>
#include <math.h>
#include <fenv.h>
#include <stdio.h>
#include <stdint.h>
#define N (sizeof excepts/sizeof excepts[0])
void Report_IEC_FP_exception_status_flags(const char *s) {
printf("%s", s);
int excepts[] = { //
FE_DIVBYZERO, FE_INEXACT, FE_INVALID, FE_OVERFLOW, FE_UNDERFLOW, };
const char *excepts_str[N] = { //
"FE_DIVBYZERO", "FE_INEXACT", "FE_INVALID", "FE_OVERFLOW", "FE_UNDERFLOW", };
int excepts_val[N];
for (unsigned i = 0; i < N; i++) {
excepts_val[i] = fetestexcept(excepts[i]);
}
for (unsigned i = 0; i < N; i++) {
if (excepts_val[i]) printf(" %s", excepts_str[i]);
}
printf("\n");
fflush(stdout);
}
#undef N
void test2(float f, int round_mode, const char *name) {
union {
float f;
uint32_t u32;
} x = { .f = f};
printf("x:%+17a %08lX normal:%c round_mode:%d %s\n", //
f, (unsigned long) x.u32, isnormal(f) ? 'Y' : 'n', round_mode, name);
if (feclearexcept(FE_ALL_EXCEPT)) puts("Clear Fail");
Report_IEC_FP_exception_status_flags("Before:");
f /= 2;
Report_IEC_FP_exception_status_flags("After :");
printf("y:%+17a %08lX normal:%c\n\n",
f,(unsigned long) x.u32, isnormal(f) ? 'Y' : 'n');
}
Driver
// In same file as above
int main(void) {
#ifdef __STDC_IEC_559__
printf("__STDC_IEC_559__ = %d\n", __STDC_IEC_559__);
#else
printf("__STDC_IEC_559__ = not define\n");
#endif
float f = FLT_MIN;
printf("FLT_EVAL_METHOD = %d\n", FLT_EVAL_METHOD);
printf("FLT_MIN:%+17a\n", f);
f *= 2.0f;
test2(f, FE_TONEAREST, "FE_TONEAREST");
f = nextafterf(f, 0);
test2(f, FE_TONEAREST, "FE_TONEAREST"); // *** problem? ***
f = nextafterf(f, 0);
test2(f, FE_TONEAREST, "FE_TONEAREST");
}
Output
__STDC_IEC_559__ = not define
FLT_EVAL_METHOD = 2
FLT_MIN: +0x1p-126
x: +0x1p-125 01000000 normal:Y round_mode:0 FE_TONEAREST
Before:
After :
y: +0x1p-126 01000000 normal:Y
x: +0x1.fffffep-126 00FFFFFF normal:Y round_mode:0 FE_TONEAREST
Before:
After : FE_INEXACT FE_UNDERFLOW *** Why FE_UNDERFLOW? ***
y: +0x1p-126 00FFFFFF normal:Y *** Result is normal ***
x: +0x1.fffffcp-126 00FFFFFE normal:Y round_mode:0 FE_TONEAREST
Before:
After :
y: +0x1.fffffcp-127 00FFFFFE normal:n
Ref
Implementation notes:
GNU C11 (GCC) version 6.4.0 (i686-pc-cygwin) compiled by GNU C version 6.4.0, GMP version 6.1.2, MPFR version 3.1.5-p10, MPC version 1.0.3, isl version 0.14 or 0.13
glibc 2.26 released.
Intel Xeon W3530, 64-bit OS (Windows 7)
[Minor Update] The illustrative print of the quotient as an 32-bit hex number should have used y.u32
. This does not change the function under test
// printf("y:%+17a %08lX normal:%c\n\n",
// f,(unsigned long) x.u32, isnormal(f) ? 'Y' : 'n');
union {
float f;
uint32_t u32;
} y = { .f = f};
printf("y:%+17a %08lX normal:%c\n\n",
f,(unsigned long) y.u32, isnormal(f) ? 'Y' : 'n');
// ^^^^^
Although not intended as a self answer, input from various commenters @John Bollinger, @nwellnhof and further research leads to:
Yes, in narrow situations. See below.
"Underflow" occurs when:
The
z = y/2;
above is 1) inexact (due to rounding) and 2) maybe considered "too small".Math
The
z = y/2;
can be thought of going through 2 stages: dividing and rounding. The mathematical quotient, with unlimited precision, is less than the smallest normal numberFLT_MIN
and more than the greatest sub-normal numbernextafterf(FLT_MIN,0)
. Depending on rounding mode, the final answer is either one of those two. WithFE_TONEAREST
,z
is assignedFLT_MIN
, a normal number.Spec
The C spec below and to IEC 60559 indicate
and
(My emphasis)
Q & A
The underflow flag may be set or left alone in this case. Either complies. There is a preference though, for not setting the underflow flag.
The setting of the underflow flag result in not incorrect. The FP spec allows this behavior. It also allows to not set the underflow flag.
On another platform, where
__STDC_IEC_559__ = not define
andFLT_EVAL_METHOD = 0
, theFE_INEXACT FE_UNDERFLOW
flags were both set, just like in the above tst case. The issue applies tofloat, double, long double
.If the mathematical answer lies in the grey "Between" zone below, it will get rounding down to a sub-normal
double
or up to the normaldouble
DBL_MIN
depending on its value and rounding mode. If rounded down, thenFE_UNDERFLOW
is certainly set. If rounded up, thenFE_UNDERFLOW
may be set or not depending on when determination of the 'tiny' condition is applied.