I have a question regarding the precision of calculations; it is more about the mathematical theory behind programming. I have a given float number X and a rounding X' of that number, accurate to the 10^(-n) decimal place. Now I would like to know whether, after applying the exponential function y = 2^x, the difference between my number and the rounded number stays at the same level of precision, i.e. whether |2^X - 2^X'| is on the order of 10^(-n-1).
Exponentiation magnifies relative error and, by extension, ulp error. Consider this illustrative example:

#include <stdio.h>
#include <math.h>

int main (void)
{
    float x = 0x1.fffffep6f;
    printf ("x=%a %15.8e exp2(x)=%a %15.8e\n", x, x, exp2f (x), exp2f (x));
    x = nextafterf (x, 0.0f);
    printf ("x=%a %15.8e exp2(x)=%a %15.8e\n", x, x, exp2f (x), exp2f (x));
    return 0;
}
This will print something like
x=0x1.fffffep+6 1.27999992e+02 exp2(x)=0x1.ffff4ep+127 3.40280562e+38
x=0x1.fffffcp+6 1.27999985e+02 exp2(x)=0x1.fffe9ep+127 3.40278777e+38
The maximum ulp error in the result will be on the same order of magnitude as 2^(number of exponent bits) of the floating-point format used. In this particular example, there are 8 exponent bits in an IEEE-754 float, and a 1 ulp difference in the input translates into a 176 ulp difference in the result. The relative difference in the arguments is about 5.5e-8, while the relative difference in the results is about 5.3e-6.
A simplified, intuitive way of thinking about this magnification is that, of the finite number of bits in the significand (mantissa) of the floating-point argument, some contribute only to the magnitude, that is, the exponent bits, of the result (in the example, these would be the bits representing the integral portion of 127), while only the remaining bits contribute to the significand bits of the result.
If you look at it mathematically: if the original argument is x = n*(1+ε), then e^x = e^(n*(1+ε)) = e^n * e^(n*ε) ≈ e^n * (1 + n*ε). So if n ≈ 128 and ε ≈ 1e-7, the expected maximum relative error is around 1.28e-5.