I'm building a program to convert double values into scientific-notation format (mantissa, exponent). Then I noticed the following:
369.7900000000000 -> 3.6978999999999997428
68600000 -> 6.8599999999999994316
I noticed the same pattern for several other values as well. The maximum fractional error is
0.000 000 000 000 001 = 1e-15
I know about the inaccuracy of representing double values in a computer. Can we conclude that the maximum fractional error we would get is 1e-15? What is significant about this?
I went through most of the questions on the floating-point precision problem on Stack Overflow, but I didn't see any about the maximum fractional error in 64 bits.
To be clear about the computation I do, here is my code snippet as well:
double norm = 68600000;
int exp = 0;

if (norm != 0.0)
{
    while (norm >= 10.0)
    {
        norm /= 10.0;   /* scale down; each division can round */
        exp++;
    }
    while (norm < 1.0)
    {
        norm *= 10.0;   /* scale up */
        exp--;
    }
}
Now I get
norm = 6.8599999999999994316;
exp = 7
The number you are getting is related to the machine epsilon for the `double` data type. A `double` is 64 bits long, with 1 bit for the sign, 11 bits for the exponent, and 52 bits for the mantissa fraction. A `double`'s value is given by

value = (-1)^sign * 1.fraction * 2^(exponent - 1023)

With only 52 bits for the mantissa, any `double` value below `2^-52` will be completely lost when added to `1.0` due to its small significance. In binary, `1.0 + 2^-52` would be

1.0000000000000000000000000000000000000000000000000001

Obviously anything lower would not change the value of `1.0`. You can verify for yourself that `1.0 + 2^-53 == 1.0` in a program.

This number `2^-52 = 2.22e-16` is called the machine epsilon, and it is an upper bound on the relative error introduced by a single floating-point arithmetic operation due to round-off with `double` values.

Similarly, `float` has 23 bits in its mantissa, so its machine epsilon is `2^-23 = 1.19e-7`.
The reason you are getting `1e-15` may be that errors accumulate as you perform many arithmetic operations, but I can't say for certain because I don't know the exact calculations you are doing.

EDIT: I've looked into the relative error for your problem with 68600000.
First off, you may be interested to know that round-off error can change the result of your computation if you break it into steps.
The closest `double` to 68.6 is lower than the actual value, but the closest `double` to 6.86 is greater.

If we look at the absolute error of your program, `e_abs = abs(v - v_approx)`, we see that it is about `5.7e-16`. However, the relative error, `e_rel = abs((v - v_approx) / v) = abs(e_abs / v)`, would be about `8.3e-17`, which is indeed below our machine epsilon of `2.22e-16`.
Goldberg's "What Every Computer Scientist Should Know About Floating-Point Arithmetic" is a famous paper you can read if you want to know all the details about floating point arithmetic.