The IEEE 754 64-bit (double-precision) floating-point format is supposed to correctly represent 15 significant digits, although up to 17 digits are kept internally. Is there a way to force the 16th and 17th digits to zero?
Ref: http://msdn.microsoft.com/en-us/library/system.double(VS.80).aspx
Remember that a floating-point number can only approximate a decimal number, and that the precision of a floating-point number determines how accurately that number approximates a decimal number. By default, a Double value contains 15 decimal digits of precision, although a maximum of 17 digits is maintained internally. The precision of a floating-point number has several consequences: …
Example numbers:
d1 = 97842111437.390091
d2 = 97842111437.390076
d1 and d2 differ only in the 16th and 17th significant digits, which are not supposed to be meaningful. I'm looking for a way to force those digits to zero, i.e.:
d1 = 97842111437.390000
d2 = 97842111437.390000
Generally speaking, people only care about something like this ("I only want the first x digits") when displaying the number. That's relatively easy with `std::stringstream` or `sprintf`. If you're concerned about comparing numbers with `==`, you really can't do that reliably with floating-point numbers. Instead, check whether the numbers are close enough (say, within a small multiple of `std::numeric_limits<double>::epsilon()` scaled by the magnitude of the values). Playing with the bits of the number directly isn't a great idea.
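No. Counter-example: the two floating-point numbers closest to the rational 1.11111111111118 (which has 15 significant decimal digits) are 1.111111111111179994… (just below) and 1.111111111111180216… (just above). In other words, there is no floating-point number that starts with 1.1111111111111800.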
This question is a little malformed. The hardware stores numbers in binary, not decimal, so in the general case you can't do exact base-10 arithmetic. Some decimal numbers (0.1 is one of them!) do not even have a finite representation in binary; their binary expansion repeats forever. If you have requirements like this, where you need the number to be exact to 15 decimal digits, you will need to pick another representation for your numbers.
You should be able to modify the bits of your number directly by creating a union with a field for the floating-point number and an integral type of the same size. Then you can access the bits you want and set them however you want. Here is an example where I whack the sign bit; you can choose any field you want, of course.
No, but I wonder if this is relevant to any of your issues (GCC specific):
GCC Documentation