I am trying to understand floating point arithmetic better and have seen a few links to 'What Every Computer Scientist Should Know About Floating Point Arithmetic'.
I still don't understand how a number like 0.1 or 0.5 is stored in floats and as decimals.
Can someone please explain how it is laid out in memory?
I know about the float being two parts (i.e., a number to the power of something).
See the Wikipedia entry and the IEEE group, first.
Basically, there's a sign, a number, and an exponent. A fraction can only be represented finitely in a given base if every prime factor of its (reduced) denominator is also a factor of that base. For instance, 1/3 cannot be represented as a finite decimal number, but is trivial to represent as a ternary (base-3) number: $(0.1)_3$.
So 0.5 = 1/2 has a finite binary representation, $(0.1)_2$, that is, $2^{-1}$, but 0.1 = 1/10 has a repeating binary representation because 10 has a prime factor (5) that 2 does not.
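That rule can be checked mechanically. Below is a minimal Python sketch, assuming the number is given as an exact `Fraction` (which is always kept in lowest terms). The helper names `prime_factors` and `is_finite_in_base` are hypothetical, my own rather than any library's:

```python
from fractions import Fraction


def prime_factors(n: int) -> set:
    """Return the set of distinct prime factors of a positive integer n."""
    factors = set()
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors


def is_finite_in_base(x: Fraction, base: int) -> bool:
    """True if x has a terminating expansion in the given base.

    Fraction stores x in lowest terms, so the expansion terminates exactly
    when every prime factor of the denominator also divides the base.
    """
    return prime_factors(x.denominator) <= prime_factors(base)


print(is_finite_in_base(Fraction(1, 2), 2))   # True:  (0.1) in base 2
print(is_finite_in_base(Fraction(1, 10), 2))  # False: 0.1 repeats in binary
print(is_finite_in_base(Fraction(1, 3), 10))  # False: 0.333... in decimal
print(is_finite_in_base(Fraction(1, 3), 3))   # True:  (0.1) in base 3
```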
I've always pointed people towards Harald Schmidt's online converter, along with the Wikipedia IEEE754-1985 article with its nice pictures.
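If you'd rather poke at the bits locally, a rough Python stand-in for the converter is sketched below. It assumes single precision (32-bit) like the examples that follow; since Python's own `float` is a 64-bit double, `struct.pack(">f", ...)` rounds it to the nearest single-precision value first. The name `float32_fields` is just illustrative:

```python
import struct


def float32_fields(x: float):
    """Return the (sign, exponent, mantissa) bit strings of x as a 32-bit float.

    ">f" packs x as a big-endian single-precision float (rounding it from a
    double), and ">I" reinterprets those same 4 bytes as an unsigned integer
    so the raw bits can be printed.
    """
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    pattern = format(bits, "032b")  # zero-padded to all 32 bits
    return pattern[0], pattern[1:9], pattern[9:]


for value in (0.1, 0.5):
    print(value, *float32_fields(value))
# 0.1 0 01111011 10011001100110011001101
# 0.5 0 01111110 00000000000000000000000
```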
For those two specific values, you get (for 0.1):

    s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm
    0 01111011 10011001100110011001101

The sign is positive, that's pretty easy.
The exponent is 64+32+16+8+2+1 = 123 - 127 bias = -4, so the multiplier is $2^{-4}$ or 1/16.

The mantissa is chunky. It consists of 1 (the implicit base) plus the values of all the set mantissa bits, each bit being worth $1/2^n$ as $n$ starts at 1 and increases to the right: {1/2, 1/16, 1/32, 1/256, 1/512, 1/4096, 1/8192, 1/65536, 1/131072, 1/1048576, 1/2097152, 1/8388608}.

When you add all these up, you get 1.60000002384185791015625.

When you multiply that by the multiplier, you get 0.100000001490116119384765625, which is why they say you cannot represent 0.1 exactly as an IEEE754 float, and why there's so much opportunity on SO for people answering "why doesn't 0.1 + 0.1 + 0.1 == 0.3?"-type questions :-)

The 0.5 example is substantially easier. It's represented as:

    s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm
    0 01111110 00000000000000000000000

which means it's the implicit base, 1, plus no other additives (all the mantissa bits are zero).

The sign is again positive. The exponent is 64+32+16+8+4+2 = 126 - 127 bias = -1. Hence the multiplier is $2^{-1}$, which is 1/2 or 0.5.

So the final value is 1 multiplied by 0.5, or 0.5. Voila!

I've sometimes found it easier to think of it in terms of decimal.
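The number 1.345 is equivalent to

$$1 + \frac{3}{10} + \frac{4}{100} + \frac{5}{1000}$$

or:

$$1 + 3 \times 10^{-1} + 4 \times 10^{-2} + 5 \times 10^{-3}$$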
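The same digit weighting works in any base, which is the whole trick behind the binary version. Here is a minimal sketch (the `positional_value` name is hypothetical) that reads a digit string in a given base exactly, weighting the n-th digit after the point by $1/\text{base}^n$:

```python
from fractions import Fraction


def positional_value(digits: str, base: int) -> Fraction:
    """Exact value of a digit string such as "1.345" read in the given base.

    The part before the point is an ordinary integer; each digit after the
    point is worth digit / base**n, with n starting at 1 and increasing to
    the right.
    """
    whole, _, frac = digits.partition(".")
    value = Fraction(int(whole, base)) if whole else Fraction(0)
    for n, d in enumerate(frac, start=1):
        value += Fraction(int(d, base), base ** n)
    return value


print(positional_value("1.345", 10))  # 269/200, i.e. 1 + 3/10 + 4/100 + 5/1000
print(positional_value("0.1", 3))     # 1/3, the ternary example from earlier
```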
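Similarly, the IEEE754 representation for decimal 0.8125 is:

    s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm
    0 01111110 10100000000000000000000

With the implicit base of 1, that's equivalent to the binary:

$$(1.101)_2 \times 2^{126 - 127}$$

or:

$$\left(1 + \frac{1}{2} + \frac{1}{8}\right) \times 2^{-1}$$

which becomes:

$$(1 + 0.5 + 0.125) \times 0.5$$

and then becomes:

$$1.625 \times 0.5 = 0.8125$$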
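To check the three worked examples (0.1, 0.5 and 0.8125) by machine, here's a minimal Python sketch that rebuilds the exact stored value from the bits using the same arithmetic as above: sign, exponent field minus the 127 bias, and 1 (the implicit base) plus $1/2^n$ for each set mantissa bit. The name `decode_float32` is hypothetical, and it assumes normal numbers only (no zero, subnormals, infinities or NaN):

```python
import struct
from decimal import Decimal
from fractions import Fraction


def decode_float32(x: float):
    """Rebuild the exact value stored when x is rounded to IEEE 754 single precision.

    Returns (sign, unbiased exponent, mantissa, value), all exact.
    Only normal numbers are handled (exponent field not all 0s or all 1s).
    """
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = -1 if bits >> 31 else 1
    raw_exponent = (bits >> 23) & 0xFF
    if raw_exponent in (0, 0xFF):
        raise ValueError("zero, subnormal, infinity and NaN are not handled here")
    exponent = raw_exponent - 127  # remove the bias
    mantissa = Fraction(1)  # the implicit leading 1
    for n in range(1, 24):
        # mantissa bit n, counting from 1 at the leftmost mantissa bit
        if (bits >> (23 - n)) & 1:
            mantissa += Fraction(1, 2 ** n)
    value = sign * mantissa * Fraction(2) ** exponent
    return sign, exponent, mantissa, value


for x in (0.1, 0.5, 0.8125):
    sign, exponent, mantissa, value = decode_float32(x)
    # Decimal(float) converts exactly, so these print the full stored values.
    print(x, sign, exponent, Decimal(float(mantissa)), Decimal(float(value)))
# 0.1 1 -4 1.60000002384185791015625 0.100000001490116119384765625
# 0.5 1 -1 1 0.5
# 0.8125 1 -1 1.625 0.8125

print(0.1 + 0.1 + 0.1 == 0.3)  # False (Python floats are doubles, same problem)
```

Using `Fraction` rather than floats for the reconstruction keeps every intermediate sum exact, so the printed digits are the true stored value rather than a rounded display.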