I am having some trouble with packing and unpacking binary floats in Python when writing a binary file. Here is what I have done:
import struct
f = open('file.bin', 'wb')
value = 1.23456
data = struct.pack('f',value)
f.write(data)
f.close()
f = open('file.bin', 'rb')
print(struct.unpack('f', f.read(4)))
f.close()
The result I get is the following:
(1.2345600128173828,)
What is going on with the extra digits? Is this a rounding error? How does this work?
On most platforms, Python floats are what C would call a double, but you wrote your data out as float instead, which has half the precision.
If you were to use double, you'd have less precision loss:
>>> data = struct.pack('d',value)
>>> struct.unpack('d',data)
(1.23456,)
>>> data = struct.pack('f',value)
>>> struct.unpack('f',data)
(1.2345600128173828,)
The float struct format offers only single precision (24 bits for the significand).
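To make the difference concrete, here is a small sketch that round-trips the value through each format. It assumes Python 3 and uses the standard-size formats '<f' and '<d' (4 and 8 bytes on any platform); the helper name round_trip is only illustrative.

```python
import struct

def round_trip(fmt, value):
    """Pack value with the given struct format and unpack it again."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

value = 1.23456

single = round_trip('<f', value)   # 4 bytes, 24-bit significand
double = round_trip('<d', value)   # 8 bytes, 53-bit significand

print(struct.calcsize('<f'), struct.calcsize('<d'))  # sizes in bytes
print(double == value)    # the double round trip returns the same float
print(single == value)    # the float round trip does not
print('%.7g' % single)    # about 7 significant digits survive in a float
```

If the file format has to stay at 4 bytes per value, formatting the result to about 7 significant digits for display hides the extra digits.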
It's a decimal to binary problem.
You know how some fractions in decimal are repeating? For instance, 1/3 is 0.3333333-> forever. 1/7 is 0.142857142857[142857]-> forever.
So here's the kicker: repeating fractions are those whose denominator (in lowest terms) has a prime factor that is not a factor of 10 -- i.e. a prime factor other than 2 or 5.
- 1/2 divides evenly
- 1/3 repeats
- 1/4 divides evenly
- 1/5 divides evenly
- 1/6 repeats
- 1/7 repeats
- 1/8 divides evenly
- 1/9 repeats
- 1/10 divides evenly
- 1/11 repeats
- and so forth
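The rule behind the list above can be sketched in a few lines. This assumes Python 3.5+ for math.gcd, and the function name terminates is made up for illustration; it only handles fractions of the form 1/n.

```python
from math import gcd

def terminates(denominator, base=10):
    """Return True if 1/denominator has a finite expansion in the given base."""
    # Strip out every factor the denominator shares with the base.
    common = gcd(denominator, base)
    while common > 1:
        denominator //= common
        common = gcd(denominator, base)
    # Anything left over is a factor the base cannot absorb, so the digits repeat.
    return denominator == 1

for n in range(2, 12):
    print('1/%d %s' % (n, 'divides evenly' if terminates(n) else 'repeats'))
```

Passing base=2 applies the same test to binary; for example, terminates(10, base=2) is False.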
So now how does that work in binary? Well, it kinda sucks, because the only prime factor of the base is 2. Any denominator with a prime factor other than 2 gives a binary expansion that repeats forever -- and that includes tenths, hundredths, etc., which all have a factor of 5 in the denominator. Your 1.23456 is 123456/100000, which has factors 2 and 5 in the denominator, and that 5 means its binary expansion repeats forever.
But you can't repeat forever, which means the value has to be rounded off to fit in the binary digits that encode your float.
When you convert back to decimal, the rounding error is revealed.
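One way to see the rounding directly is to ask Python for the exact value that was stored. The sketch below assumes Python 3 and uses the standard fractions and decimal modules; the variable names are illustrative.

```python
import struct
from decimal import Decimal
from fractions import Fraction

value = 1.23456
wanted = Fraction('1.23456')            # the exact decimal value, 123456/100000

# The same number after being squeezed into a 4-byte float and read back.
as_single = struct.unpack('<f', struct.pack('<f', value))[0]

stored_double = Fraction(value)         # exact value held by the Python float
stored_single = Fraction(as_single)     # exact value held by the 4-byte float

# Both stored values are fractions whose denominator is a power of two,
# so neither can equal 123456/100000 exactly.
print(stored_double.denominator, stored_single.denominator)

# Decimal shows every digit of the stored binary values.
print(Decimal(value))
print(Decimal(as_single))

# The single-precision error is much larger than the double-precision one.
print(float(abs(stored_single - wanted)), float(abs(stored_double - wanted)))
```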
The upshot for coding is: calculate divisions as late as possible to keep these errors from accumulating with each calculation.
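As a rough illustration of that advice (assuming Python 3, where / is true division), compare adding ten rounded tenths with counting in integers and dividing once at the end:

```python
# Dividing early: each 1 / 10 is already rounded, and the errors add up.
early = 0.0
for _ in range(10):
    early += 1 / 10

# Dividing late: keep exact integers and divide once at the end.
total = 0
for _ in range(10):
    total += 1
late = total / 10

print(early == 1.0)   # False
print(late == 1.0)    # True
```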