floating point equality in Python and in general

Posted 2019-01-14 03:22

I have a piece of code that behaves differently depending on whether I go through a dictionary to get conversion factors or whether I use them directly.

The following piece of code will print 1.0 == 1.0 -> False

But if you replace factors[units_from] with 10.0 and factors[units_to] with 1.0 / 2.54, it will print 1.0 == 1.0 -> True

#!/usr/bin/env python

base = 'cm'
factors = {
    'cm'        : 1.0,
    'mm'        : 10.0,
    'm'         : 0.01,
    'km'        : 1.0e-5,
    'in'        : 1.0 / 2.54,
    'ft'        : 1.0 / 2.54 / 12.0,
    'yd'        : 1.0 / 2.54 / 12.0 / 3.0,
    'mile'      : 1.0 / 2.54 / 12.0 / 5280,
    'lightyear' : 1.0 / 2.54 / 12.0 / 5280 / 5.87849981e12,
}

# convert 25.4 mm to inches
val = 25.4
units_from = 'mm'
units_to = 'in'

base_value = val / factors[units_from]
ret = base_value * factors[units_to]
print ret, '==', 1.0, '->', ret == 1.0

Let me first say that I am pretty sure I know what is going on here. I have seen it before in C, just never in Python, but since Python is implemented in C we're seeing it here too.

I know that floating point numbers will change values going from a CPU register to cache and back. I know that comparing what should be two equal variables will return false if one of them was paged out while the other stayed resident in a register.

Questions

  • What is the best way to avoid problems like this, in Python or in general?
  • Am I doing something completely wrong?

Side Note

This is obviously part of a stripped-down example, but what I'm trying to do is come up with classes for length, volume, etc. that can compare against other objects of the same class but with different units.
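
Here is a rough sketch of the kind of class I have in mind, using a close-enough comparison instead of == (the class name, factor table, and tolerance are just placeholders, and math.isclose needs Python 3.5+):

import math

class Length(object):
    # Sketch only: store everything in a base unit (cm) and compare
    # with a relative tolerance instead of ==.
    FACTORS = {'cm': 1.0, 'mm': 10.0, 'm': 0.01, 'in': 1.0 / 2.54}

    def __init__(self, value, units):
        # Normalize to the base unit on the way in.
        self.base_value = value / self.FACTORS[units]

    def __eq__(self, other):
        # Close-enough comparison with an explicit tolerance.
        return math.isclose(self.base_value, other.base_value, rel_tol=1e-9)

print(Length(25.4, 'mm') == Length(1.0, 'in'))  # True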

Rhetorical Questions

  • If this is a potentially dangerous problem, since it makes programs behave in a seemingly nondeterministic manner, should compilers warn or error when they detect that you're checking equality of floats?
  • Should compilers support an option to replace all float equality checks with a 'close enough' function?
  • Do compilers already do this and I just can't find the information?

8 Answers
Melony? · 2019-01-14 03:26

Let me first answer by saying that you should read David Goldberg's classic What Every Computer Scientist Should Know About Floating-Point Arithmetic.

As some other commentators already said, the discrepancy you notice is intrinsically due to the floating-point model and has nothing to do with registers, cache or memory.

According to the floating-point model, 2.54 is actually represented as

>>> 2859785763380265 * 2 ** -50
2.54

This representation, however, is not exact:

>>> from fractions import Fraction
>>> float(Fraction(2859785763380265, 2 ** 50) - Fraction(254, 100))
3.552713678800501e-17

Now, the expression you are evaluating actually is:

>>> 25.4 / 10 * (1/2.54)
0.99999999999999989

The problem lies in the 1/2.54:

>>> Fraction.from_float(1/2.54)
Fraction(1773070719437203, 4503599627370496)

But what you would expect is

>>> 1/Fraction.from_float(2.54)
Fraction(1125899906842624, 2859785763380265)

To answer your questions:

  • It is a difficult problem, but it is clearly deterministic; nothing mysterious there.
  • You cannot automatically replace equality with a close-enough comparison. The latter requires that you specify a tolerance, which depends on the problem at hand, i.e. on what kind of precision you are expecting from your results. There are also plenty of situations where you really want equality, not a close-enough comparison.
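
If you do want a close-enough comparison with the tolerance made explicit, the standard library provides one: math.isclose (Python 3.5+). A minimal example with the value from the question:

import math

ret = 25.4 / 10.0 * (1.0 / 2.54)
print(ret == 1.0)                            # False
print(math.isclose(ret, 1.0, rel_tol=1e-9))  # True, tolerance chosen by the caller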
迷人小祖宗 · 2019-01-14 03:28

The difference is that if you replace factors[units_to] with 1.0 / 2.54, you're doing:

(base_value * 1.0) / 2.54

With the dictionary, you're doing:

base_value * (1.0 / 2.54)

The order of rounding matters. This is easier to see if you do:

>>> print repr(((25.4 / 10.0) * 1.0) / 2.54)
1.0
>>> print repr((25.4 / 10.0) * (1.0 / 2.54))
0.99999999999999989

Note that there is no non-deterministic or undefined behavior here. There is a standard, IEEE-754, which implementations must conform to (not that they always do).

I don't think there should be an automatic "close enough" replacement. That is often an effective way to deal with the problem, but it should be up to the programmer to decide if and how to use it.

Finally, there are of course options for arbitrary-precision arithmetic, including python-gmp and decimal. Think about whether you actually need these, because they do have a significant performance impact.
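
As a sketch of the decimal option, applied to the conversion from the question (Decimal(1) / Decimal('2.54') is still rounded, but to 28 significant digits under the default context rather than to 53 bits):

from decimal import Decimal

val = Decimal('25.4')  # exact decimal input, unlike the float literal 25.4
ret = val / Decimal(10) * (Decimal(1) / Decimal('2.54'))
print(ret == 1)  # True: at 28-digit precision the product rounds back to exactly 1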

There is no issue with moving between regular registers and cache. You may be thinking of the x86's 80-bit extended precision.

forever°为你锁心 · 2019-01-14 03:35

To compare floats in general, compare the absolute value of their difference against a chosen delta that is small enough for your needs.
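
A minimal sketch of that idea (the helper name and the default delta are my own choices, not from any library):

def nearly_equal(a, b, delta=1e-9):
    # Absolute-difference comparison against a caller-chosen delta.
    return abs(a - b) < delta

print(nearly_equal(25.4 / 10.0 * (1.0 / 2.54), 1.0))  # True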

Rhetorical Questions

  • This IS a dangerous problem, as it might hide errors or produce infinite loops if such a comparison is used as a stopping criterion.
  • Modern C/C++ compilers can warn about comparing floats for equality.
  • Every static code checker I know reports float equality comparisons as an error for the languages I use.

I suppose it is the same for Python. Since the delta to use for the comparison varies from case to case, it must be up to the implementer to choose it, which means that no good default transformation can be provided fully automatically.

干净又极端 · 2019-01-14 03:36

If I run this:

x = 0.3 + 0.3 + 0.3
if x != 0.9: print "not equal"
if x == 0.9: print "equal"

it prints "not equal", which is wrong. But since

x - 0.9

gives the floating-point error as about -1.11022302e-16, I just do something like this (note the abs(): without it, any x well below 0.9 would also pass the check):

if abs(x - 0.9) < 10**-8: print "equal (almost)"

Otherwise you can convert both to strings, I guess (this relies on Python 2's str() rounding floats to 12 significant digits):

if str(x) == str(0.9): print "equal (strings)"
狗以群分 · 2019-01-14 03:40

Also interesting is how many distinct floating-point values there are between numbers that look equal once one of them has been written out as a string and read back in.

That is arguably a Python bug. That number was written out with just twelve digits. To uniquely identify a 64-bit double (Python's float type) you need seventeen digits of mantissa. If Python printed out its numbers with 17 digits of precision then you would be guaranteed to get back exactly the same value.

The precision issue is discussed at: http://randomascii.wordpress.com/2012/03/08/float-precisionfrom-zero-to-100-digits-2/

The focus is on 32-bit floats (which require nine digits of mantissa to uniquely identify each number), but it briefly mentions doubles and the fact that they require 17 digits of mantissa.
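
A quick way to see this with the value from the question (a sketch assuming CPython's standard % formatting):

x = 25.4 / 10.0 * (1.0 / 2.54)
print('%.12g' % x)              # 1 -- twelve digits hide the difference
print('%.17g' % x)              # 0.99999999999999989
print(float('%.17g' % x) == x)  # True: the 17-digit string round-trips exactly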

做自己的国王 · 2019-01-14 03:43

As has been shown, comparing two floats (or doubles, etc.) can be problematic. Generally, instead of comparing for exact equality, they should be checked against an error bound; if they are within the error bound, they are considered equal.

That is much easier said than done. The nature of floating point makes a fixed error bound worthless. A small error bound (like 2 * float_epsilon) works well when the values are near 0.0, but will fail if the values are near 1000. An error bound for values as large as 1,000,000.0 will be far too lax for values near 0.0.

The best solution is to know the domain of your math and pick an appropriate error bound on a case-by-case basis.

When this is impractical or you're being lazy, Units in the Last Place (ULPs) comparison is a clever and robust solution. The full details are quite involved, but you can read more here.

The basic idea is this: a floating-point number has two pieces, a mantissa and an exponent. Generally, rounding errors only change the mantissa by a few steps. When the value is near 1.0, those steps are exactly float_epsilon; when the value is nearer to 1,000,000, the steps will be on the order of 0.1.

Google Test uses ULPs to compare floating-point numbers. It chose a default of 4 ULPs for two floating-point numbers to be considered equal. You could also use their code as a reference to build your own ULP-style float comparator.
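
A sketch of a ULP distance in Python (my own helper, not Google Test's code; it assumes IEEE-754 doubles and does not handle NaNs or infinities):

import struct

def ulp_distance(a, b):
    # Reinterpret each double's bits as an integer, remapped so that the
    # integer ordering matches the floating-point ordering, then count
    # the representable doubles between the two values.
    def ordered(x):
        u = struct.unpack('<Q', struct.pack('<d', x))[0]
        return u if u < 2 ** 63 else 2 ** 63 - u
    return abs(ordered(a) - ordered(b))

ret = 25.4 / 10.0 * (1.0 / 2.54)
print(ulp_distance(ret, 1.0))  # 1 -- the result is one representable double below 1.0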
