Floating point equality

2019-03-10 13:40发布

问题:

It is common knowledge that one has to be careful when comparing floating point values. Usually, instead of using ==, we use some epsilon or ULP based equality testing.

However, I wonder, are there any cases, when using == is perfectly fine?

Look at this simple snippet, which cases are guaranteed to succeed?

void fn(float a, float b) {
    float l1 = a/b;
    float l2 = a/b;

    if (l1==l1) { }        // case a)
    if (l1==l2) { }        // case b)
    if (l1==a/b) { }       // case c)
    if (l1==5.0f/3.0f) { } // case d)
}

int main() {
    fn(5.0f, 3.0f);
}

Note: I've checked this and this, but they don't cover (all of) my cases.

Note2: It seems that I have to add some plus information, so answers can be useful in practice: I'd like to know:

  • what the C++ standard says
  • what happens, if a C++ implementation follows IEEE-754

This is the only relevant statement I found in the current draft standard:

The value representation of floating-point types is implementation-defined. [ Note: This document imposes no requirements on the accuracy of floating-point operations; see also [support.limits]. — end note ]

So, does this mean, that even "case a)" is implementation defined? I mean, l1==l1 is definitely a floating-point operation. So, if an implementation is "inaccurate", then could l1==l1 be false?


I think this question is not a duplicate of Is floating-point == ever OK?. That question doesn't address any of the cases I'm asking. Same subject, different question. I'd like to have answers specifically to case a)-d), for which I cannot find answers in the duplicated question.

回答1:

However, I wonder, are there any cases, when using == is perfectly fine?

Sure there are. One category of examples are usages that involve no computation, e.g. setters that should only execute on changes:

void setRange(float min, float max)
{
    if(min == m_fMin && max == m_fMax)
        return;

    m_fMin = min;
    m_fMax = max;

    // Do something with min and/or max
    emit rangeChanged(min, max);
}

See also Is floating-point == ever OK? and Is floating-point == ever OK?.



回答2:

Contrived cases may "work". Practical cases may still fail. One additional issue is that often optimisation will cause small variations in the way the calculation is done so that symbolically the results should be equal but numerically they are different. The example above could, theoretically, fail in such a case. Some compilers offer an option to produce more consistent results at a cost to performance. I would advise "always" avoiding the equality of floating point numbers.

Equality of physical measurements, as well as digitally stored floats, is often meaningless. So if your comparing if floats are equal in your code you are probably doing something wrong. You usually want greater than or less that or within a tolerance. Often code can be rewritten so these types of issues are avoided.



回答3:

Only a) and b) are guaranteed to succeed in any sane implementation (see the legalese below for details), as they compare two values that have been derived in the same way and rounded to float precision. Consequently, both compared values are guaranteed to be identical to the last bit.

Case c) and d) may fail because the computation and subsequent comparison may be carried out with higher precision than float. The different rounding of double should be enough to fail the test.

Note that the cases a) and b) may still fail if infinities or NANs are involved, though.


Legalese

Using the N3242 C++11 working draft of the standard, I find the following:

In the text describing the assignment expression, it is explicitly stated that type conversion takes place, [expr.ass] 3:

If the left operand is not of class type, the expression is implicitly converted (Clause 4) to the cv-unqualified type of the left operand.

Clause 4 refers to the standard conversions [conv], which contain the following on floating point conversions, [conv.double] 1:

A prvalue of floating point type can be converted to a prvalue of another floating point type. If the source value can be exactly represented in the destination type, the result of the conversion is that exact representation. If the source value is between two adjacent destination values, the result of the conversion is an implementation-defined choice of either of those values. Otherwise, the behavior is undefined.

(Emphasis mine.)

So we have the guarantee that the result of the conversion is actually defined, unless we are dealing with values outside the representable range (like float a = 1e300, which is UB).

When people think about "internal floating point representation may be more precise than visible in code", they think about the following sentence in the standard, [expr] 11:

The values of the floating operands and the results of floating expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby.

Note that this applies to operands and results, not to variables. This is emphasized by the attached footnote 60:

The cast and assignment operators must still perform their specific conversions as described in 5.4, 5.2.9 and 5.17.

(I guess, this is the footnote that Maciej Piechotka meant in the comments - the numbering seems to have changed in the version of the standard he's been using.)

So, when I say float a = some_double_expression;, I have the guarantee that the result of the expression is actually rounded to be representable by a float (invoking UB only if the value is out-of-bounds), and a will refer to that rounded value afterwards.

An implementation could indeed specify that the result of the rounding is random, and thus break the cases a) and b). Sane implementations won't do that, though.



回答4:

Assuming IEEE 754 semantics, there are definitely some cases where you can do this. Conventional floating point number computations are exact whenever they can be, which for example includes (but is not limited to) all basic operations where the operands and the results are integers.

So if you know for a fact that you don't do anything that would result in something unrepresentable, you are fine. For example

float a = 1.0f;
float b = 1.0f;
float c = 2.0f;
assert(a + b == c); // you can safely expect this to succeed

The situation only really gets bad if you have computations with results that aren't exactly representable (or that involve operations which aren't exact) and you change the order of operations.

Note that the C++ standard itself doesn't guarantee IEEE 754 semantics, but that's what you can expect to be dealing with most of the time.



回答5:

Case (a) fails if a == b == 0.0. In this case, the operation yields NaN, and by definition (IEEE, not C) NaN ≠ NaN.

Cases (b) and (c) can fail in parallel computation when floating-point round modes (or other computation modes) are changed in the middle of this thread's execution. Seen this one in practice, unfortunately.

Case (d) can be different because the compiler (on some machine) may choose to constant-fold the computation of 5.0f/3.0f and replace it with the constant result (of unspecified precision), whereas a/b must be computed at runtime on the target machine (which might be radically different). In fact, intermediate calculations may be performed in arbitrary precision. I've seen differences on old Intel architectures when intermediate computation was performed in 80-bit floating-point, a format that the language didn't even directly support.



回答6:

In my humble opinion, you should not rely on the == operator because it has many corner cases. The biggest problem is rounding and extended precision. In case of x86, floating point operations can be done with bigger precision than you can store in variables (if you use coprocessors, IIRC SSE operations use same precision as storage).

This is usually good thing, but this causes problems like: 1./2 != 1./2 because one value is form variable and second is from floating point register. In the simplest cases, it will work, but if you add other floating point operations the compiler could decide to split some variables to the stack, changing their values, thus changing the result of the comparison.

To have 100% certainty you need look at assembly and see what operations was done before on both values. Even the order can change the result in non-trivial cases.

Overall what is point of using ==? You should use algorithms that are stable. This means they work even if values are not equal, but they still give the same results. The only place I know where == could be useful is serializing/deserializing where you know what result you want exactly and you can alter serialization to archive your goal.