Symmetrical Lerp & compiler optimizations

Posted 2019-07-12 23:12

Question:

I had a function:

float lerp(float alpha, float x0, float x1) {
    return (1.0f - alpha) * x0 + alpha * x1;
}

For those who haven't seen it, this is preferable to x0 + (x1-x0) * alpha because the latter doesn't guarantee that lerp(1.0f, x0, x1) == x1.
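To make the failure concrete, here is a small demonstration (a sketch assuming IEEE-754 binary32 for float; lerp_unstable is just a name for the formulation being criticized): with x0 = 1e30f and x1 = 1.0f, the subtraction x1 - x0 rounds to -1e30f, because 1 is far below one ulp of 1e30f, so adding x0 back yields 0.0f instead of x1.

#include <cstdio>

float lerp_unstable(float alpha, float x0, float x1) {
    return x0 + (x1 - x0) * alpha;  // (x1 - x0) may round, losing x1 entirely
}

float lerp(float alpha, float x0, float x1) {
    return (1.0f - alpha) * x0 + alpha * x1;
}

int main() {
    float x0 = 1e30f, x1 = 1.0f;
    std::printf("%g\n", lerp_unstable(1.0f, x0, x1));  // prints 0, not 1
    std::printf("%g\n", lerp(1.0f, x0, x1));           // prints 1
}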

Now, I want my lerp function to have an additional property: I'd like lerp(alpha, x0, x1) == lerp(1-alpha, x1, x0). (As for why: this is a toy example of a more complicated function.) The solution I came up with that seems to work is

float lerp_symmetric(float alpha, float x0, float x1) {
    float w0 = 1.0f - alpha;
    float w1 = 1.0f - w0;
    return w0 * x0 + w1 * x1;
}

This double subtraction has the effect of rounding near zero and near one: if alpha = std::nextafter(0.0f, 1.0f) (1.4012985e-45, the smallest positive subnormal), then 1.0f - alpha == 1.0f, and so 1.0f - (1.0f - alpha) == 0.0f. As far as I can tell, it is always true that 1.0f - x == 1.0f - (1.0f - (1.0f - x)). It also seems to have the effect that w0 + w1 == 1.0f.
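As a quick self-check of those claims (a sketch assuming IEEE-754 binary32 and no excess intermediate precision):

#include <cassert>
#include <cmath>

int main() {
    // smallest positive subnormal float, 1.4012985e-45
    float alpha = std::nextafter(0.0f, 1.0f);
    float w0 = 1.0f - alpha;  // rounds up to exactly 1.0f
    float w1 = 1.0f - w0;     // exactly 0.0f
    assert(w0 == 1.0f && w1 == 0.0f);

    // spot-check that the weights always sum to exactly 1.0f
    for (int i = 0; i <= 1000000; ++i) {
        float a = i / 1000000.0f;
        float u = 1.0f - a;
        float v = 1.0f - u;
        assert(u + v == 1.0f);
    }
}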

Questions:

  1. Is this a reasonable approach?
  2. Can I trust my compiler to do what I want? In particular, I know that on Windows it sometimes uses more precision for intermediate results, and I know the compiler is allowed to do some algebra; obviously 1-(1-x) == x algebraically.

This is in C++11 using Clang, Visual Studio, and GCC.

Answer 1:

If one format of IEEE-754 binary floating-point is used throughout (e.g., basic 32-bit binary, the format commonly used for C++ float), with all C++ operators mapped to IEEE-754 operations in the direct and simple way, then lerp_symmetric(alpha, x0, x1) (hereafter A) equals lerp_symmetric(1-alpha, x1, x0) (hereafter B).

Proof:

  • If alpha, which we assume is in [0, 1], is greater than or equal to ½, then 1-alpha is exact by Sterbenz’ lemma. (By “exact,” we mean the computed floating-point result equals the mathematical result; there is no rounding error.) Then, in computing A, w0 is exact since it is 1-alpha, and w1 is exact since its mathematical result is alpha, so it is exactly representable. And, in computing B, w0 is exact since its mathematical result is alpha, and w1 is exact since it is again 1-alpha.
  • If alpha is less than ½, then 1-alpha may have some rounding error. Let the result be beta. Then, in A, w0 is beta. Now ½ ≤ beta, so Sterbenz’ lemma applies to the evaluation of w1 = 1.0f - w0, so w1 is exact (and equals the mathematical result of 1-beta). And, in B, w0 is exact, again by Sterbenz’ lemma, and equals the w1 of A, and w1 (of B) is exact since its mathematical result is beta, which is exactly representable.

Now we can see that w0 in A equals w1 in B and w1 in A equals w0 in B. Letting beta be the computed value of 1-alpha in either of the above cases, A and B therefore return (1-beta)*x0 + beta*x1 and beta*x1 + (1-beta)*x0, respectively. IEEE-754 addition is commutative (except for NaN payloads), so A and B return identical results.
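As an empirical complement to the proof (not a substitute for it), one can sweep many alpha values and confirm the two calls agree bit for bit. This sketch assumes IEEE-754 binary32 and that the compiler introduces no excess precision or contraction; see the caveats in the answers below:

#include <cassert>

float lerp_symmetric(float alpha, float x0, float x1) {
    float w0 = 1.0f - alpha;
    float w1 = 1.0f - w0;
    return w0 * x0 + w1 * x1;
}

int main() {
    const float x0 = 3.0f, x1 = 7.0f;  // arbitrary endpoints
    for (int i = 0; i <= 1000000; ++i) {
        float alpha = i / 1000000.0f;
        float a = lerp_symmetric(alpha, x0, x1);         // A
        float b = lerp_symmetric(1.0f - alpha, x1, x0);  // B
        assert(a == b);  // bit-identical per the argument above
    }
}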

Answering the questions:

  1. I would say it is a reasonable approach, though I would not assert, without further thought, that no improvements could be made.

  2. No, you cannot trust your compiler:

    • C++ allows implementations to use excess precision when evaluating floating-point arithmetic. Thus w0*x0 + w1*x1 may be evaluated using double, long double, or another precision even though all operands are float.
    • C++ allows contractions unless disabled, so w0*x0 + w1*x1 may be evaluated as fmaf(w0, x0, w1*x1), thus using exact arithmetic for one of the multiplications but not the other.

You can partially work around this by rounding each intermediate result to float through an assignment:

float lerp_symmetric(float alpha, float x0, float x1) {
    float w0 = 1.0f - alpha;
    float w1 = 1.0f - w0;
    float t0 = w0 * x0;  // the assignment discards any excess precision
    float t1 = w1 * x1;  // and separates the products from the addition
    return t0 + t1;      // so neither multiplication can be contracted into an FMA
}

The C++ standard requires that excess precision be discarded in assignments and casts. This extends to function returns. (I report this and other C++ specifications from memory; the standard should be checked.) So each of the above will round its result to float even if extra precision was initially used. This will prevent contraction.

(One should also be able to disable contraction by including <cmath> and inserting the preprocessor directive #pragma STDC FP_CONTRACT OFF. Some compilers might not support that.)
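For illustration, here is where the directive would go (hedged: support varies; with compilers that ignore the pragma, a command-line option such as -ffp-contract=off on GCC/Clang or /fp:strict on MSVC may serve instead):

#include <cmath>
#pragma STDC FP_CONTRACT OFF

float lerp_symmetric(float alpha, float x0, float x1) {
    float w0 = 1.0f - alpha;
    float w1 = 1.0f - w0;
    return w0 * x0 + w1 * x1;  // with contraction off, no FMA may be substituted here
}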

One problem with the workaround above is that values are first rounded to the evaluation precision and then rounded to float. There are mathematical values for which, for such a value x, rounding x first to double (or another precision) and then to float produces a different result than rounding x directly to float. The dissertation A Rigorous Framework for Fully Supporting the IEEE Standard for Floating-Point Arithmetic in High-Level Programming Languages by Samuel A. Figueroa del Cid establishes that evaluating a single operation of multiplication or addition in IEEE-754 basic 64-bit floating-point (commonly used for double) and then rounding to the 32-bit format never has a double-rounding error (because these operations, given inputs that are elements of the 32-bit format, can never produce one of the troublesome x values described above).¹
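To make the double-rounding hazard concrete, consider the mathematical value 1 + 2^-24 + 2^-60 (constructed below with double arithmetic; per the result just cited, no single binary32 addition or multiplication can produce such a value, so this is purely an illustration of the phenomenon). Rounded directly to float it gives 1 + 2^-23, but rounded to double first, the 2^-60 tail is lost, leaving an exact halfway case that ties-to-even then resolves to 1.0f:

#include <cmath>
#include <cstdio>

int main() {
    // Round to double: 1 + 2^-24 is exact in double, but the 2^-60 tail
    // is below half an ulp (2^-53) and is rounded away.
    double d = (1.0 + std::ldexp(1.0, -24)) + std::ldexp(1.0, -60);

    // Round the double to float: 1 + 2^-24 is exactly halfway between
    // 1.0f and 1 + 2^-23; ties-to-even picks 1.0f.
    float via_double = static_cast<float>(d);

    // Rounding the original value directly to float gives 1 + 2^-23,
    // since the value lies strictly above the halfway point.
    float direct = 1.0f + std::ldexp(1.0f, -23);

    std::printf("%.9g vs %.9g\n", via_double, direct);  // 1 vs 1.00000012
}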

If I am correct about the C++ specifications reported from memory, then the workaround described above should be complete as long as the C++ implementation evaluates floating-point expressions either with the nominal format or with a format sufficiently wider to satisfy the requirements Figueroa del Cid gives.

Footnote

¹ Per Figueroa del Cid, if x and y have p-bit significands, and x+y or x*y is computed exactly and then rounded to q places, a second rounding to p places will have the same answer as if the result were directly rounded to p places, provided p ≤ (q − 1)/2. This is satisfied for IEEE-754 basic 32-bit binary floating-point (p = 24) and 64-bit (q = 53), since 24 ≤ (53 − 1)/2 = 26. These formats are commonly used for float and double, and the workaround above should suffice in a C++ implementation that uses them. If a C++ implementation evaluated float using a precision that did not satisfy the condition Figueroa del Cid gives, then double-rounding errors could occur.