Clang reciprocal to 1 optimisations

Published 2019-07-04 19:29

Question:

After a discussion with colleagues, I ended up testing whether clang would optimize two divisions, the second being a reciprocal (1 divided by the first result), into a single division.

const float x = a / b; // x not used elsewhere
const float y = 1 / x;

Theoretically clang could optimize to const float y = b / a if x is used only as a temporary step value, no?

Here's the input and output of a simple test case: https://gist.github.com/Jiboo/d6e839084841d39e5ab6 (in both outputs you can see that it performs the two divisions instead of optimizing them into one)

This related question is beyond my comprehension and seems to focus only on why a specific instruction isn't used, whereas in my case it's the optimisation itself that isn't done: Why does GCC or Clang not optimise reciprocal to 1 instruction when using fast-math

Thanks, JB.

Answer 1:

No, clang cannot do that, at least not under its default strict floating-point semantics.

But first, why are you using float? float has about six decimal digits of precision, double has about 15. Unless you have a good reason that you can explain, use double.

1 / (a / b) in floating-point arithmetic is not the same as b / a. In the first case, the compiler has to:
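The digit counts quoted above can be read from the standard header <float.h>. The following is a minimal sketch, assuming float and double are the IEEE 754 binary32 and binary64 formats (true on mainstream platforms, but not required by the C standard); the two function names are made up for illustration.

```c
#include <float.h>

/* Number of decimal digits that survive a decimal -> float -> decimal
   round trip. With IEEE 754 binary32 this is 6. */
int float_decimal_digits(void)
{
    return FLT_DIG;
}

/* The same quantity for double. With IEEE 754 binary64 this is 15. */
int double_decimal_digits(void)
{
    return DBL_DIG;
}
```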

  1. Divide a by b
  2. Round the result to the nearest floating-point number
  3. Divide 1 by the result
  4. Round the result to the nearest floating-point number.

In the second case:

  1. Divide b by a.
  2. Round the result to the nearest floating-point number.

The compiler can only change the code if the result is guaranteed to be the same, and if the compiler writer cannot produce a mathematical proof that the result is the same, the compiler cannot change the code. There are two rounding operations in the first case, rounding different numbers, so it is unlikely that the result can be guaranteed to be the same.
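The effect of the extra rounding step can be checked directly. The sketch below is an illustration, not the asker's test case: the function names are hypothetical, and volatile is used only to force every intermediate to be stored as a float so that the result does not depend on the compiler keeping extra precision. One pair where the two forms disagree is a = 1, b = 15: the quotient 1/15 is rounded up slightly, so its reciprocal lands one float below 15, while b / a is exactly 15.

```c
/* y = 1 / (a / b), written as in the question. Each volatile store
   rounds the intermediate to single precision. */
float reciprocal_of_quotient(float a, float b)
{
    volatile float x = a / b;    /* first rounding */
    volatile float y = 1.0f / x; /* second rounding */
    return y;
}

/* y = b / a: a single division, rounded once. */
float swapped_quotient(float a, float b)
{
    volatile float y = b / a;
    return y;
}

/* Counts the integer pairs (a, b) with 1 <= a, b <= limit for which
   the two forms give different floats. */
int count_mismatches(int limit)
{
    int mismatches = 0;
    for (int a = 1; a <= limit; ++a) {
        for (int b = 1; b <= limit; ++b) {
            float fa = (float)a;
            float fb = (float)b;
            if (reciprocal_of_quotient(fa, fb) != swapped_quotient(fa, fb)) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}
```

Calling count_mismatches(100) from a small main and printing the result shows how many of the 10,000 pairs disagree. A single disagreement is enough to make the rewrite invalid for a compiler that must preserve results exactly.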



Answer 2:

The compiler doesn't think like a mathematician. Where you think simplifying the expression is trivial mathematically, the compiler has a lot of other things to consider. It is actually quite likely that the compiler is much smarter than the programmer and also knows far more about the C standard.

Something like this is probably what goes through the optimizing compiler's "mind":

  • Ah, they wrote x = a / b but only use x in one place, so we don't have to allocate that variable on the stack. I'll remove it and use a CPU register.
  • Hmm, integer literal 1 divided by a float variable. Okay, we have to apply the usual arithmetic conversions ("balancing") here before anything else and turn that literal into a float 1.0f.
  • The programmer is counting on me to generate code that contains the potential floating-point inaccuracy involved in dividing 1.0f by another float variable! So I can't just replace this expression with b / a, because then the floating-point inaccuracy that the programmer seems to want here would be lost.

And so on. There are a lot of considerations. What machine code you end up with is hard to predict in advance. Just know that the compiler follows your instructions to the letter.
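The conversion described in the second bullet can be observed with C11's _Generic, which selects a value based on the type of an expression without evaluating it. This is a small sketch assuming a C11 compiler; the macro and function names are made up for illustration.

```c
/* Evaluate to 1 if the expression has the named type, 0 otherwise.
   _Generic inspects only the type; the expression is not evaluated. */
#define IS_FLOAT(expr) _Generic((expr), float: 1, default: 0)
#define IS_INT(expr)   _Generic((expr), int: 1, default: 0)

/* The literal 1 on its own has type int. */
int literal_one_is_int(void)
{
    return IS_INT(1);
}

/* In 1 / x with x a float, the usual arithmetic conversions turn the
   int operand into a float, so the whole expression has type float. */
int one_over_float_is_float(float x)
{
    return IS_FLOAT(1 / x);
}

/* Consequently 1 / x and 1.0f / x compute the same value. */
int same_as_float_literal(float x)
{
    return (1 / x) == (1.0f / x);
}
```

Note that the division is carried out in float here, not in double: writing 1.0 (a double literal) instead of 1 would promote x to double and change the arithmetic.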