Why don't modern C++ compilers optimize away simple loops like this? (Clang, MSVC)

Published 2020-03-08 09:46

Question:

When I compile and run this code with Clang (-O3) or MSVC (/O2)...

#include <stdio.h>
#include <time.h>

static int const N = 0x8000;

int main()
{
    clock_t const start = clock();
    for (int i = 0; i < N; ++i)
    {
        int a[N];    // Never used outside of this block, but not optimized away
        for (int j = 0; j < N; ++j)
        {
            ++a[j];  // This is undefined behavior (due to possible
                     // signed integer overflow), but Clang doesn't see it
        }
    }
    clock_t const finish = clock();
    fprintf(stderr, "%u ms\n",
        static_cast<unsigned int>((finish - start) * 1000 / CLOCKS_PER_SEC));
    return 0;
}

... the loop doesn't get optimized away.

Furthermore, neither Clang 3.6 nor Visual C++ 2013 nor GCC 4.8.1 tells me that the variable is uninitialized!

Now I realize that the lack of an optimization isn't a bug per se, but I find this astonishing given how compilers are supposed to be pretty smart nowadays. This seems like such a simple piece of code that even liveness analysis techniques from a decade ago should be able to take care of optimizing away the variable a and therefore the whole loop -- never mind the fact that incrementing the variable is already undefined behavior.

Yet only GCC is able to figure out that it's a no-op, and none of the compilers tells me that this is an uninitialized variable.

Why is this? What's preventing simple liveness analysis from telling the compiler that a is unused? Moreover, why isn't the compiler detecting that a[j] is uninitialized in the first place? Why can't the existing uninitialized-variable-detectors in all of those compilers catch this obvious error?

Answer 1:

The undefined behavior is irrelevant here. Replacing the inner loop with:

    for (int j = 1; j < N; ++j)
    {
        a[j-1] = a[j];
        a[j] = j;
    }

... has the same effect, at least with Clang.

The issue is that the inner loop both loads from a[j] (for some j) and stores to a[j] (for some j). None of the stores can be removed, because the compiler believes they may be visible to later loads, and none of the loads can be removed, because their values are used (as input to the later stores). As a result, the loop still has side-effects on memory, so the compiler doesn't see that it can be deleted.
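
As a sketch of the contrast (Clang -O3 assumed; not verified against every compiler version): if the inner loop only stores and never loads, the load-to-store chain disappears, every store is provably dead, and the whole nest can be eliminated.

    for (int j = 0; j < N; ++j)
    {
        a[j] = j;  // store only: no load feeds a later store,
                   // so the compiler can prove every store is dead
    }

With a body like this, the loops can be deleted entirely (compare answer 4 below, where MSVC does the same for a[j] = 0;).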

Contrary to n.m.'s answer, replacing int with unsigned does not make the problem go away. The code generated by Clang 3.4.1 using int and using unsigned int is identical.



Answer 2:

It's an interesting issue with regard to optimization. I would expect that in most cases, the compiler would treat each element of the array as an individual variable when doing dead-code analysis. And 0x8000 elements make too many individual variables to track, so the compiler doesn't try. The fact that a[j] doesn't always access the same object could cause problems for the optimizer as well.

Obviously, different compilers use different heuristics; a compiler could treat the array as a single object, and detect that it never affected output (observable behavior). Some compilers may choose not to, however, on the grounds that typically, it's a lot of work for very little gain: how often would such optimizations be applicable in real code?
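
To illustrate the heuristic (a hypothetical sketch; exact behavior varies by compiler and version): with a small array, each element can be promoted to its own scalar, the dead-code analysis described above succeeds, and the loop can go away. With N = 0x8000 there are 32768 candidate scalars, which is past any reasonable heuristic limit.

    int b[4] = {0, 0, 0, 0};   // small enough to treat each element
                               // as an individual variable
    for (int j = 0; j < 4; ++j)
    {
        ++b[j];  // nothing here is observable, so a compiler that
                 // scalarizes b can remove the loop entirely
    }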



Answer 3:

    ++a[j];  // This is undefined behavior too, but Clang doesn't see it

Are you saying this is undefined behavior because the array elements are uninitialized?

If so, although this is a common interpretation of clause 4.1/1 in the standard I believe it is incorrect. The elements are 'uninitialized' in the sense that programmers usually use this term, but I do not believe this corresponds exactly to the C++ specification's use of the term.

In particular, C++11 8.5/11 states that these objects are in fact default-initialized, and this seems to me to be mutually exclusive with being uninitialized. The standard also states that for some objects, being default-initialized means that 'no initialization is performed'. Some might assume this means that they are uninitialized, but this is not specified, and I simply take it to mean that no such initialization is required.

The spec does make clear that the array elements will have indeterminate values. C++ specifies, by reference to the C standard, that indeterminate values can be either valid representations, legal to access normally, or trap representations. If the particular indeterminate values of the array elements all happen to be valid representations (and none is INT_MAX, avoiding overflow), then the above line does not trigger any undefined behavior in C++11.
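
For contrast, a sketch of the unambiguously defined case: if the array is explicitly initialized, its values are determinate and the increment is plainly well-defined, since no element comes anywhere near INT_MAX. (As answer 4 notes below, this still doesn't make the loops disappear.)

    int a[N] = {};  // value-initialized: every element starts at 0
    for (int j = 0; j < N; ++j)
    {
        ++a[j];  // well-defined: each element goes from 0 to 1
    }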

Since these array elements could be trap representations, it would be perfectly conformant for clang to act as though they were guaranteed to be trap representations, effectively choosing to make the code UB in order to create an optimization opportunity.

Even if clang doesn't do that it could still choose to optimize based on the dataflow. Clang does know how to do that, as demonstrated by the fact that if the inner loop is changed slightly then the loops do get removed.

So then why does the (potential) presence of UB seem to stymie optimization, when UB is usually taken as an opportunity for more optimization?

What may be going on is that clang has decided that users want the hardware's trapping behavior for int. And so rather than taking traps as an optimization opportunity, clang has to generate code which faithfully reproduces the program's behavior on the hardware. This means that the loops cannot be eliminated based on dataflow, because doing so might eliminate hardware traps.


C++14 updates the behavior such that accessing indeterminate values itself produces undefined behavior, independent of whether one considers the variable uninitialized or not: https://stackoverflow.com/a/23415662/365496



Answer 4:

That is indeed very interesting. I tried your example with MSVC 2013. My first idea was that the fact that ++a[j] is somewhat undefined is the reason the loop is not removed, because removing it would definitely change the meaning of the program from an undefined/incorrect semantic to something meaningful, so I tried initializing the values before the loop, but the loops still did not disappear.

Afterwards I replaced the ++a[j]; with a[j] = 0;. That produced output with no loops at all: everything between the two calls to clock() was removed. I can only guess at the reason. Perhaps the optimizer is not able to prove that operator++ has no side effects for some reason.
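
For reference, a sketch of the modified inner loop described above:

    for (int j = 0; j < N; ++j)
    {
        a[j] = 0;  // plain store instead of ++a[j]; with this change,
                   // MSVC 2013 /O2 removed everything between the clock() calls
    }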