Why compilers no longer optimize this UB with stri

2019-06-15 00:57发布

One of the first results for strict aliasing on google is this article http://dbp-consulting.com/tutorials/StrictAliasing.html
One interesting thing I noticed is this: http://goo.gl/lPtIa5

uint32_t swaphalves(uint32_t a) {
  uint32_t acopy = a;
  uint16_t* ptr = (uint16_t*)&acopy;
  uint16_t tmp = ptr[0];
  ptr[0] = ptr[1];
  ptr[1] = tmp;
  return acopy;
}

is compiled to

swaphalves(unsigned int):
        mov     eax, edi
        ret

by GCC 4.4.7. Any compiler newer than that (4.4 is mentioned in the article so article is not wrong) does not implement the function as it could using strict aliasing. What is the reason for this? Was it in fact bug in GCC or GCC decided to drop it since many lines of code were written in a way that thay produce UB or it is just a compiler regression that lasts for years... Also Clang does not optimize it.

3条回答
\"骚年 ilove
2楼-- · 2019-06-15 01:15

In this other related question, there is a comment by @DanMoulding. Let me plagiarize it:

The intent of the standard's strict aliasing rules are to allow the compiler to optimize in situations where it doesn't and cannot know whether an object is being aliased. The rules permit the optimizer to not make worst-case aliasing assumptions in those situations. However, when it is clear from the context that an object is being aliased, then the compiler should treat the object as being aliased, no matter what types are being used to access it. Doing otherwise is not in line with the intent of the language's aliasing rules.

In your code, the aliasing of *ptr and acopy is obvious, as both are local variables, so any sane compiler should treat them as aliased. From this point of view the GCC 4.4 behaviour, although in line with a strict reading of the standard, will be considered a bug by most real-world programmers.

You have to consider why there are aliasing rules in the first place. They are so that the compiler may take advantage of optimizations in situations where there might be be aliasing, but most likely there are none. So the language forbids that aliasing and the compiler is free to optimize. For example:

void foo(int *idx, float *data)
{ /* idx and data do not overlap */ }

However, when the aliasing involves local variables, there are no lost optimizations:

void foo()
{
    uint32_t x;
    uint16_t *p = (uint16_t *)&x; //x and p do overlap!
}

The compiler is trying to do its job as best as possible, not trying to find an UB somewhere to have an excuse to format your hard drive!

There are a lot of code that is technically UB but is ignored by all compilers. For example, what would you think of a compiler that treats this as an empty file:

#ifndef _FOO_H_
#define _FOO_H_
void foo(void);
#endif

Or what about a compiler that ignores this macro:

#define new DEBUG_NEW

simply because the standard allows it to do so?

查看更多
狗以群分
3楼-- · 2019-06-15 01:21

The goal for a compiler should generally be matching the intent of the code as closely as possible. In this case, the code invokes UB, but the intent should be pretty clear. My guess is that more recently compilers have been focusing on being correct than taking advantage of UB for optimization purposes.

Strict aliasing is essentially an assumption that code isn't trying to subvert the type system, which as noted by @rodrigo, gives the compiler more information it can use to optimize. If the compiler can't assume strict aliasing, it precludes a number of non-trivial optimizations, which is why C even added a restrict qualifier (C99).

Breaking strict aliasing isn't necessary for any optimizations I can think of. In fact, in this specific case, depending on what the original intent was, you can get correct/optimized code without invoking UB...

uint32_t wswap(uint32_t ws) {
  return (ws << 16) | (ws >> 16);
}

compiles to...

wswap:                                  # @wswap
    .cfi_startproc
# BB#0:
    roll    $16, %edi
    movl    %edi, %eax
    retq
查看更多
闹够了就滚
4楼-- · 2019-06-15 01:33

The GCC developers put some effort into making the compiler behave "as expected" in these cases. (I wish I could give you a proper reference for this - I remember it coming up on a mailing list or somesuch at some time).

At any rate, something you say:

... does not implement the function as it could using strict aliasing

... implies perhaps a slight misunderstanding of what the strict aliasing rules are for. Your code sample invokes undefined behavior - so any compilation is technically valid, including just a plain ret or generation of a trap instruction, or even nothing at all (it's legitimate to assume the method can never be called). That newer versions of GCC produce longer/slower code is hardly a deficiency, since producing code that does any particular thing at all would not violate the standard. In fact, the newer versions improve the situation by producing code that does what the programmer probably intended the code to do instead of silently doing something differerent.

What would you rather - that the compiler produces fast code that doesn't do what you want, or slightly slower code that does do what you want?

That being said, I firmly believe that you should not write code that breaks the strict aliasing rules. Relying on the compiler doing the "right" thing when it is "obvious" what is intended is walking a tightrope. Optimisation is hard enough already, without the compiler having to guess at - and make allowances for - what the programmer intended. Further, it's possible to write code which obeys the rules and which can be turned into very efficient object code by the compiler. Indeed the further question can be raised:

Why did earlier versions of GCC behave the way they did, and "optimize" the function by relying on adherence to the strict aliasing rules?

That's a little bit complicated, but is interesting for this discussion (especially in light of suggestions that the compiler is going to some lengths just to break code). Strict aliasing is a component of (or rather, a rule which assists) a process called alias analysis. This process decides whether two pointers alias or not. There are, essentially, 3 possible conditions between any two pointers:

  • They MUST NOT ALIAS (the strict aliasing rule makes it easy to deduce this condition, though it can sometimes be deduced in other ways).
  • They MUST ALIAS (this requires analysis; value propagation might detect this condition for instance)
  • They MAY ALIAS. This is the default condition when neither of the other two conditions can be established.

In the case of the code in your question, strict aliasing implies a MUST NOT ALIAS condition between &acopy and ptr (it is trivial to make this determination, because the two values have incompatible types which are not allowed to alias). This condition allows for the optimisation that you then see: all the manipulation of *ptr values can be discarded because they cannot in theory effect the value of acopy and they do not otherwise escape the function (which can be determined via escape analysis).

It takes further effort to determine the MUST ALIAS condition between the two pointers. Furthermore, in doing so the compiler would need to ignore (at least temporarily) the previously ascertained MUST NOT ALIAS condition, which means it must spend time attempting to ascertain the truth of a condition which, if everything is as it should be, must be false.

When both MUST NOT ALIAS and MUST ALIAS conditions are determined, we have a case where the code must be invoking undefined behaviour (and we can issue a warning). We then have to decide which condition to keep and which to discard. Because MUST NOT ALIAS, in this case, comes from a constraint which can be (and indeed has been) broken by the user, it is the best option to discard.

So, the older versions of GCC either do not do the requisite analysis to determine the MUST ALIAS condition (perhaps because the opposite MUST NOT ALIAS condition has already been established), or alternatively, the older GCC version opts to discard the MUST ALIAS condition in preference to the MUST NOT ALIAS condition, which leads to faster code which does not do what the programmer most likely intended. In either case, it seems that the newer versions offer an improvement.

查看更多
登录 后发表回答