Is the behaviour of the compiler undefined, with U

Posted 2019-02-10 18:39

When I answered this question, I wrote:

First, it is important to note that it is not only the behaviour of the user program that is undefined, it is the behaviour of the compiler that is undefined.

But there was disagreement in a comment, so I want to ask the question here:

If the source code contains Undefined Behaviour, is it only the behaviour of the translated machine code that is undefined, or is the behaviour of the compiler undefined, too?

The standard defines the behaviour of an abstract machine (1.9):

The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.

Maybe the question is whether the compiler is a part of that machine and, if so, whether that part is allowed to behave in an undefined way?


A more practical version of this question would be:
Assume a compiler would crash or not produce any output when it finds UB on all control paths, as in this program:

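void complex_things_without_UB() {}  // placeholder for arbitrary well-defined code

int main() {
    complex_things_without_UB();
    int x = 42;
    x = x++;  // UB here (in C, and in C++ before C++17)
    return x;
}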

but otherwise it would always produce correct binaries. Would this still be a standard-compliant compiler?

7 answers
Explosion°爆炸
#2 · 2019-02-10 18:44

The C++ standard defines behavior for code; it doesn't define behavior for the compiler. As such, it doesn't really make sense to refer to undefined behavior of the compiler: it was never well-defined to begin with. The only requirement is that it produces a program whose behavior conforms to what the standard specifies for the code. How it does this is an implementation detail.

叼着烟拽天下
#3 · 2019-02-10 18:46

is it only the behaviour of the translated machine code that is undefined, or is the behaviour of the compiler undefined, too?

The ISO C and C++ standards describe what C and C++ programs look like. They do not describe the environment they run in. We generally use the term "compiler" to refer to the tool that translates C and C++ into machine code; formally, however, the term used is "implementation", which is definitely wider.

Therefore, the only behavior that is undefined is that of the program. This also follows from the definition of UB:

undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

太酷不给撩
#4 · 2019-02-10 18:51

There is no compiler mentioned in the standard and implementation details are up to the vendors.

The standard defines how code should behave (syntactically and semantically) and, for some standard library algorithms, how it is constrained in terms of complexity. The source code doesn't have to have a precise behavior (nor is this defined anywhere). Every compiler just has to produce code that, under the as-if rule, is correct.
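As a rough illustration of the as-if rule, here is a minimal sketch. The names `sum_loop`, `sum_closed_form` and `check_as_if_example` are made up for this example; nothing here is taken from the standard or from any particular compiler. The first function is what the source literally says, the second is the kind of replacement an optimiser would be allowed (but never required) to generate, because both have the same observable result for every input.

```cpp
#include <cassert>
#include <cstdint>

// What the source code literally describes: add 1, 2, ..., n one by one.
// The counter is 64-bit so that the loop also terminates for the largest n.
std::uint64_t sum_loop(std::uint32_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}

// A closed form with the same result for every n. Under the as-if rule an
// implementation may emit something like this instead of the loop, since
// no observable behaviour changes.
std::uint64_t sum_closed_form(std::uint32_t n) {
    const std::uint64_t m = n;
    return m * (m + 1) / 2;  // cannot overflow: m < 2^32
}

// Hypothetical self-check: both versions agree on a few inputs.
void check_as_if_example() {
    assert(sum_loop(0) == sum_closed_form(0));
    assert(sum_loop(10) == 55);
    assert(sum_closed_form(10) == 55);
    assert(sum_loop(1000) == sum_closed_form(1000));
}
```

Whether a given compiler actually performs this transformation is an implementation detail; the point is only that the standard constrains the result, not the route taken to it.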

It doesn't make sense to refer to undefined behavior of the compiler.

姐就是有狂的资本
#5 · 2019-02-10 19:03

That's a pretty blurry line as a whole. The point is that the source code does not have a defined behaviour, which means the behaviour of the generated code is not well defined.

The compiler should, by all accounts, behave in some defined way - but of course, the code it generates could be rather "random" (e.g. the compiler may choose to insert a random number into your calculation - or even a call to rand - and it's still perfectly within the rights of the compiler). There are certainly cases where the compiler (ab)uses the fact that it knows something is undefined to make optimisations.
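A commonly cited case of such an optimisation is signed integer overflow. The sketch below is only an illustration; the function names are invented for this example. Because overflowing an `int` is undefined, an optimiser may assume that `x + 1` never overflows and is then free to treat `x + 1 > x` as always true, so the naive check may never report the overflow it was written to detect.

```cpp
#include <cassert>
#include <climits>

// Naive test: "does x + 1 still fit in an int?"
// For x == INT_MAX the addition overflows, which is undefined behaviour,
// so an optimiser is entitled to assume that case never happens and may
// reduce the whole body to `return true;`.
bool can_increment_naive(int x) {
    return x + 1 > x;
}

// Well-defined version: compares against the limit without overflowing.
bool can_increment_safe(int x) {
    return x < INT_MAX;
}

// Hypothetical self-check. The naive version is only called with values
// that do not overflow, so the check itself stays free of UB.
void check_overflow_example() {
    assert(can_increment_naive(0));
    assert(can_increment_naive(INT_MAX - 1));
    assert(can_increment_safe(0));
    assert(!can_increment_safe(INT_MAX));
}
```

What `can_increment_naive(INT_MAX)` does is not specified at all, which is why the example deliberately never calls it that way.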

I would consider it a very poor implementation of the compiler if, for example, the compiler crashes or causes the hard disk to be formatted, but I believe the compiler may still be "right" if it says "This is undefined, I refuse to compile it" [in some manner].

Of course, there are (quite a lot of) situations where something is undefined, not because the construct itself is undefined, but because it's "hard to define a single behaviour that is possible to implement in many places". For example, using an invalid pointer (int* p = (int*) rand(); or use-after-free) is undefined, but the compiler may not be able to know whether a given use is correct or not. Instead, it's up to the processor architecture what happens if you use a pointer to a random address, or after it has been freed. Either case may result in a crash on one machine, in an erroneous result without a crash on another, and in some cases "you won't notice that anything is wrong". Here it is clearly not the compiler's behaviour that is undefined, but that of the resulting program.

啃猪蹄的小仙女
#6 · 2019-02-10 19:04

Assuming that "undefined behaviour for a compiler" means "there are no requirements on the behaviour of the executable program produced" then the behaviour of the compiler is undefined when presented with source code containing undefined behaviour constructs.

Compare this with the behaviour of the compiler with correct source code. All compilers adhering to the standard must produce executable code with equivalent behaviour, namely the behaviour the standard defines for that source code.

唯我独甜
#7 · 2019-02-10 19:09

My own take is that the behavior in "undefined behavior" is that of the implementation. The spec refers to a process of "translation" that we might equate with compilation, but the fact that you can compile a program to executable code is not relevant here: the result is still considered to be part of the implementation, at least as far as behavior is concerned. Note that while the spec does define how a C program will behave, any requirements it places are on the implementation, and the behavior of a program can also be considered a requirement (or set of requirements) on the implementation.

In any case, undefined behaviour can certainly refer to behavior of the compiler. See the note in C11 3.4.3:

Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

"Terminating translation" clearly refers to a compilation failure whereas "terminating a ... execution" clearly refers to behavior of a running program.

See also Annex J.2, which lists examples of undefined behavior. Amongst the examples are:

A nonempty source file does not end in a new-line character which is not immediately preceded by a backslash character or ends in a partial preprocessing token or comment (5.1.1.2)

It seems ridiculous that this should cause undefined behavior at execution time rather than at translation time. There are various other similar examples. The entire set clearly shows cases where undefined behaviour can occur at both compile time and run time.
