Is while(1); undefined behavior in C?

Posted 2019-01-14 15:46

Question:

In C++11 an infinite loop such as while(1); is undefined behavior. Is the same true in C?

Answer 1:

It is well-defined behavior. C11 added a new clause, 6.8.5 ad 6:

An iteration statement whose controlling expression is not a constant expression,156) that performs no input/output operations, does not access volatile objects, and performs no synchronization or atomic operations in its body, controlling expression, or (in the case of a for statement) its expression-3, may be assumed by the implementation to terminate.157)


157) This is intended to allow compiler transformations such as removal of empty loops even when termination cannot be proven.

Since the controlling expression of your loop is a constant, the compiler is not allowed to assume that the loop terminates. This is intended for reactive programs that should run forever, like an operating system.

However, for the following loop the behavior is unclear:

a = 1; while(a);

In effect, a compiler may or may not remove this loop, so the resulting program may or may not terminate. That is not really undefined behavior, as the compiler is not allowed to erase your hard disk, but it is a construction to avoid.
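As a rough sketch of the distinction (the function names are made up for illustration and appear nowhere in the standard), the first loop below has a constant controlling expression, so 6.8.5 ad 6 does not apply to it, while the second has a non-constant one:

```c
/* Controlling expression is the constant 1: the implementation may not
   assume termination, so a call to this function is not expected to
   return. */
void spin_forever(void)
{
    while (1) {
        /* empty on purpose */
    }
}

/* Controlling expression is the non-constant a, and the loop performs no
   I/O, volatile access, or synchronization: the implementation may assume
   it terminates. For a != 0 a call may therefore hang or return, depending
   on the compiler. For a == 0 the loop body is never entered. */
int return_after_loop(int a)
{
    while (a) {
        /* empty on purpose */
    }
    return 1;
}
```

Only return_after_loop(0) has a result you can rely on; what happens for a non-zero argument depends on whether your compiler makes use of the clause.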

There is, however, another snag. Consider the following code:

a = 1; while(a) while(1);

Since the compiler may assume that the outer loop terminates, it may conclude that the inner loop terminates as well; how else could the outer loop terminate? So with a really smart compiler, a while(1); loop that should not terminate needs to have such non-terminating loops around it all the way up to main. If you really want the infinite loop, you'd better read or write some volatile variable in it.
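A minimal sketch of the volatile advice, assuming hypothetical names (keep_alive, stop_requested, halt_forever, wait_for_stop) that are not part of any real API. Because each loop accesses a volatile object, the condition in 6.8.5 ad 6 is not met and the compiler has no licence to assume termination:

```c
#include <signal.h>

/* Hypothetical flag, e.g. set from a signal handler or an interrupt. */
volatile sig_atomic_t stop_requested = 0;

/* Intended to run forever: the volatile store on every iteration is an
   access the compiler has to preserve. */
void halt_forever(void)
{
    static volatile int keep_alive;
    for (;;) {
        keep_alive = 1;
    }
}

/* Spins until stop_requested becomes non-zero and returns how many
   times the flag was found to be clear. */
unsigned long wait_for_stop(void)
{
    unsigned long spins = 0;
    while (!stop_requested) {   /* volatile read on every test */
        spins++;
    }
    return spins;
}
```

If stop_requested is already set, wait_for_stop() returns 0 immediately; halt_forever() is only meant to be called when you really never want to come back.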

Why this clause is not practical

It is very unlikely our compiler company is ever going to make use of this clause, mainly because it is a very syntactical property. In the intermediate representation (IR), the difference between the constant and the variable in the above examples is easily lost through constant propagation.

The intention of the clause is to allow compiler writers to apply desirable transformations like the following. Consider a not so uncommon loop:

int f(unsigned int n, int *a)
{       unsigned int i;
        int s;

        s = 0;
        for (i = 10U; i <= n; i++)
        {
                s += a[i];
        }
        return s;
}

For architectural reasons (for example hardware loops) we would like to transform this code to:

int f(unsigned int n, int *a)
{       unsigned int i;
        int s;

        s = 0;
        for (i = 0; i < n-9; i++)
        {
                s += a[i+10];
        }
        return s;
}

Without clause 6.8.5 ad 6 this is not possible, because if n equals UINT_MAX, the original loop may not terminate. Nevertheless, it is pretty clear to a human that this is not the intention of the writer of this code. Clause 6.8.5 ad 6 now allows this transformation. However, the way this is achieved is not very practical for a compiler writer, as the syntactical requirement of an infinite loop is hard to maintain in the IR.
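To make the n == UINT_MAX case concrete, here is a small sketch with both loop forms under hypothetical names (sum_original, sum_counted, loop_can_terminate). Note that the counted form as written also relies on n >= 9; for smaller n the unsigned subtraction n - 9 wraps around, which this sketch does not try to handle.

```c
#include <limits.h>

/* Original form: i runs from 10 up to and including n. */
int sum_original(unsigned int n, const int *a)
{
    int s = 0;
    for (unsigned int i = 10U; i <= n; i++) {
        s += a[i];
    }
    return s;
}

/* Counted form with a trip count of n - 9 (assumes n >= 9). */
int sum_counted(unsigned int n, const int *a)
{
    int s = 0;
    for (unsigned int i = 0; i < n - 9U; i++) {
        s += a[i + 10U];
    }
    return s;
}

/* The exit test i <= n of the original form can only become false
   if n is not UINT_MAX; otherwise i wraps around to 0 and continues. */
int loop_can_terminate(unsigned int n)
{
    return n != UINT_MAX;
}
```

For inputs such as n = 15 the two functions visit the same elements a[10] to a[15]; they only differ in the corner cases, which is exactly where the compiler needs permission to assume termination.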

Note that it is essential to this example that n and i are unsigned. With signed int, overflow is undefined behavior, and that alone would already justify the transformation. Efficient code, however, benefits from using unsigned, apart from the bigger positive range.

An alternative approach

Our approach would be that the code writer has to express his intention, for example by inserting an assert(n < UINT_MAX) before the loop or some Frama-C-like guarantee. This way the compiler can "prove" termination and doesn't have to rely on clause 6.8.5 ad 6.
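A sketch of what that could look like in source code, using a hypothetical function name sum_from_10. Whether a given compiler actually exploits the assertion for loop analysis is an assumption, not something the standard promises, and with NDEBUG defined the assert expands to nothing:

```c
#include <assert.h>
#include <limits.h>

/* Sums a[10] .. a[n]. The assertion states the writer's intention that
   n is never UINT_MAX, so the exit test i <= n eventually fails. */
int sum_from_10(unsigned int n, const int *a)
{
    int s = 0;

    assert(n < UINT_MAX);
    for (unsigned int i = 10U; i <= n; i++) {
        s += a[i];
    }
    return s;
}
```

The caller still has to make sure that a really has at least n + 1 elements; the assertion only documents the bound on n.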

P.S.: I'm looking at a draft of April 12, 2011; paxdiablo is clearly looking at a different version, maybe a newer one. His quote does not mention the constant-expression condition.



Answer 2:

After checking in the draft C99 standard, I would say "no", it's not undefined. I can't find any language in the draft that mentions a requirement that iterations end.

The full text of the paragraph describing the semantics of the iterating statements is:

An iteration statement causes a statement called the loop body to be executed repeatedly until the controlling expression compares equal to 0.

I would expect any limitation such as the one specified for C++11 to appear there, if applicable. There is also a section named "Constraints", which doesn't mention any such constraint either.

Of course, the actual standard might say something else, although I doubt it.



Answer 3:

The simplest answer involves a quote from §5.1.2.3p6, which states the minimal requirements of a conforming implementation:

The least requirements on a conforming implementation are:

— Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.

— At program termination, all data written into files shall be identical to the result that execution of the program according to the abstract semantics would have produced.

— The input and output dynamics of interactive devices shall take place as specified in 7.21.3. The intent of these requirements is that unbuffered or line-buffered output appear as soon as possible, to ensure that prompting messages actually appear prior to a program waiting for input.

This is the observable behavior of the program.

If the machine code fails to produce the observable behaviour due to optimisations performed, then the compiler isn't a C compiler. What is the observable behaviour of a program that contains only such an infinite loop, at the point of termination? The only way such a loop could end is by a signal causing it to end prematurely. In the case of SIGTERM, the program terminates. This would cause no observable behaviour. Hence, the only valid optimisation of that program is the compiler pre-empting the system closing the program and generating a program that ends immediately.

/* unoptimised version */
#include <stdio.h>   /* declares puts */

int main() {
    for (;;);
    puts("The loop has ended");
}

/* optimised version */
int main() { }

One possibility is that a signal is raised and longjmp is called to cause execution to jump to a different location. It seems like the only place that could be jumped to is somewhere reached during execution prior to the loop, so providing the compiler is intelligent enough to notice that a signal is raised causing the execution to jump to somewhere else, it could potentially optimise the loop (and the signal raising) away in favour of jumping immediately.

When multiple threads enter the equation, a valid implementation might be able to transfer ownership of the program from the main thread to a different thread, and end the main thread. The observable behaviour of the program must still be observable, regardless of optimisations.



Answer 4:

The following statement appears in C11 6.8.5 Iteration statements /6:

An iteration statement whose controlling expression is not a constant expression, that performs no input/output operations, does not access volatile objects, and performs no synchronization or atomic operations in its body, controlling expression, or (in the case of a for statement) its expression-3, may be assumed by the implementation to terminate.

Since while(1); uses a constant expression, the implementation is not allowed to assume it will terminate.

A compiler is free to remove such a loop entirely if the expression is non-constant and all other conditions are similarly met, even if it cannot be proven conclusively that the loop would terminate.