Is this undefined C behaviour?

Posted 2019-01-06 17:06

Our class was asked this question by the C programming prof:

You are given the code:

int x=1;
printf("%d",++x,x+1);

What output will it always produce?

Most students said undefined behavior. Can anyone help me understand why it is so?

Thanks for the edit and the answers but I'm still confused.

8 answers
Anthone
#2 · 2019-01-06 17:19

Any time the behavior of a program is undefined, anything can happen — the classical phrase is that "demons may fly out of your nose" — although most implementations don't go that far.

The arguments of a function are conceptually evaluated in parallel (the technical term is that there is no sequence point between their evaluation). That means the expressions ++x and x+1 may be evaluated in this order, in the opposite order, or in some interleaved way. When you modify a variable and try to access its value in parallel, the behavior is undefined.

With many implementations, the arguments are evaluated in sequence (though not always from left to right). So you're unlikely to see anything but 2 in the real world.

However, a compiler could generate code like this:

  1. Load x into register r1.
  2. Calculate x+1 by adding 1 to r1.
  3. Calculate ++x by adding 1 to r1. That's OK because x has already been loaded into r1. The compiler's design assumes that r1 still holds the original value of x at this point, because step 2 could only have clobbered it if x were both read and written between two sequence points, which the C standard forbids.
  4. Store r1 into x.

And on this (hypothetical, but correct) compiler, the program would print 3.

(EDIT: passing an extra argument to printf is correct (§7.19.6.1-2 in N1256); thanks to Prasoon Saurav for pointing this out. Also: added an example.)

Explosion°爆炸
#3 · 2019-01-06 17:20

The correct answer is: the code produces undefined behavior.

The behavior is undefined because ++x modifies x while x + 1 reads x for a reason unrelated to that modification, and the two actions are not separated by a sequence point. This results in undefined behavior in C (and C++). The requirement is given in 6.5/2 of the C language standard.

Note that the undefined behavior in this case has absolutely nothing to do with the fact that printf is given only one format specifier and two actual arguments. Giving printf more arguments than there are format specifiers in the format string is perfectly legal in C. Again, the problem is rooted in the violation of the expression evaluation requirements of the C language.

Also note that some participants in this discussion fail to grasp the concept of undefined behavior and insist on mixing it up with the concept of unspecified behavior. To better illustrate the difference, let's consider the following simple example:

int inc_x(int *x) { return ++*x; }
int x_plus_1(int x) { return x + 1; }

int x = 1;
printf("%d", inc_x(&x), x_plus_1(x));

The above code is "equivalent" to the original one, except that the operations that involve our x are wrapped into functions. What is going to happen in this latest example?

There's no undefined behavior in this code. But since the order of evaluation of printf arguments is unspecified, this code produces unspecified behavior, i.e. it is possible that printf will be called as printf("%d", 2, 2) or as printf("%d", 2, 3). In both cases the output will indeed be 2. However, the important difference of this variant is that all accesses to x are wrapped into sequence points present at the beginning and at the end of each function, so this variant does not produce undefined behavior.

This is exactly the reasoning some other posters are trying to force onto the original example. But it cannot be done. The original example produces undefined behavior, which is a completely different beast. They are apparently trying to insist that in practice undefined behavior is always equivalent to unspecified behavior. This is a totally bogus claim that only indicates a lack of expertise in those who make it. The original code produces undefined behavior, period.

To continue with the example, let's modify the previous code sample to

printf("%d %d", inc_x(&x), x_plus_1(x));

The output of the code then becomes generally unpredictable. It can print 2 2 or it can print 2 3. However, note that even though the output is unpredictable, the code still does not produce undefined behavior. The behavior is unspecified, but not undefined. Unspecified behavior is restricted to two possibilities: either 2 2 or 2 3. Undefined behavior is not restricted to anything. It can format your hard drive instead of printing something. Feel the difference.

倾城 Initia
#4 · 2019-01-06 17:22

Most students said undefined behavior. Can anyone help me understand why it is so?

Because the order in which function arguments are evaluated is not specified.

再贱就再见
#5 · 2019-01-06 17:26

Echoing codaddict, the answer is 2.

printf will be called with argument 2 and it will print it.

If this code is put in a context like:

void do_something()
{
    int x=1;
    printf("%d",++x,x+1);
}

Then the behaviour of that function is completely and unambiguously defined. I'm not of course arguing that this is good or correct or that the value of x is determinable afterwards.

Emotional °昔
#6 · 2019-01-06 17:26

The output will always be 2 (for 99.98% of the most important standard-compliant compilers and systems).

According to the standard, this seems to be, by definition, "undefined behaviour", a definition/answer that is self-justifying and that says nothing about what actually can happen, and especially why.

The utility splint (which is not a std compliance checking tool), and so splint's programmers, consider this as "unspecified behaviour". This means, basically, that the evaluation of (x+1) can give 1+1 or 2+1, depending on when the update of x is actually done. Since however the expression is discarded (printf format reads 1 argument), the output is unaffected, and we can still say it is 2.

undefined.c:7:20: Argument 2 modifies x, used by argument 3 (order of evaluation of actual parameters is undefined): printf("%d\n", ++x, x + 1)
  Code has unspecified behavior. Order of evaluation of function parameters or subexpressions is not defined, so if a value is used and modified in different places not separated by a sequence point constraining evaluation order, then the result of the expression is unspecified.

As said before, the unspecified behaviour affects just the evaluation of (x+1), not the whole statement or other expressions in it. So in the case of "unspecified behaviour" we can say that the output is 2, and nobody could object.

But this is not unspecified behaviour; it seems to be "undefined behaviour". And "undefined behaviour" seems to have to be something that affects the whole statement instead of a single expression. This is due to the mystery around where the "undefined behaviour" actually occurs (i.e. what exactly it affects).

If there were grounds to attach the "undefined behaviour" just to the (x+1) expression, as in the "unspecified behaviour" case, then we could still say that the output is always (100%) 2. Attaching the "undefined behaviour" just to (x+1) means that we are not able to say if it is 1+1 or 2+1; it is just "anything". But again, that "anything" is dropped because of the printf, and this means that the answer would be "always (100%) 2".

Instead, because of mysterious asymmetries, the "undefined behaviour" can't be attached just to the x+1; it must affect at least the ++x (which, by the way, is what is responsible for the undefined behaviour), if not the whole statement. If it infects just the ++x expression, the output is an "undefined value", i.e. any integer, e.g. -5847834 or 9032. If it infects the whole statement, then you could see garbage in your console output, and you might well have to stop the program with ctrl-c, possibly before it starts to choke your CPU.

According to an urban legend, the "undefined behaviour" infects not only the whole program, but also your computer and the laws of physics, so that mysterious creatures can be created by your program and fly away or eat you.

No answer explains anything competently about the topic. They all amount to "oh see, the standard says this" (and it is just an interpretation, as usual!). So at least you have learned that "standards exist", and that they make educational questions arid (since, of course, don't forget that your code is wrong, regardless of undefined/unspecified behaviour and other facts about the standard), logical arguments useless, and deep investigation and understanding aimless.

趁早两清
#7 · 2019-01-06 17:32

The output is likely to be 2 in every reasonable case. In reality, what you have is undefined behavior though.

Specifically, the standard says:

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.

There is a sequence point before evaluating the arguments to a function, and a sequence point after all the arguments have been evaluated (but the function not yet called). Between those two (i.e., while the arguments are being evaluated) there is no sequence point (unless an argument is an expression that includes one internally, such as one using the &&, || or , operators).

That means the call to printf is reading the prior value both to determine the value being stored (i.e., the ++x) and to determine the value of the second argument (i.e., the x+1). This clearly violates the requirement quoted above, resulting in undefined behavior.

The fact that you've provided an extra argument for which no conversion specifier is given does not result in undefined behavior. If you supply fewer arguments than conversion specifiers, or if the (promoted) type of the argument disagrees with that of the conversion specifier, you get undefined behavior -- but passing an extra parameter does not.
