Consider the following C++ Standard ISO/IEC 14882:2003(E) citation (section 5, paragraph 4):
Except where noted, the order of evaluation of operands of individual operators and subexpressions of individual expressions, and the order in which side effects take place, is unspecified. 53) Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored. The requirements of this paragraph shall be met for each allowable ordering of the subexpressions of a full expression; otherwise the behavior is undefined. [Example:
i = v[i++]; // the behavior is unspecified i = 7, i++, i++; // i becomes 9 i = ++i + 1; // the behavior is unspecified i = i + 1; // the value of i is incremented
—end example]
I was surprised that i = ++i + 1
gives an undefined value of i
.
Does anybody know of a compiler implementation which does not give 2
for the following case?
int i = 0;
i = ++i + 1;
std::cout << i << std::endl;
The thing is that operator=
has two args. First one is always i
reference.
The order of evaluation does not matter in this case.
I do not see any problem except C++ Standard taboo.
Please, do not consider such cases where the order of arguments is important to evaluation. For example, ++i + i
is obviously undefined. Please, consider only my case
i = ++i + 1
.
Why does the C++ Standard prohibit such expressions?
Given two choices: defined or undefined, which choice would you have made?
The authors of the standard had two choices: define the behavior or specify it as undefined.
Given the clearly unwise nature of writing such code in the first place, it doesn't make any sense to specify a result for it. One would want to discourage code like that and not encourage it. It's not useful or necessary for anything.
Furthermore, standards committees do not have any way to force compiler writers to do anything. Had they required a specific behavior it is likely that the requirement would have been ignored.
There are practical reasons as well, but I suspect they were subordinate to the above general consideration. But for the record, any sort of required behavior for this kind of expression and related kinds will restrict the compiler's ability to generate code, to factor out common subexpressions, to move objects between registers and memory, etc. C was already handicapped by weak visibility restrictions. Languages like Fortran long ago realized that aliased parameters and globals were an optimization-killer and I believe they simply prohibited them.
I know you were interested in a specific expression, but the exact nature of any given construct doesn't matter very much. It's not going to be easy to predict what a complex code generator will do and the language attempts to not require those predictions in silly cases.
Assuming you are asking "Why is the language designed this way?".
You say that
i = ++i + i
is "obviously undefined" buti = ++i + 1
should leavei
with a defined value? Frankly, that would not be very consistent. I prefer to have either everything perfectly defined, or everything consistently unspecified. In C++ I have the latter. It's not a terribly bad choice per se - for one thing, it prevents you from writing evil code which makes five or six modifications in the same "statement".The important part of the standard is:
You modify the value twice, once with the ++ operator, once with the assignment
The following code demonstrates how you could get the wrong(unexpected) result:
It will print 1 as a result.
Please note that your copy of the standard is outdated and contains a known (and fixed) error just in 1st and 3rd code lines of your example, see:
C++ Standard Core Language Issue Table of Contents, Revision 67, #351
and
Andrew Koenig: Sequence point error: unspecified or undefined?
The topic is not easy to get just reading the standard (which is pretty obscure :( in this case).
For example, will it be well(or not)-defined, unspecified or else in general case actually depends not only on the statement structure, but also on memory contents (to be specific, variable values) at the moment of execution, another example:
Please have a look at (it has it all clear and precise):
JTC1/SC22/WG14 N926 "Sequence Point Analysis"
Also, Angelika Langer has an article on the topic (though not as clear as the previous one):
"Sequence Points and Expression Evaluation in C++"
There was also a discussion in Russian (though with some apparently erroneous statements in the comments and in the post itself):
"Точки следования (sequence points)"
It's undefined behaviour, not (just) unspecified behaviour because there are two writes to
i
without an intervening sequence point. It is this way by definition as far as the standard specifies.The standard allows compilers to generate code that delays writes back to storage - or from another view point, to resequence the instructions implementing side effects - any way it chooses so long as it complies with the requirements of sequence points.
The issue with this statement expression is that it implies two writes to
i
without an intervening sequence point:One write is for the value of the original value of
i
"plus one" and the other is for that value "plus one" again. These writes could happen in any order or blow up completely as far as the standard allows. Theoretically this even gives implementations the freedom to perform writebacks in parallel without bothering to check for simultaneous access errors.