In C++11, does `i += ++i + 1` exhibit undefined be

2019-01-08 09:48发布

This question came up while I was reading (the answers to) So why is i = ++i + 1 well-defined in C++11?

I gather that the subtle explanation is that (1) the expression ++i returns an lvalue but + takes prvalues as operands, so a conversion from lvalue to prvalue must be performed; this involves obtaining the current value of that lvalue (rather than one more than the old value of i) and must therefore be sequenced after the side effect from the increment (i.e., updating i) (2) the LHS of the assignment is also an lvalue, so its value evaluation does not involve fetching the current value of i; while this value computation is unsequenced w.r.t. the value computation of the RHS, this poses no problem (3) the value computation of the assignment itself involves updating i (again), but is sequenced after the value computation of its RHS, and hence after the prvious update to i; no problem.

Fine, so there is no UB there. Now my question is what if one changed the assigment operator from = to += (or a similar operator).

Does the evaluation of the expression i += ++i + 1 lead to undefined behavior?

As I see it, the standard seems to contradict itself here. Since the LHS of += is still an lvalue (and its RHS still a prvalue), the same reasoning as above applies as far as (1) and (2) are concerned; there is no undefined behavior in the evalutation of the operands on +=. As for (3), the operation of the compound assignment += (more precisely the side effect of that operation; its value computation, if needed, is in any case sequenced after its side effect) now must both fetch the current value of i, and then (obviously sequenced after it, even if the standard does not say so explicitly, or otherwise the evaluation of such operators would always invoke undefined behavior) add the RHS and store the result back into i. Both these operations would have given undefined behavior if they were unsequenced w.r.t. the side effect of the ++, but as argued above (the side effect of the ++ is sequenced before the value computation of + giving the RHS of the += operator, which value computation is sequenced before the operation of that compound assignment), that is not the case.

But on the other hand the standard also says that E += F is equivalent to E = E + F, except that (the lvalue) E is evaluated only once. Now in our example the value computation of i (which is what E is here) as lvalue does not involve anything that needs to be sequenced w.r.t. other actions, so doing it once or twice makes no difference; our expression should be strictly equivalent to E = E + F. But here's the problem; it is pretty obvious that evaluating i = i + (++i + 1) would give undefined behaviour! What gives? Or is this a defect of the standard?

Added. I have slightly modified my discussion above, to do more justice to the proper distinction between side effects and value computations, and using "evaluation" (as does the standard) of an expression to encompass both. I think my main interrogation is not just about whether behavior is defined or not in this example, but how one must read the standard in order to decide this. Notably, should one take the equivalence of E op= F to E = E op F as the ultimate authority for the semantics of the compound assignment operation (in which case the example clearly has UB), or merely as an indication of what mathematical operation is involved in determining the value to be assigned (namely the one identified by op, with the lvalue-to-rvalue converted LHS of the compound assignment operator as left operand and its RHS as right operand). The latter option makes it much harder to argue for UB in this example, as I have tried to explain. I admit that it is tempting to make the equivalence authoritative (so that compound assignments become a kind of second-class primitives, whose meaning is given by rewriting in term of first-class primitives; thus the language definition would be simplified), but there are rather strong arguments against this:

  • The equivalence is not absolute, because of the "E is evaluated only once" exception. Note that this exception is essential to avoid making any use where the evaluation of E involves a side effect undefined behavior, for instance in the fairly common a[i++] += b; usage. If fact I think no absolutely equivalent rewriting to eliminate compound assignments is possible; using a fictive ||| operator to designate unsequenced evaluations, one might try to define E op= F; (with int operands for simplicity) as equivalent to { int& L=E ||| int R=F; L = L + R; }, but then the example no longer has UB. In any case the standard gives us no rewriitng recipe.

  • The standard does not treat compound assignments as second-class primitives for which no separate definition of semantics is necessary. For instance in 5.17 (emphasis mine)

    The assignment operator (=) and the compound assignment operators all group right-to-left. [...] In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression. With respect to an indeterminately-sequenced function call, the operation of a compound assignment is a single evaluation.

  • If the intention were to let compound assignments be mere shorthands for simple assignments, there would be no reason to include them explicitly in this description. The final phrase even directly contradicts what would be the case if the equivalence was taken to be authoritative.

If one admits that compound assignments have a semantics of their own, then the point arises that their evaluation involves (apart from the mathematical operation) more than just a side effect (the assignment) and a value evaluation (sequenced after the assignment), but also an unnamed operation of fetching the (previous) value of the LHS. This would normally be dealt with under the heading of "lvalue-to-rvalue conversion", but doing so here is hard to justify, since there is no operator present that takes the LHS as an rvalue operand (though there is one in the expanded "equivalent" form). It is precisely this unnamed operation whose potential unsequenced relation with the side effect of ++ would cause UB, but this unsequenced relation is nowhere explicitly stated in the standard, because the unnamed operation is not. It is hard to justify UB using an operation whose very existence is only implicit in the standard.

9条回答
冷血范
2楼-- · 2019-01-08 10:16

The expression:

i += ++i + 1

does invoke undefined behavior. The language lawyer method requires us to go back to the defect report that results in:

i = ++i + 1 ;

becoming well defined in C++11, which is defect report 637. Sequencing rules and example disagree , it starts out saying:

In 1.9 [intro.execution] paragraph 16, the following expression is still listed as an example of undefined behavior:

i = ++i + 1;

However, it appears that the new sequencing rules make this expression well-defined

The logic used in the report is as follows:

  1. The assignment side-effect is required to be sequenced after the value computations of both its LHS and RHS (5.17 [expr.ass] paragraph 1).

  2. The LHS (i) is an lvalue, so its value computation involves computing the address of i.

  3. In order to value-compute the RHS (++i + 1), it is necessary to first value-compute the lvalue expression ++i and then do an lvalue-to-rvalue conversion on the result. This guarantees that the incrementation side-effect is sequenced before the computation of the addition operation, which in turn is sequenced before the assignment side effect. In other words, it yields a well-defined order and final value for this expression.

So in this question our problem changes the RHS which goes from:

++i + 1

to:

i + ++i + 1

due to draft C++11 standard section 5.17 Assignment and compound assignment operators which says:

The behavior of an expression of the form E1 op = E2 is equivalent to E1 = E1 op E2 except that E1 is evaluated only once. [...]

So now we have a situation where the computation of i in the RHS is not sequenced relative to ++i and so we then have undefined behavior. This follows from section 1.9 paragraph 15 which says:

Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced. [ Note: In an expression that is evaluated more than once during the execution of a program, unsequenced and indeterminately sequenced evaluations of its subexpressions need not be performed consistently in different evaluations. —end note ] The value computations of the operands of an operator are sequenced before the value computation of the result of the operator. If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.

The pragmatic way to show this would be to use clang to test the code, which generates the following warning (see it live):

warning: unsequenced modification and access to 'i' [-Wunsequenced]
i += ++i + 1 ;
  ~~ ^

for this code:

int main()
{
    int i = 0 ;

    i += ++i + 1 ;
}

This is further bolstered by this explicit test example in clang's test suite for -Wunsequenced:

 a += ++a; 
查看更多
女痞
3楼-- · 2019-01-08 10:17
(i = i+1, rhs_address = &i, rhs_value = *rhs_address); int total_value = lhs_value + rhs_value; *lhs_address = total_value;

This is intended to represent my understanding of the sequencing guarantees. Note that the , operator sequences all value computations and side effects of the LHS before those of the RHS. Parentheses do not affect sequencing. In the second case, i += ++i, we have a modification of i unsequenced wrt an lvalue-to-rvalue conversion of i => UB.

The standard does not treat compound assignments as second-class primitives for which no separate definition of semantics is necessary.

I would say that's a redundancy. The rewrite from E1 op = E2 to E1 = E1 op E2 also includes which expression types and value categories are required (on the rhs, 5.17/1 says something about the lhs), what happens to pointer types, the required conversions etc. The sad thing is that the sentence about "With respect to an.." in 5.17/1 is not in 5.17/7 as an exception of that equivalence.

In any way, I think we should compare the guarantees and requirements for compound assignment vs. simple assignment plus the operator, and see if there's any contradiction.

Once we put that "With respect to an.." also in the list of exceptions in 5.17/7, I don't think there's a contradiction.

As it turns out, as you can see in the discussion of Marc van Leeuwen's answer, this sentence leads to the following interesting observation:

int i; // global
int& f() { return ++i; }
int main() {
    i  = i + f(); // (A)
    i +=     f(); // (B)
}

It seems that (A) has an two possible outcomes, since the evaluation of the body of f is indeterminately sequenced with the value computation of the i in i + f().

In (B), on the other hand, the evaluation of the body of f() is sequenced before the value computation of i, since += must be seen as a single operation, and f() certainly needs to be evaluated before the assignment of +=.

查看更多
够拽才男人
4楼-- · 2019-01-08 10:20

About the description of i = ++i + 1

I gather that the subtle explanation is that

(1) the expression ++i returns an lvalue but + takes prvalues as operands, so a conversion from lvalue to prvalue must be performed;

Probably, see CWG active issue 1642.

this involves obtaining the current value of that lvalue (rather than one more than the old value of i) and must therefore be sequenced after the side effect from the increment (i.e., updating i)

The sequencing here is defined for the increment (indirectly, via +=, see (a)): The side effect of ++ (the modification of i) is sequenced before the value computation of the whole expression ++i. The latter refers to computing the result of ++i, not to loading the value of i.

(2) the LHS of the assignment is also an lvalue, so its value evaluation does not involve fetching the current value of i; while this value computation is unsequenced w.r.t. the value computation of the RHS, this poses no problem

I don't think that's properly defined in the Standard, but I'd agree.

(3) the value computation of the assignment itself involves updating i (again),

The value computation of i = expr is only required when you use the result, e.g. int x = (i = expr); or (i = expr) = 42;. The value computation itself does not modify i.

The modification of i in the expression i = expr that happens because of the = is called the side effect of =. This side effect is sequenced before value computation of i = expr -- or rather the value computation of i = expr is sequenced after the side effect of the assignment in i = expr.

In general, the value computation of the operands of an expression are sequenced before the side effect of that expression, of course.

but is sequenced after the value computation of its RHS, and hence after the previous update to i; no problem.

The side effect of the assignment i = expr is sequenced after the value computation of the operands i (A) and expr of the assignment.

The expr in this case is a +-expression: expr1 + 1. The value computation of this expression is sequenced after the value computations of its operands expr1 and 1.

The expr1 here is ++i. The value computation of ++i is sequenced after the side effect of ++i (the modification of i) (B)

That's why i = ++i + 1 is safe: There's a chain of sequenced before between the value computation in (A) and the side effect on the same variable in (B).


(a) The Standard defines ++expr in terms of expr += 1, which is defined as expr = expr + 1 with expr being evaluated only once.

For this expr = expr + 1, we therefore have only one value computation of expr. The side effect of = is sequenced before the value computation of the whole expr = expr + 1, and it's sequenced after the value computation of the operands expr (LHS) and expr + 1 (RHS).

This corresponds to my claim that for ++expr, the side effect is sequenced before the value computation of ++expr.


About i += ++i + 1

Does the value computation of i += ++i + 1 involve undefined behavior?

Since the LHS of += is still an lvalue (and its RHS still a prvalue), the same reasoning as above applies as far as (1) and (2) are concerned; as for (3) the value computation of the += operator now must both fetch the current value of i, and then (obviously sequenced after it, even if the standard does not say so explicitly, or otherwise the execution of such operators would always invoke undefined behavior) perform the addition of the RHS and store the result back into i.

I think here's the problem: The addition of i in the LHS of i += to the result of ++i + 1 requires knowing the value of i - a value computation (which can mean loading the value of i). This value computation is unsequenced with respect to the modification performed by ++i. This is essentially what you say in your alternative description, following the rewrite mandated by the Standard i += expr -> i = i + expr. Here, the value computation of i within i + expr is unsequenced with respect to the value computation of expr. That's where you get UB.

Please note that a value computation can have two results: The "address" of an object, or the value of an object. In an expression i = 42, the value computation of the lhs "produces the address" of i; that is, the compiler needs to figure out where to store the rhs (under the rules of observable behaviour of the abstract machine). In an expression i + 42, the value computation of i produces the value. In the above paragraph, I was referring to the second kind, hence [intro.execution]p15 applies:

If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.


Another approach for i += ++i + 1

the value computation of the += operator now must both fetch the current value of i, and then [...] perform the addition of the RHS

The RHS being ++i + 1. Computing the result of this expression (the value computation) is unsequenced with respect to the value computation of i from the LHS. So the word then in this sentence is misleading: Of course, it must first load i and then add the result of the RHS to it. But there's no order between the side-effect of the RHS and the value computation to get the value of the LHS. For example, you could get for the LHS either the old or the new value of i, as modified by the RHS.

In general a store and a "concurrent" load is a data race, which leads to Undefined Behaviour.


Addressing the addendum

using a fictive

查看更多
放我归山
5楼-- · 2019-01-08 10:24
int R=F; L = L + R; }, but then the example no longer has UB.

Let E be i and F be ++i (we don't need the + 1). Then, for i = ++i

int* lhs_address;
int lhs_value;
int* rhs_address;
int rhs_value;

    (         lhs_address = &i)
                                                                    
查看更多
戒情不戒烟
6楼-- · 2019-01-08 10:30
operator to designate unsequenced evaluations, one might try to define E op= F; (with int operands for simplicity) as equivalent to { int& L=E
查看更多
Melony?
7楼-- · 2019-01-08 10:30

Yes, it is UB!

The evaluation of your expression

i += ++i + 1

proceeds in the following steps:

5.17p1 (C++11) states (emphases mine):

The assignment operator (=) and the compound assignment operators all group right-to-left. All require a modifiable lvalue as their left operand and return an lvalue referring to the left operand. The result in all cases is a bit-field if the left operand is a bit-field. In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression.

What does "value computation" mean?

1.9p12 gives the answer:

Accessing an object designated by a volatile glvalue (3.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression (or a sub-expression) in general includes both value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation) and initiation of side effects.

Since your code uses a compound assignment operator, 5.17p7 tells us, how this operator behaves:

The behavior of an expression of the form E1 op= E2 is equivalent to E1 = E1 op E2 except thatE1 is evaluated only once.

Hence the evaluation of the expression E1 ( == i) involves both, determining the identity of the object designated by i and an lvalue-to-rvalue conversion to fetch the value stored in that object. But the evaluation of the two operands E1 and E2 are not sequenced with respect to each other. Thus we get undefined behavior since the evaluation of E2 ( == ++i + 1) initiates a side effect (updating i).

1.9p15:

... If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.


The following statements in your question/comments seem to be the root of your misunderstanding:

(2) the LHS of the assignment is also an lvalue, so its value evaluation does not involve fetching the current value of i

fetching a value can be part of a prvalue evaluation. But in E += F the only prvalue is F so fetching the value of E is not part of the evaluation of the (lvalue) subexpression E

If an expression is an lvalue or rvalue doesn't tell anything about how this expression is to be evaluated. Some operators require lvalues as their operands some others require rvalues.

Clause 5p8:

Whenever a glvalue expression appears as an operand of an operator that expects a prvalue for that operand, the lvalue-to-rvalue (4.1), array-to-pointer (4.2), or function-to-pointer (4.3) standard conversions are applied to convert the expression to a prvalue.

In a simple assignment the evaluation of of the LHS only requires determining the identity of the object. But in a compound assignment such as += the LHS must be a modifiable lvalue, but the evaluation of the LHS in this case consists of determining the identity of the object and an lvalue-to-rvalue conversion. It is the result of this conversion (which is a prvalue) that is added to the result (also a prvalue) of the evaluation of the RHS.

"But in E += F the only prvalue is F so fetching the value of E is not part of the evaluation of the (lvalue) subexpression E"

That's not true as I explained above. In your example F is a prvalue expression, but F may as well be an lvalue expression. In that case, the lvalue-to-rvalue conversion is also applied to F. 5.17p7 as cited above tells us, what the semantics of the compound assignment operators are. The standard states that the behavior of E += F is the same as of E = E + F but E is only evaluated once. Here, the evaluation of E includes the lvalue-to-rvalue conversion, because the binary operator + requires it operands to be rvalues.

查看更多
登录 后发表回答