I've encountered a case where cross-platform code behaved differently on a basic assignment statement.
One compiler evaluated the lvalue first, the rvalue second, and then performed the assignment.
Another compiler evaluated the rvalue first, the lvalue second, and then performed the assignment.
This matters when evaluating the rvalue changes what the lvalue designates, as in the following case:
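#include <stdio.h>
#include <stdlib.h>

struct MM {
    int m;
};

int helper(struct MM **ppmm) {
    /* Repoint the caller's pointer at a freshly allocated struct. */
    *ppmm = malloc(sizeof(struct MM));
    (*ppmm)->m = 1000;
    return 100;
}

int main(void) {
    struct MM mm = {500};
    struct MM *pmm = &mm;
    pmm->m = helper(&pmm);  /* operand evaluation order matters here */
    printf(" %d %d ", mm.m, pmm->m);
}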
In the example above, the line pmm->m = helper(&pmm); depends on the order of evaluation. If the lvalue is evaluated first, then pmm->m refers to mm.m; if the rvalue is evaluated first, then pmm->m refers to the MM instance allocated on the heap.
My question is whether the C standard determines the order of evaluation (I didn't find anything), or whether each compiler can choose what to do.
Are there any other similar pitfalls I should be aware of?
The semantics for evaluation of an = expression include that
The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.
(C2011, 6.5.16/3; emphasis added)
The emphasized provision (that the evaluations of the operands are unsequenced) explicitly permits the difference you observed in the program's behavior when compiled by different compilers. Moreover, unsequenced means, among other things, that it is permissible for the evaluations to occur in different order even in different runs of the very same build of the program. If the function in which the unsequenced evaluations appear were called more than once, then it would be permissible for the evaluations to occur in different order during different calls within the same execution of the program.
That already answers the question, but it's important to see the bigger picture. Modifying an object or calling a function that does so is a side effect (C2011, 5.1.2.3/2). This key provision therefore comes into play:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
(C2011, 6.5/2)
The called function has the side effect of modifying the value stored in main()'s variable pmm, evaluation of the left-hand operand of the assignment involves a value computation using the value of pmm, and these are unsequenced; therefore the behavior is undefined.
Undefined behavior is to be avoided at all costs. Because your program's behavior is undefined, it is not limited to the two alternatives you observed (in case that wasn't bad enough). The C standard places no limitations whatever on what it may do. It might instead crash, zero out your hard drive's partition table, or, if you have suitable hardware, summon nasal demons. Or anything else. Most of these are unlikely, but the best viewpoint is that if your program has undefined behavior then your program is wrong.
When using the simple assignment operator =, the order of evaluation of the operands is unspecified. There is also no sequence point between the evaluations.
For example, if you call two functions in one assignment:
*Get() = logf(2.0f);
The order in which they are called is not specified and may differ from one evaluation to the next, and yet this behavior is completely defined.
A function call will introduce a sequence point. It will happen after the evaluation of the arguments and before the actual call. The end of each full expression, such as the ; that terminates an expression statement, will also introduce a sequence point. This is important because an object must not be modified twice without an intervening sequence point, otherwise the behavior is undefined.
Your example is particularly complicated due to unspecified behavior, and may have different results depending on whether the left or the right operand is evaluated first.
- The left operand is evaluated first.
The left operand is evaluated while the pointer pmm still points to the struct mm, so it designates mm.m. Then the function is called, and a sequence point occurs. It modifies the pointer pmm by pointing it to allocated memory, followed by a sequence point at the end of that statement. Then it stores the value 1000 to the member m, followed by another sequence point. The function returns 100, which is assigned to the left operand; since the left operand was evaluated first, the value 100 is assigned to the object mm, more specifically its member m.
mm.m has the value 100 and pmm->m has the value 1000. This is defined behavior; no object is modified twice between sequence points.
- The right operand is evaluated first.
The function is called first and the sequence point occurs. It modifies the pointer pmm by pointing it to the newly allocated struct, followed by a sequence point. Then it stores the value 1000 to the member m, followed by a sequence point. Then the function returns. Then the left operand is evaluated: pmm->m now designates the member m of the newly allocated struct, and it is modified by assigning it the value 100.
mm.m will have the value 500 since it was never modified, and pmm->m will have the value 100. No object was modified twice between sequence points. The behavior is defined.