Suppose A, B, a, and b are all variables, and the addresses of A, B, a, and b are all different. Then, for the following code:
A = a;
B = b;
Do the C and C++ standards explicitly require A = a to be strictly executed before B = b? Given that the addresses of A, B, a, and b are all different, are compilers allowed to swap the execution sequence of the two statements for some purpose such as optimization?
If the answer to my question is different in C and C++, I would like to know both.
Edit: The background of the question is the following. In board game AI design, people use a lock-less shared hash table for optimization, whose correctness strongly depends on the execution order if we do not add a volatile qualifier.
Both standards allow for these instructions to be performed out of order, so long as that does not change observable behaviour. This is known as the as-if rule:
- What exactly is the "as-if" rule?
- http://en.cppreference.com/w/cpp/language/as_if
Note that as is pointed out in the comments, what is meant by "observable behaviour" is the observable behaviour of a program with defined behaviour. If your program has undefined behaviour, then the compiler is excused from reasoning about that.
The compiler is only obligated to emulate the observable behavior of a program, so if a reordering does not violate that principle then it is allowed. Note that this assumes the behavior is well defined: if your program contains undefined behavior, such as a data race, then its behavior is unpredictable, and as commented you would need some form of synchronization to protect the critical section.
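As a concrete illustration of that last point, here is a minimal sketch (my own example, not from the question or the comments) of protecting the two assignments with a mutex so that another thread never observes them half-applied; the function and variable names are illustrative:
// Illustrative sketch (not from the answer): mutex-protected critical section.
#include <mutex>

int A, B, a, b;
std::mutex m; // guards A and B as a pair

void writer()
{
    std::lock_guard<std::mutex> lock(m); // critical section: both stores complete
    A = a;                               // before any reader holding the same
    B = b;                               // mutex can observe A or B
}

void reader(int& outA, int& outB)
{
    std::lock_guard<std::mutex> lock(m); // no data race, and (A, B) is always
    outA = A;                            // seen as a consistent pair
    outB = B;
}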
A Useful reference
An interesting article that covers this is Memory Ordering at Compile Time and it says:
The cardinal rule of memory reordering, which is universally followed
by compiler developers and CPU vendors, could be phrased as follows:
Thou shalt not modify the behavior of a single-threaded program.
An Example
The article provides a simple program where we can see this reordering:
int A, B; // Note: static storage duration so initialized to zero
void foo()
{
A = B + 1;
B = 0;
}
and shows that at higher optimization levels B = 0 is done before A = B + 1. We can reproduce this result using godbolt, which with -O3 produces the following (see it live):
movl $0, B(%rip) #, B
addl $1, %eax #, D.1624
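If you want to suppress this particular compile-time reordering while keeping plain int variables, one option is an empty asm statement acting as a compiler barrier. This is a sketch of mine, assuming a GCC/Clang-compatible compiler; it is not part of the quoted output, and it only constrains the compiler, not the CPU:
// Illustrative sketch (not from the article's quoted output), GCC/Clang syntax.
int A, B;

void foo()
{
    A = B + 1;
    asm volatile("" ::: "memory"); // compiler barrier: memory accesses may not be
                                   // moved across it at compile time (the CPU may
                                   // still reorder at run time)
    B = 0;
}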
Why?
Why does the compiler reorder? The article explains it is exactly the same reason the processor does so, because of complexity of the architecture:
As I mentioned at the start, the compiler modifies the order of memory
interactions for the same reason that the processor does it –
performance optimization. Such optimizations are a direct consequence
of modern CPU complexity.
Standards
In the draft C++ standard this is covered in section 1.9 Program execution, which says (emphasis mine going forward):
The semantic descriptions in this International Standard define a
parameterized nondeterministic abstract machine. This International
Standard places no requirement on the structure of conforming
implementations. In particular, they need not copy or emulate the
structure of the abstract machine. Rather, conforming implementations
are required to emulate (only) the observable behavior of the abstract
machine as explained below.5
Footnote 5 tells us this is also known as the as-if rule:
This provision is sometimes called the “as-if” rule, because an
implementation is free to disregard any requirement of this
International Standard as long as the result is as if the requirement
had been obeyed, as far as can be determined from the observable
behavior of the program. For instance, an actual implementation need
not evaluate part of an expression if it can deduce that its value is
not used and that no side effects affecting the observable behavior of
the program are produced.
The draft C99 and draft C11 standards cover this in section 5.1.2.3 Program execution, although we have to go to the index to see that it is called the as-if rule in the C standard as well:
as-if rule, 5.1.2.3
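To make the footnote concrete, here is a small sketch (my own example, not taken from the standard) of a computation a conforming compiler may drop entirely under the as-if rule, because its result is never used and it has no side effects:
// Illustrative sketch (not from the standard): dead computation elided under as-if.
int value_of(int x)
{
    return x * x + 1; // pure computation: no side effects
}

void caller()
{
    int unused = value_of(7); // result never read and no observable side effects:
    (void)unused;             // the compiler may omit the call and the store entirely
}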
Update on Lock-Free considerations
The article An Introduction to Lock-Free Programming covers this topic well, and for the OP's concerns about a lock-less shared hash table implementation, this section is probably the most relevant:
Memory Ordering
As the flowchart suggests, any time you do lock-free programming for
multicore (or any symmetric multiprocessor), and your environment does
not guarantee sequential consistency, you must consider how to prevent
memory reordering.
On today’s architectures, the tools to enforce correct memory ordering
generally fall into three categories, which prevent both compiler
reordering and processor reordering:
- A lightweight sync or fence instruction, which I’ll talk about in future posts;
- A full memory fence instruction, which I’ve demonstrated previously;
- Memory operations which provide acquire or release semantics.
Acquire semantics prevent memory reordering of operations which follow
it in program order, and release semantics prevent memory reordering
of operations preceding it. These semantics are particularly suitable
in cases when there’s a producer/consumer relationship, where one
thread publishes some information and the other reads it. I’ll also
talk about this more in a future post.
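As a hedged sketch of the producer/consumer pattern described above (my own example using C++11 atomics, not code from the article; the names payload and ready are illustrative), a release store paired with an acquire load prevents both the compiler and the processor from reordering the publication:
// Illustrative sketch (not from the article): release/acquire publication.
#include <atomic>

int payload = 0;                 // plain data being published
std::atomic<bool> ready{false};  // flag carrying release/acquire ordering

void producer()
{
    payload = 42;                                  // write the data first
    ready.store(true, std::memory_order_release);  // release: earlier writes are not
                                                   // reordered after this store
}

void consumer()
{
    if (ready.load(std::memory_order_acquire))     // acquire: later reads are not
    {                                              // reordered before this load
        int value = payload;                       // guaranteed to observe 42
        (void)value;
    }
}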
If there is no dependency between instructions, they may also be executed out of order, as long as the final outcome is not affected. You can observe this while debugging code compiled at a higher optimization level.
Since A = a; and B = b; are independent in terms of data dependencies, this should not matter. If the outcome of the previous statement affected the input of the subsequent one, then the ordering would matter; otherwise it does not, and the result is the same as strictly sequential execution.
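For illustration, here is a short sketch (my own, with illustrative names, not part of the original answer) contrasting the independent case, where the compiler may emit the stores in either order, with a genuine data dependency that the result must respect:
// Illustrative sketch (not from the answer): independence vs. data dependency.
void independent(int& A, int& B, int a, int b)
{
    A = a; // no dependency between the two stores:
    B = b; // the compiler may emit them in either order
}

void dependent(int& A, int& B, int a)
{
    A = a;     // B is computed from A's new value, so whatever order the
    B = A + 1; // stores are emitted in, B must end up equal to a + 1
}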
My reading is that the C++ standard requires this to work as written; however, if you're trying to use it for multithreading control, it doesn't work in that context, because there is nothing here to guarantee that the registers get written to memory in the right order.
As your edit indicates, you are trying to use it exactly where it will not work.
It may be of interest that if you do this:
{ A=a, B=b; /*etc*/ }
Note the comma in place of the semi-colon.
Then the C++ specification and any conforming compiler must guarantee the evaluation order in the abstract machine, because the left operand of the comma operator is always evaluated, and its side effects completed, before the right operand.
This, however, only constrains the order of evaluation in the abstract machine. Under the as-if rule the optimizer may still emit the two stores in either order, since swapping them does not change the single-threaded observable behavior, so the comma operator does not act as a barrier that other threads can rely on.