Consider the C++11 snippet below. For GCC and Clang this compiles to two (sequentially consistent) loads of foo. (Editor's note: compilers do not optimize atomics; see this Q&A for more details, especially the http://wg21.link/n4455 standards discussion about the problems this creates, which the standard doesn't give programmers tools to work around. This language-lawyer Q&A is about the current standard, not about what compilers actually do.)
Does the C++ memory model allow the compiler to merge these two loads into a single load and use the same value for x and y?
(Editor's note: this is something the standards group is working on: http://wg21.link/n4455 and http://wg21.link/p0062. The current standard on paper allows behaviours that are undesirable.)
I think it cannot merge these loads, because that would mean polling an atomic no longer works, but I cannot find the relevant part in the memory model documentation.
#include <atomic>
#include <cstdio>
std::atomic<int> foo;
int main(int argc, char **argv)
{
    int x = foo;
    int y = foo;
    printf("%d %d\n", x, y);
    return 0;
}
Yes, because we cannot observe the difference!
An implementation is allowed to turn your snippet into the following (pseudo-implementation).
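For example, a sketch of one possible merged form (the single load is the key point):
#include <atomic>
#include <cstdio>

std::atomic<int> foo;

int main(int argc, char **argv)
{
    // One sequentially consistent load; both x and y get the same value.
    int const x = foo;
    int const y = x;
    printf("%d %d\n", x, y);
    return 0;
}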
The reason is that there is no way for you to observe the difference between the above and two separate loads of foo, given the guarantees of sequential consistency.
Explanation
Given a thread that keeps on updating foo in an incremental fashion, what you are guaranteed is that y will have either the same value as x or a later-written one.

Imagine the writing thread for some reason pauses its execution on every iteration (because of a context switch or some other implementation-defined reason); there is no way in which you can prove that this is what is causing both x and y to have the same value, or whether it is because of a "merge optimization".

In other words, we have two potential outcomes given the code in this section:

- no new value is written to foo between the two reads (x == y), or
- a new value is written to foo between the two reads (x < y).

Since either one can happen, an implementation is free to narrow down the scope to simply always execute one of them; we can in no way observe the difference.
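To make the two outcomes concrete, here is a small sketch (the incrementing writer thread is hypothetical, added only for illustration). Whatever the timing, a run can only ever observe x == y or x < y, and a merged load simply always produces the x == y outcome:
#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> foo;

// Hypothetical writer: keeps updating foo in an incremental fashion.
void writer()
{
    for (int i = 1; i <= 1000; ++i)
        foo = i;                 // sequentially consistent store
}

int main()
{
    std::thread t(writer);
    int x = foo;                 // sequentially consistent load
    int y = foo;                 // sequentially consistent load
    t.join();
    printf("%d %d\n", x, y);     // always x <= y, never x > y
    return 0;
}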
What does the Standard say?
An implementation can make whatever changes it wants as long as we cannot observe any difference between the behavior which we expressed, and the behavior during execution.
This is covered by [intro.execution]p1, which defines the abstract machine and the "as-if" rule. Another section makes it even more clear: [intro.execution]p5, which only requires a conforming implementation to produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine.
What about polling in a loop?
Suppose another thread will eventually write to foo, and this thread keeps loading it in a loop until it sees the change; can the implementation merge all of those loads into one performed before the loop, so that the loop never observes the update? The answer: no.
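A sketch of the kind of polling loop in question (the thread names are illustrative and match the prose below):
#include <atomic>
#include <thread>

std::atomic<int> foo;

// Thread 1: polls foo until it observes a non-zero value.
void thread_1()
{
    while (!foo)                 // a sequentially consistent load on every iteration
        std::this_thread::yield();
}

// Thread 2: eventually publishes a value.
void thread_2()
{
    foo = 1;                     // sequentially consistent store
}

int main()
{
    std::thread t1(thread_1);
    std::thread t2(thread_2);
    t1.join();
    t2.join();
    return 0;
}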
In a sequentially-consistent environment we are guaranteed that a write to foo in thread 2 will become visible in thread 1; this means that when that write has happened, thread 1 must observe this change of state.
Note: An implementation can turn two reads into a single one because we cannot observe the difference (one fence is just as effective as two), but it cannot completely disregard a read that exists by itself.
Note: The contents of this section are guaranteed by [atomics.order]p3-4.

What if I really want to prevent this form of "optimization"?
If you would like to force the implementation to actually read the value of some variable at every point where you have written a read of it, you should look into using volatile (note that this in no way enhances thread-safety). But in practice compilers don't optimize atomics, and the standards group has recommended against using volatile atomic for this kind of reason, at least until the dust settles on this issue; see the WG21 discussion linked in the question above (http://wg21.link/n4455, http://wg21.link/p0062).
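A sketch of what that looks like; accesses to volatile objects are part of the observable behavior, so the two loads below may not be folded into one (again, this forces the loads but adds nothing in terms of thread-safety):
#include <atomic>
#include <cstdio>

// volatile forces every access to actually be performed;
// std::atomic still provides the atomicity and ordering.
volatile std::atomic<int> foo;

int main()
{
    int x = foo;                 // this load must be performed
    int y = foo;                 // and so must this one
    printf("%d %d\n", x, y);
    return 0;
}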
Yes, in your particular example (but not otherwise).

Your particular example has a single thread of execution; foo has static storage duration and is initialized before main is entered, and it is otherwise never modified during the program's lifetime. In other words, there is no externally observable difference, and the as-if rule can legally be applied. In fact, the compiler could do away with the atomic instructions altogether: there is no way the value of x or y could ever be anything different.

In a program with concurrency which modifies foo, this is not the case. You do not specify a memory order, so the default is used, which is sequential consistency. Sequential consistency is defined as giving the same happens-before / memory-ordering guarantees as release/acquire, and additionally establishing a single total modification order of all atomic operations. That last bit is the important part.
A single total modification order means that if you have three (atomic) operations, e.g. A, B, and C, which happen in that order (possibly concurrently, across two threads), and B is a write operation while A and C are read operations, then C must see the state established by B, not some earlier state. That is, provided B stores a value different from the one A read, the values seen at A and C will differ.
In terms of your code sample, if another thread modifies foo immediately after you are reading it into x (but before you are reading the value into y), the value that is placed into y must be the value that was written, because if the operations happen in that order, they must also be realized in that order.

Of course, a write that happens exactly in between two consecutive load instructions is a rather unlikely thing (since the time window is very small, a mere single tick), but it does not matter whether it is unlikely.
The compiler must produce code that makes sure that if this constellation arises, operations are still seen in exactly the order in which they happened.
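A sketch of that constellation, reusing the A/B/C labels from above (the lambda writer thread is added only for illustration). Under sequential consistency the output can be "0 0", "42 42", or "0 42", but never "42 0":
#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> foo;

int main()
{
    std::thread writer([] { foo.store(42); });   // B: a seq_cst write
    int x = foo.load();                          // A: a seq_cst read
    int y = foo.load();                          // C: a seq_cst read
    writer.join();
    // If B falls between A and C in the single total order, C must observe
    // 42 even though A observed 0; "42 0" can never be printed.
    printf("%d %d\n", x, y);
    return 0;
}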