Can atomic loads be merged in the C++ memory model?

Published 2020-08-09 07:01

Question:

Consider the C++11 snippet below. For GCC and clang this compiles to two (sequentially consistent) loads of foo. (Editor's note: in practice compilers do not optimize atomics; see this Q&A for more details, especially the http://wg21.link/n4455 standards discussion about the problems such merging could create, which the standard gives programmers no tools to work around. This language-lawyer question is about the current standard, not about what compilers do.)

Does the C++ memory model allow the compiler to merge these two loads into a single load and use the same value for x and y?

(Editor's note: this is something the standards group is working on: http://wg21.link/n4455 and http://wg21.link/p0062. The current standard on paper allows behaviours that are undesirable.)


I think it cannot merge these loads, because that would mean polling an atomic no longer works, but I cannot find the relevant part in the memory-model documentation.

#include <atomic>
#include <cstdio>

std::atomic<int> foo;

int main(int argc, char **argv)
{
    int x = foo;
    int y = foo;

    printf("%d %d\n", x, y);
    return 0;
}

Answer 1:

Yes, because we cannot observe the difference!

An implementation is allowed to turn your snippet into the following (pseudo-implementation):

int __loaded_foo = foo;

int x = __loaded_foo;
int y = __loaded_foo;

The reason is that there is no way for you to observe the difference between the above and two separate loads of foo, given the guarantees of sequential consistency.

Note: It is not just the compiler that can make such an optimization; the processor can likewise conclude that there is no way for you to observe the difference and load the value of foo only once, even though the compiler asked it to load twice.
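
For concreteness, here is a sketch of one complete rewrite of the questioner's program that the as-if rule would permit (__loaded_foo is only a hypothetical compiler-generated temporary, not a name used by the standard or by any compiler):

#include <atomic>
#include <cstdio>

std::atomic<int> foo;

int main(int argc, char **argv)
{
    int __loaded_foo = foo;    // a single sequentially consistent load of foo
    int x = __loaded_foo;      // both x ...
    int y = __loaded_foo;      // ... and y reuse that one loaded value

    printf("%d %d\n", x, y);
    return 0;
}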





Explanation

Given a writer thread that keeps incrementing foo, the guarantee is that y will hold either the same value as x or a value written later.

// thread 1 - The Writer
while (true) {
  foo += 1;
}
// thread 2 - The Reader
while (true) {
  int x = foo;
  int y = foo;

  assert (y >= x); // will never fire, unless foo wraps around at its maximum value (atomic arithmetic wraps; it is not UB)
}
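
For reference, a self-contained, compilable sketch of the writer/reader pair above (the iteration bound and the done flag are additions so the program terminates; the property being checked is unchanged):

#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int>  foo{0};
std::atomic<bool> done{false};

int main()
{
    std::thread writer([] {              // thread 1 - The Writer
        for (int i = 0; i < 1000000; ++i)
            foo += 1;                    // sequentially consistent read-modify-write
        done = true;
    });

    std::thread reader([] {              // thread 2 - The Reader
        while (!done) {
            int x = foo;                 // first load
            int y = foo;                 // second load
            assert(y >= x);              // holds whether or not the two loads are merged
        }
    });

    writer.join();
    reader.join();
    return 0;
}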

Imagine the writing thread pauses its execution on every iteration (because of a context switch or some other implementation-defined reason); there is no way you can prove whether that pause is what causes x and y to have the same value, or whether a "merge optimization" is responsible.


In other words, we have two potential outcomes given the code in this section:

  1. No new value is written to foo between the two reads (x == y).
  2. A new value is written to foo between the two reads (x < y).

Since either of the two can happen, an implementation is free to narrow the scope and simply always produce the first outcome; we can in no way observe the difference.





What does the Standard say?

An implementation can make whatever changes it wants, as long as we cannot observe any difference between the behavior we expressed and the behavior during execution.

This is covered in [intro.execution]p1:

The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.

Another section makes it even clearer, [intro.execution]p5:

A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input.

Further Reading:

  • What exactly is the "as-if rule"?





What about polling in a loop?

// initial state
std::atomic<int> foo{0};
// thread 1
while (true) {
  if (foo)
    break;
}
// thread 2
foo = 1;

Question: Given the reasoning in the previous sections, could an implementation simply read foo once in thread 1, and then never break out of the loop even if thread 2 writes to foo?

The answer: no.

In a sequentially consistent environment we are guaranteed that a write to foo in thread 2 becomes visible to thread 1; once that write has happened, thread 1 must observe the change of state.

Note: An implementation can turn two reads into a single one because we cannot observe the difference (one fence is just as effective as two), but it cannot completely disregard a read that exists by itself.

Note: The contents of this section are guaranteed by [atomics.order]p3-4.
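
For completeness, a compilable sketch of the polling example (the thread objects and join calls are additions so the program is self-contained):

#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> foo{0};

int main()
{
    std::thread reader([] {      // thread 1
        while (true) {
            if (foo)             // a sequentially consistent load on every iteration
                break;
        }
        std::printf("reader saw foo != 0\n");
    });

    std::thread writer([] {      // thread 2
        foo = 1;                 // the store must eventually become visible to the reader
    });

    writer.join();
    reader.join();
    return 0;
}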





What if I really want to prevent this form of "optimization"?

If you would like to force the implementation to actually read the value of some variable at every point where your source mentions it, you should look into using volatile (note that this in no way enhances thread-safety).
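
As a minimal sketch (assuming a lock-free std::atomic<int>), qualifying the atomic as volatile keeps every source-level access in the observable behavior, so the two loads below cannot be folded into one:

#include <atomic>
#include <cstdio>

volatile std::atomic<int> foo{0};

int main()
{
    int x = foo;   // this volatile (and atomic) load must be performed ...
    int y = foo;   // ... and so must this one; they cannot be merged

    printf("%d %d\n", x, y);
    return 0;
}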

But in practice compilers don't optimize atomics, and the standards group has recommended against using volatile atomic for this purpose until the dust settles on the issue. See:

  • http://wg21.link/n4455
  • http://wg21.link/p0062
  • Why don't compilers merge redundant std::atomic writes?
  • and a duplicate of this question, Can and does the compiler optimize out two atomic loads?


Answer 2:

Yes, in your particular example (not otherwise).

Your particular example has a single thread of execution; foo has static storage duration and is initialized before main is entered, and it is never modified afterwards during the program's lifetime.
In other words, there is no externally observable difference, and the as-if rule can legally be applied. In fact, the compiler could do away with the atomic instructions altogether. There is no way the value of x or y could ever be anything different.
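
To illustrate, here is a sketch of one behavior-preserving rewrite the as-if rule would allow for that single-threaded program (foo is zero-initialized and never modified, so only its initial value can ever be printed):

#include <cstdio>

int main(int argc, char **argv)
{
    printf("%d %d\n", 0, 0);   // no atomic instructions remain
    return 0;
}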

In a program with concurrency which modifies foo, this is not the case.

You do not specify a memory order, so the default is used, which is sequential consistency. Sequential consistency gives the same happens-before / memory-ordering guarantees as release/acquire and additionally establishes a single total order of all sequentially consistent atomic operations. That last bit is the important part.

This single total order means that if you have three (atomic) operations, e.g. A, B, and C, which appear in that order in it (possibly issued concurrently from two threads), and B is a write while A and C are reads, then C must see the state established by B (or something later), not some earlier state. In particular, if B writes a new value, the values seen at A and C will differ.

In terms of your code sample, if another thread modifies foo immediately after you read it into x (but before you read the value into y), then the value placed into y must be the value that was written: if the operations happen in that order, they must also be realized in that order.

Of course, a write that lands exactly between two consecutive load instructions is rather unlikely (the time window is tiny), but it does not matter how unlikely it is.
The compiler must produce code that ensures that, if this situation arises, the operations are still seen in exactly the order in which they happened.
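
A compilable sketch of the A/B/C scenario described above, with a sequentially consistent store racing against two sequentially consistent loads: the printed outcomes 0 0, 0 1 and 1 1 are all possible, but 1 0 is not, because the second load can never observe an older value than the first.

#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> foo{0};

int main()
{
    std::thread writer([] {
        foo.store(1);            // B: the write
    });

    std::thread reader([] {
        int x = foo.load();      // A: first read
        int y = foo.load();      // C: second read
        std::printf("%d %d\n", x, y);
    });

    writer.join();
    reader.join();
    return 0;
}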