Question:
Assuming that aligned pointer loads and stores are naturally atomic on the target platform, what is the difference between this:
// Case 1: Dumb pointer, manual fence
int* ptr;
// ...
std::atomic_thread_fence(std::memory_order_release);
ptr = new int(-4);
this:
// Case 2: atomic var, automatic fence
std::atomic<int*> ptr;
// ...
ptr.store(new int(-4), std::memory_order_release);
and this:
// Case 3: atomic var, manual fence
std::atomic<int*> ptr;
// ...
std::atomic_thread_fence(std::memory_order_release);
ptr.store(new int(-4), std::memory_order_relaxed);
I was under the impression that they were all equivalent; however, Relacy detects a data race in the first case (only):
struct test_relacy_behaviour : public rl::test_suite<test_relacy_behaviour, 2>
{
    rl::var<std::string*> ptr;
    rl::var<int> data;

    void before()
    {
        ptr($) = nullptr;
        rl::atomic_thread_fence(rl::memory_order_seq_cst);
    }

    void thread(unsigned int id)
    {
        if (id == 0) {
            std::string* p = new std::string("Hello");
            data($) = 42;
            rl::atomic_thread_fence(rl::memory_order_release);
            ptr($) = p;
        }
        else {
            std::string* p2 = ptr($); // <-- Test fails here after the first thread completely finishes executing (no contention)
            rl::atomic_thread_fence(rl::memory_order_acquire);
            RL_ASSERT(!p2 || *p2 == "Hello" && data($) == 42);
        }
    }

    void after()
    {
        delete ptr($);
    }
};
I contacted the author of Relacy to find out if this was expected behaviour; he says that there is indeed a data race in my test case.
However, I'm having trouble spotting it; can someone point out to me what the race is?
Most importantly, what are the differences between these three cases?
Update: It's occurred to me that Relacy may simply be complaining about the atomicity (or lack thereof, rather) of the variable being accessed across threads... after all, it doesn't know that I intend only to use this code on platforms where aligned integer/pointer access is naturally atomic.
Another update: Jeff Preshing has written an excellent blog post explaining the difference between explicit fences and the built-in ones ("fences" vs "operations"). Cases 2 and 3 are apparently not equivalent! (In certain subtle circumstances, anyway.)
Answer 1:
I believe the code has a race. Case 1 and case 2 are not equivalent.
29.8 [atomics.fences]
-2- A release fence A synchronizes with an acquire fence B if there exist atomic operations X and Y, both operating on some atomic object M, such that A is sequenced before X, X modifies M, Y is sequenced before B, and Y reads the value written by X or a value written by any side effect in the hypothetical release sequence X would head if it were a release operation.
In case 1 your release fence does not synchronize with your acquire fence, because ptr is not an atomic object and the store and load on ptr are not atomic operations.
Case 2 and case 3 are equivalent (actually, not quite; see LWimsey's comments and answer), because ptr is an atomic object and the store is an atomic operation. (Paragraphs 3 and 4 of [atomics.fences] describe how a fence synchronizes with an atomic operation and vice versa.)
The semantics of fences are defined only with respect to atomic objects and atomic operations. Whether your target platform and your implementation offer stronger guarantees (such as treating any pointer type as an atomic object) is implementation-defined at best.
N.B. for both case 2 and case 3, the acquire operation on ptr could happen before the store, and so would read garbage from the uninitialized atomic<int*>. Simply using acquire and release operations (or fences) doesn't ensure that the store happens before the load; it only ensures that if the load reads the stored value then the code is correctly synchronized.
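To illustrate that last point, here is a minimal sketch (my own, not part of the original answer; the writer/reader names and values are made up): the atomic is explicitly initialized to nullptr, so a load that happens to run before the store reads a well-defined null instead of garbage, and the reader only relies on the payload after checking the loaded value.
#include <atomic>
#include <cassert>

std::atomic<int*> ptr{nullptr}; // initialized, so an "early" load reads a well-defined nullptr
int data = 0;

void writer()
{
    data = 42;                                         // payload written before the release store
    ptr.store(new int(-4), std::memory_order_release); // publication
}

void reader()
{
    int* p = ptr.load(std::memory_order_acquire);
    if (p)                              // the load may still run before the store; p is then simply nullptr
        assert(*p == -4 && data == 42); // guaranteed whenever the release store has been observed
}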
Answer 2:
Several pertinent references:
- the C++11 draft standard (PDF, see clauses 1, 29 and 30);
- Hans-J. Boehm's overview of concurrency in C++;
- McKenney, Boehm and Crowl on concurrency in C++;
- GCC's developmental notes on concurrency in C++;
- the Linux kernel's notes on concurrency;
- a related question with answers here on Stack Overflow;
- another related question with answers;
- Cppmem, a sandbox in which to experiment with concurrency;
- Cppmem's help page;
- Spin, a tool for analyzing the logical consistency of concurrent systems;
- an overview of memory barriers from a hardware perspective (PDF).
Some of the above may interest you and other readers.
Answer 3:
Although various answers cover bits and pieces of what the potential problem is and/or provide useful information, no answer correctly describes the potential issues for all three cases.
In order to synchronize memory operations between threads, release and acquire barriers are used to specify ordering.
Picture the two threads, with a release barrier in thread 1 and an acquire barrier in thread 2. Memory operations A in thread 1 cannot move down across the (one-way) release barrier, regardless of whether that barrier is a release operation on an atomic store or a standalone release fence followed by a relaxed atomic store. Hence memory operations A are guaranteed to happen before the atomic store. The same goes for memory operations B in thread 2, which cannot move up across the acquire barrier; hence the atomic load happens before memory operations B.
The atomic ptr itself provides inter-thread ordering based on the guarantee that it has a single modification order. As soon as thread 2 sees the value written to ptr, it is guaranteed that the store (and thus memory operations A) happened before the load. Because the load is guaranteed to happen before memory operations B, the rules for transitivity say that memory operations A happen before B and synchronization is complete.
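To make that picture concrete, here is a sketch of the fence-based variant (my own illustration, not from the answer; the thread1/thread2 functions and values are hypothetical stand-ins for the two threads):
#include <atomic>
#include <cassert>

std::atomic<int*> ptr{nullptr};
int data = 0;

void thread1() // writer
{
    int* p = new int(-4);
    data = 42;                                            // memory operations A
    std::atomic_thread_fence(std::memory_order_release);  // A cannot move below this fence
    ptr.store(p, std::memory_order_relaxed);              // atomic store
}

void thread2() // reader
{
    int* p = ptr.load(std::memory_order_relaxed);         // atomic load
    std::atomic_thread_fence(std::memory_order_acquire);  // B cannot move above this fence
    if (p)
        assert(*p == -4 && data == 42);                   // memory operations B
}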
With that, let's look at your 3 cases.
Case 1 is broken because ptr, a non-atomic object, is accessed from different threads without synchronization. That is a classical example of a data race and it causes undefined behavior.
Case 2 is correct. Because it is evaluated as an argument to the store, the integer allocation with new is sequenced before the release operation. This is equivalent to:
// Case 2: atomic var, automatic fence
std::atomic<int*> ptr;
// ...
int *tmp = new int(-4);
ptr.store(tmp, std::memory_order_release);
Case 3 is broken, albeit in a subtle way. The problem is that even though the ptr assignment is correctly sequenced after the standalone fence, the integer allocation (new) is also sequenced after the fence, causing a data race on the integer memory location. The code is equivalent to:
// Case 3: atomic var, manual fence
std::atomic<int*> ptr;
// ...
std::atomic_thread_fence(std::memory_order_release);
int *tmp = new int(-4);
ptr.store(tmp, std::memory_order_relaxed);
If you map that onto the picture above, the new operator is supposed to be part of memory operations A. However, because it is sequenced below the release fence, that ordering guarantee no longer holds, and the integer allocation may actually be reordered with memory operations B in thread 2. Therefore, dereferencing the pointer in thread 2 may read garbage or cause other undefined behavior.
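If the standalone-fence form of case 3 is wanted anyway, a plausible fix (my sketch, not from the answer) is to hoist the allocation above the fence so that it becomes part of memory operations A:
// Case 3, repaired: the allocation/initialization is sequenced before the release fence
std::atomic<int*> ptr;
// ...
int* tmp = new int(-4);                               // now part of memory operations A
std::atomic_thread_fence(std::memory_order_release);  // orders the initialization before the store
ptr.store(tmp, std::memory_order_relaxed);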
Answer 4:
The memory backing an atomic variable can only ever be used for the contents of the atomic. However, a plain variable, like ptr in case 1, is a different story. Once a compiler has the right to write to it, it can write anything to it, even the value of a temporary, when it runs out of registers.
Remember, your example is pathologically clean. Given a slightly more complex example:
std::string* p = new std::string("Hello");
data($) = 42;
rl::atomic_thread_fence(rl::memory_order_release);
std::string* p2 = new std::string("Bye");
ptr($) = p;
it is totally legal for the compiler to reuse your pointer as scratch storage:
std::string* p = new std::string("Hello");
data($) = 42;
rl::atomic_thread_fence(rl::memory_order_release);
ptr($) = new std::string("Bye");
std::string* p2 = ptr($);
ptr($) = p;
Why would it do so? I don't know, perhaps some exotic trick to keep a cache line or something. The point is that, since ptr is not atomic in case 1, there is a race between the write on the line 'ptr($) = p' and the read on 'std::string* p2 = ptr($)', yielding undefined behavior. In this simple test case, the compiler may not choose to exercise this right, and it may be safe, but in more complicated cases the compiler has the right to abuse ptr however it pleases, and Relacy catches this.
My favorite article on the topic: http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
Answer 5:
The race in the first example is between the publication of the pointer and the initialization of the object it points to. The reason is that the creation and initialization of the object happen after the fence (i.e., on the same side as the publication of the pointer):
int* ptr; //noop
std::atomic_thread_fence(std::memory_order_release); //fence between noop and interesting stuff
ptr = new int(-4); //object creation, initialization, and publication
If we assume that CPU accesses to properly aligned pointers are atomic, the code can be corrected by writing this:
int* ptr; //noop
int* newPtr = new int(-4); //object creation & initialization
std::atomic_thread_fence(std::memory_order_release); //fence between initialization and publication
ptr = newPtr; //publication
Note that even though this may work fine on many machines, there is absolutely no guarantee within the C++ standard of the atomicity of the last line, so it is better to use atomic<> variables in the first place.
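For completeness, a short sketch (mine, not the answerer's) of the same publication using std::atomic, which removes any reliance on platform-specific atomicity of the final assignment:
std::atomic<int*> ptr{nullptr};               // the "noop" becomes a well-defined null pointer
int* newPtr = new int(-4);                    // object creation & initialization
ptr.store(newPtr, std::memory_order_release); // release store: initialization ordered before publication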