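Adve and Gharachorloo's report, in Figure 4b, provides the following example of a program that exhibits unexpected behavior in the absence of sequential consistency. Initially A = B = 0; P1 executes A = 1, P2 executes if (A == 1) B = 1, and P3 executes if (B == 1) register1 = A.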
My question is whether it is possible, using only C11 fences and `memory_order_relaxed` loads and stores, to ensure that `register1`, if written, will be written with the value 1. The reason this might be hard to guarantee in the abstract is that P1, P2, and P3 could be at different points in a pathological NUMA network with the property that P2 sees P1's write before P3 does, yet somehow P3 sees P2's write very quickly. The reason this might be hard to guarantee with respect to the C11 spec specifically is that, if A is an ordinary (non-atomic) variable, P1's write to A and P2's read of A do not synchronize with each other and therefore, by paragraph 5.1.2.4.26 of the spec, result in undefined behavior. Possibly I can sidestep the undefined behavior through relaxed atomic loads and stores, but I still don't know how to reason transitively about the order seen by P3.
Below is an MWE attempting to solve the problem with fences, but I'm not sure it is correct. I'm specifically worried that the release fence is not good enough, because it won't flush P1's store buffer, only P2's. However, it will answer my question if you can argue that the assert will never fail based solely on the C11 standard (as opposed to other information one might have about a particular compiler and architecture).
```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <threads.h>

atomic_int a = ATOMIC_VAR_INIT(0);
atomic_int b = ATOMIC_VAR_INIT(0);

/* Thread functions must match thrd_start_t: int (*)(void *). */
int
p1(void *_ignored)
{
    atomic_store_explicit(&a, 1, memory_order_relaxed);
    return 0;
}

int
p2(void *_ignored)
{
    if (atomic_load_explicit(&a, memory_order_relaxed)) {
        atomic_thread_fence(memory_order_release); // not good enough?
        atomic_store_explicit(&b, 1, memory_order_relaxed);
    }
    return 0;
}

int
p3(void *_ignored)
{
    int register1 = 1;
    if (atomic_load_explicit(&b, memory_order_relaxed)) {
        atomic_thread_fence(memory_order_acquire);
        register1 = atomic_load_explicit(&a, memory_order_relaxed);
    }
    assert(register1 != 0);
    return 0;
}

int
main(void)
{
    thrd_t t1, t2, t3;
    thrd_create(&t1, p1, NULL);
    thrd_create(&t2, p2, NULL);
    thrd_create(&t3, p3, NULL);
    /* thrd_join takes the thrd_t by value, not a pointer to it. */
    thrd_join(t1, NULL);
    thrd_join(t2, NULL);
    thrd_join(t3, NULL);
    return 0;
}
```
You need the `memory_order_acquire` fence in `p3`, placed after the relaxed load of `b` and before the load of `a`. With this fence, the load of `a` in `p2` happens before the load of `a` in `p3`. When `p3` reads the 1 that `p2` stored to `b` after its release fence, `p2`'s release fence synchronizes with `p3`'s acquire fence. The C11 standard guarantees read-read coherence. This means the load in `p3` must observe either the same modification that was observed by the load in `p2` (which happens before it) or a later one in the modification order of `a`. Because the load in `p2` observes the store in `p1`, and no later modification of `a` is possible in your scenario, the load in `p3` must also observe the store in `p1`. So your assertion can never trigger.
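The argument above can also be exercised with a small stress harness. This is a sketch under assumptions. `run_trials` and the trial count are hypothetical names introduced here, and the harness needs a C11 implementation that ships `<threads.h>`. A test run can only fail to find a counterexample; it cannot prove that the assertion never fires. That is why the answer rests on the standard's rules rather than on observation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <threads.h>

/* Shared state for one trial of the Figure 4b pattern. */
static atomic_int a, b;
static int register1; /* written by p3, read by the driver only after thrd_join */

static int p1(void *ignored)
{
    (void)ignored;
    atomic_store_explicit(&a, 1, memory_order_relaxed);
    return 0;
}

static int p2(void *ignored)
{
    (void)ignored;
    if (atomic_load_explicit(&a, memory_order_relaxed)) {
        /* The load of a above is sequenced before this release fence. */
        atomic_thread_fence(memory_order_release);
        atomic_store_explicit(&b, 1, memory_order_relaxed);
    }
    return 0;
}

static int p3(void *ignored)
{
    (void)ignored;
    register1 = 1;
    if (atomic_load_explicit(&b, memory_order_relaxed)) {
        /* Having read p2's store to b, p2's release fence synchronizes
           with this acquire fence, so p2's load of a happens before ours. */
        atomic_thread_fence(memory_order_acquire);
        register1 = atomic_load_explicit(&a, memory_order_relaxed);
    }
    return 0;
}

/* Hypothetical driver: runs the experiment `trials` times and returns how
   many runs ended with register1 == 0, the outcome ruled out above. */
int run_trials(int trials)
{
    thrd_start_t fns[3] = { p1, p2, p3 };
    int failures = 0;

    for (int i = 0; i < trials; i++) {
        thrd_t t[3];
        atomic_store(&a, 0);
        atomic_store(&b, 0);
        for (int k = 0; k < 3; k++) {
            int rc = thrd_create(&t[k], fns[k], NULL);
            assert(rc == thrd_success); /* sketch: no recovery on failure */
            (void)rc;
        }
        for (int k = 0; k < 3; k++)
            thrd_join(t[k], NULL);
        /* Sanity check: once p1 has been joined, a must hold 1. */
        assert(atomic_load(&a) == 1);
        if (register1 == 0)
            failures++;
    }
    return failures;
}
```

A driver such as `int main(void) { return run_trials(1000) != 0; }` would exit with a nonzero status if the forbidden outcome were ever observed.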
References to the corresponding statements in the standard:
So, by definition, accesses to atomic objects cannot form a data race.
The next paragraph says that this is the cache-coherence guarantee. The C++11 standard is more specific and describes read-read coherence in similar wording.
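Read-read coherence needs no fences at all, which is why relaxed loads suffice for the last step of the argument. The following is a minimal sketch, assuming C11 `<threads.h>`. The names `writer`, `reader`, `count_backwards_reads` and `LIMIT` are illustrative only. The writer's stores are sequenced in increasing order, so by write-write coherence they appear in that order in the modification order of `counter`. The reader's two loads of the same object are also sequenced, so the second load must return the same value or a later one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <threads.h>

#define LIMIT 100000 /* hypothetical iteration count */

static atomic_int counter;

/* Publishes 1, 2, ..., LIMIT with relaxed stores. */
static int writer(void *ignored)
{
    (void)ignored;
    for (int i = 1; i <= LIMIT; i++)
        atomic_store_explicit(&counter, i, memory_order_relaxed);
    return 0;
}

/* Loads counter twice in a row and counts pairs where the value went
   backwards; read-read coherence says this count must be zero. */
static int reader(void *ignored)
{
    (void)ignored;
    int backwards = 0;
    for (int i = 0; i < LIMIT; i++) {
        int first = atomic_load_explicit(&counter, memory_order_relaxed);
        int second = atomic_load_explicit(&counter, memory_order_relaxed);
        if (second < first)
            backwards++;
    }
    return backwards;
}

int count_backwards_reads(void)
{
    thrd_t w, r;
    int backwards = -1;
    atomic_store(&counter, 0);
    int rc = thrd_create(&w, writer, NULL);
    assert(rc == thrd_success);
    rc = thrd_create(&r, reader, NULL);
    assert(rc == thrd_success);
    (void)rc;
    thrd_join(w, NULL);
    thrd_join(r, &backwards); /* reader's return value */
    return backwards;
}
```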