Does writing the same value to the same memory loc

2019-04-04 07:48发布

问题:

Consider the following code that writes the same value to the same memory location from multiple threads:

void f(int* buf, int n, int* p) {
    for(int i = 0; i < n; i++)
        buf[i] = i;
    *p = buf[n/2];
}

void g(int* buf, int n) {
    int x1, x2;
    thread t1(f, buf, n, &x1);
    thread t2(f, buf, n, &x2);
    t1.join();
    t2.join();
    assert(x1 == x2);
}

Although it's interesting, I'm of less concern of what guarantees the standard gives, since I guess it gives none. What I do care is what will be the behavior of the above code on real-world multiprocessor hardware. Will the assert always pass or there's any chance of a race-condition, cache synchronization problems, etc..?

回答1:

Memory models with regards to multi-treading concern when the effects of writes made by one thread are observable by another thread. In the code you posted both threads write the same values into the same memory location, so it doesn't matter which thread's write buf[n/2] reads, either will do.

Modern processors employ cache coherency protocols, such as MESI, so when the threads write to the buffer concurrently there is going to be a lot of messages sent between the CPUs to synchronize the cache lines holding the buffer making it run much slower than in non-concurrent scenario (false sharing effect).

Here it doesn't matter if the writes are atomic or not, since both threads write the same values to the same memory locations. There is a race, but it doesn't matter which thread wins because the observed values are going to be the same even with partial writes.



回答2:

There is a race, but in your example both threads will write the same values to the same addresses. Since you are not doing any read-modify-writes, but just writing predetermined numbers, this will be safe in most cases. Writing an int will be an atomic instruction on most systems. The exception would be if you ran this code on a 8-bit microprocessor that uses a sequence of instructions to store an int. In that case it also may still work, but depends on the implementation of the library code that does the multi-byte store.



回答3:

OK, the key point here is indeed, as @Maxim said, cache coherency. In a machine employing cache coherency it's indeed impossible.

However, it can go wrong on a machine with no cache coherency. I don't know a specific architecture, and although they're almost extinct due to natural selection, as far as I know there are some remaining. (If you know an example, please comment.)

Here is a table that represents an execution of two threads filling a zeroed region in memory with ones. For brevity this example is scaled down by a factor of 32, i.e. each digit here represents a 4-byte int in question. Cache line size is 4 ints == 4 digits. The lines marked as "flush" are points where the on-chip cache is flushed to the main memory. In reality it's non-deterministic, as it may happen at any time, e.g. due to a preemptive task switch.

Core 1 cache              Memory                    Core 2 cache
------------------------------------------------------------------------------
                          0000
0000 (load cache)         0000
1000 (set 1st bit)        0000
1100 (set 2nd bit)        0000                      0000 (load cache)
**** (flush)              1100
                          1100                      1000 (set 1st bit)
                          1000                      **** (flush)
                          1000                      1000 (load cache)
                          1000                      1100 (set 2nd bit)
1000 (load cache)         1000                      1110 (set 3rd bit)
1010 (set 3rd bit)        1000                      1111 (set 4th bit)
1011 (set 4th bit)        1111                      **** (flush)
**** (flush)              1011

So we got a wrong result in the end.

I emphasize again that this counter-example is valid only on cache incoherent machines.