Assume that we have the following code:
using System;
using System.Threading.Tasks;

class Program
{
    static volatile bool flag1;
    static volatile bool flag2;
    static volatile int val;

    static void Main(string[] args)
    {
        for (int i = 0; i < 10000 * 10000; i++)
        {
            // Progress output every 500,000 iterations.
            if (i % 500000 == 0)
            {
                Console.WriteLine("{0:#,0}", i);
            }

            flag1 = false;
            flag2 = false;
            val = 0;

            Parallel.Invoke(A1, A2);

            // Expectation: at least one of A1/A2 sees the other's flag and sets val.
            if (val == 0)
                throw new Exception(string.Format("{0:#,0}: {1}, {2}", i, flag1, flag2));
        }
    }

    static void A1()
    {
        flag2 = true;
        if (flag1)
            val = 1;
    }

    static void A2()
    {
        flag1 = true;
        if (flag2)
            val = 2;
    }
}
It fails! The main question is why. I suppose that the CPU reorders the flag1 = true; statement and the if (flag2) check, but the variables flag1 and flag2 are marked as volatile fields...
In the .NET memory model, the runtime (CLI) ensures that volatile fields are not cached in registers, so a change made on one thread is promptly visible to other threads (NB: other memory models, including Java's, define volatile differently).
But this says nothing about the relative ordering of operations across multiple fields, volatile or not.
To provide a consistent ordering across multiple fields you need to use a lock (or a memory barrier, either explicitly or implicitly with one of the methods that include a memory barrier).
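As a minimal sketch of the lock-based option mentioned above (the `LockDemo` class, the `Gate` object and the `RunOnce` helper are illustrative names, not part of the original program), each method sets its flag and checks the other one inside the same lock. Whichever method runs second is then guaranteed to see the first one's flag, so `val` should never stay 0:

```csharp
using System;
using System.Threading.Tasks;

static class LockDemo
{
    // Hypothetical lock object; the name is an assumption for this sketch.
    private static readonly object Gate = new object();

    // Plain (non-volatile) fields: every access below happens under the lock.
    private static bool flag1;
    private static bool flag2;
    private static int val;

    private static void A1()
    {
        lock (Gate)
        {
            flag2 = true;
            if (flag1)
                val = 1;
        }
    }

    private static void A2()
    {
        lock (Gate)
        {
            flag1 = true;
            if (flag2)
                val = 2;
        }
    }

    // Runs one round of the experiment and returns the final value of val.
    public static int RunOnce()
    {
        lock (Gate)
        {
            flag1 = false;
            flag2 = false;
            val = 0;
        }

        Parallel.Invoke(A1, A2);

        lock (Gate)
        {
            return val;
        }
    }

    public static void Main()
    {
        for (int i = 0; i < 100000; i++)
        {
            if (RunOnce() == 0)
                throw new Exception("val == 0 at iteration " + i);
        }
        Console.WriteLine("done, val was never 0");
    }
}
```

The lock serializes the two methods completely, which is more than this test strictly needs, but it makes the ordering argument trivial.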
For more details, see "Concurrent Programming on Windows" by Joe Duffy (Addison-Wesley, 2008).
The ECMA-335 specification says:
A volatile read has “acquire semantics” meaning that the read is
guaranteed to occur prior to any references to memory that occur
after the read instruction in the CIL instruction sequence. A
volatile write has “release semantics” meaning that the write is
guaranteed to happen after any memory references prior to the write
instruction in the CIL instruction sequence. A conforming
implementation of the CLI shall guarantee this semantics of volatile
operations. This ensures that all threads will observe volatile
writes performed by any other thread in the order they were performed. But a conforming implementation is
not required to provide a single total ordering of volatile writes
as seen from all threads of execution.
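Let's draw how it looks:
[figure: the half-fences around a volatile write and a volatile read]
So, we have two half-fences: one for the volatile write and one for the volatile read. A release fence only keeps earlier operations from moving below the write, and an acquire fence only keeps later operations from moving above the read, so nothing prevents a volatile write from being reordered after the volatile read that follows it.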
Moreover, even on an architecture as strict as AMD64 (x86-64), stores are allowed to be reordered after subsequent loads.
On architectures with a weaker hardware memory model you can observe even stranger things: on ARM, another thread can observe a partially constructed object if its reference was assigned in a non-volatile way.
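To make the store-after-load reordering concrete, here is a small deterministic model of the question's A1 and A2. It is only a sketch: the `ReorderModel` class and its method names are made up for illustration, and it simulates the orderings on a single thread instead of measuring real hardware. Each thread performs one store (S) of its own flag and one load (L) of the other flag, and the model enumerates every interleaving to check whether both loads can read false:

```csharp
using System;

static class ReorderModel
{
    // All ways to interleave two steps of thread 1 with two steps of thread 2.
    private static readonly string[] Schedules =
        { "1122", "1212", "1221", "2112", "2121", "2211" };

    // order1/order2 give the order in which each thread's steps take effect:
    // "SL" is program order, "LS" is the store reordered after the load.
    private static bool BothLoadsSeeFalse(string order1, string order2, string schedule)
    {
        bool flag1 = false, flag2 = false;
        bool a1Saw = false, a2Saw = false;
        int i1 = 0, i2 = 0;

        foreach (char thread in schedule)
        {
            if (thread == '1')
            {
                // A1: flag2 = true; if (flag1) ...
                if (order1[i1++] == 'S') flag2 = true; else a1Saw = flag1;
            }
            else
            {
                // A2: flag1 = true; if (flag2) ...
                if (order2[i2++] == 'S') flag1 = true; else a2Saw = flag2;
            }
        }

        // If neither thread saw the other's flag, val is never assigned.
        return !a1Saw && !a2Saw;
    }

    public static bool ValCanStayZero(bool allowStoreLoadReorder)
    {
        string[] orders = allowStoreLoadReorder ? new[] { "SL", "LS" } : new[] { "SL" };

        foreach (string o1 in orders)
            foreach (string o2 in orders)
                foreach (string s in Schedules)
                    if (BothLoadsSeeFalse(o1, o2, s))
                        return true;

        return false;
    }

    public static void Main()
    {
        Console.WriteLine(ValCanStayZero(false)); // prints False
        Console.WriteLine(ValCanStayZero(true));  // prints True
    }
}
```

In this model, program order alone never leaves val at 0, while letting even one thread's store take effect after its load is enough to produce the failure from the question.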
To fix your example, just put Thread.MemoryBarrier() calls between the assignment and the if statement:
static void A1()
{
    flag2 = true;
    Thread.MemoryBarrier();
    if (flag1)
        val = 1;
}

static void A2()
{
    flag1 = true;
    Thread.MemoryBarrier();
    if (flag2)
        val = 2;
}
This adds a full fence, which protects us from these instructions being reordered.
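A full fence can also come implicitly from a method that includes one. The sketch below assumes that `Interlocked.Exchange` acts as a full fence, as the .NET `Interlocked` operations are generally described, and it swaps the bool flags for int flags (0 = false, 1 = true) so they can be passed by `ref`. The `InterlockedDemo` class and `RunOnce` helper are illustrative names only:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class InterlockedDemo
{
    // int flags instead of volatile bool: 0 means false, 1 means true.
    private static int flag1;
    private static int flag2;
    private static int val;

    private static void A1()
    {
        // Store flag2; assumed to act as a full fence before the load below.
        Interlocked.Exchange(ref flag2, 1);
        if (Volatile.Read(ref flag1) == 1)
            Volatile.Write(ref val, 1);
    }

    private static void A2()
    {
        // Store flag1; assumed to act as a full fence before the load below.
        Interlocked.Exchange(ref flag1, 1);
        if (Volatile.Read(ref flag2) == 1)
            Volatile.Write(ref val, 2);
    }

    // Runs one round of the experiment and returns the final value of val.
    public static int RunOnce()
    {
        flag1 = 0;
        flag2 = 0;
        val = 0;

        Parallel.Invoke(A1, A2);

        return Volatile.Read(ref val);
    }

    public static void Main()
    {
        for (int i = 0; i < 100000; i++)
        {
            if (RunOnce() == 0)
                throw new Exception("val == 0 at iteration " + i);
        }
        Console.WriteLine("done, val was never 0");
    }
}
```

The shape is the same as the Thread.MemoryBarrier() fix: the store to one flag must be globally visible before the load of the other flag is performed.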