Reading this question, I wanted to test if I could demonstrate the non-atomicity of reads and writes on a type for which the atomicity of such operations is not guaranteed.
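using System;
using System.Diagnostics;
using System.Threading;

class Program
{
    // Shared field: written by one thread and read by another, with no synchronization.
    private static double _d;

    [STAThread]
    static void Main()
    {
        new Thread(KeepMutating).Start();
        KeepReading();
    }

    private static void KeepReading()
    {
        while (true)
        {
            double dCopy = _d;
            // In release: if (...) throw ...
            Debug.Assert(dCopy == 0D || dCopy == double.MaxValue); // Never fails
        }
    }

    private static void KeepMutating()
    {
        Random rand = new Random();
        while (true)
        {
            // Only two values are ever written, so any other value read is a torn read.
            _d = rand.Next(2) == 0 ? 0D : double.MaxValue;
        }
    }
}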
To my surprise, the assertion refused to fail even after a full three minutes of execution.
What gives?
- The test is incorrect.
- The specific timing characteristics of the test make it unlikely/impossible that the assertion will fail.
- The probability is so low that I have to run the test for much longer to make it likely that it will trigger.
- The CLR provides stronger guarantees about atomicity than the C# spec.
- My OS/hardware provides stronger guarantees than the CLR.
- Something else?
Of course, I don't intend to rely on any behaviour that is not explicitly guaranteed by the spec, but I would like a deeper understanding of the issue.
FYI, I ran this on both Debug and Release (changing Debug.Assert to if(..) throw) profiles in two separate environments:
- Windows 7 64-bit + .NET 3.5 SP1
- Windows XP 32-bit + .NET 2.0
EDIT: To exclude the possibility of John Kugelman's comment "the debugger is not Schrodinger-safe" being the problem, I added the line someList.Add(dCopy); to the KeepReading method and verified that this list was not seeing a single stale value from the cache.
EDIT: Based on Dan Bryant's suggestion: using long instead of double breaks it virtually instantly.
You might try running it through CHESS to see if it can force an interleaving that breaks the test.
If you take a look at the x86 disassembly (visible from the debugger), you might also see whether the jitter is generating instructions that preserve atomicity.
EDIT: I went ahead and ran the disassembly (forcing target x86). The relevant lines are:
double dCopy = _d;
00000039 fld qword ptr ds:[00511650h]
0000003f fstp qword ptr [ebp-40h]
_d = rand.Next(2) == 0 ? 0D : double.MaxValue;
00000054 mov ecx,dword ptr [ebp-3Ch]
00000057 mov edx,2
0000005c mov eax,dword ptr [ecx]
0000005e mov eax,dword ptr [eax+28h]
00000061 call dword ptr [eax+1Ch]
00000064 mov dword ptr [ebp-48h],eax
00000067 cmp dword ptr [ebp-48h],0
0000006b je 00000079
0000006d nop
0000006e fld qword ptr ds:[002423D8h]
00000074 fstp qword ptr [ebp-50h]
00000077 jmp 0000007E
00000079 fldz
0000007b fstp qword ptr [ebp-50h]
0000007e fld qword ptr [ebp-50h]
00000081 fstp qword ptr ds:[00159E78h]
It uses a single fstp qword ptr to perform the write operation in both cases. My guess is that the Intel CPU guarantees atomicity of this operation, though I haven't found any documentation to support this. Any x86 gurus who can confirm this?
UPDATE:
This fails as expected if you use Int64, which uses the 32-bit registers on the x86 CPU rather than the special FPU registers. You can see this below:
Int64 dCopy = _d;
00000042 mov eax,dword ptr ds:[001A9E78h]
00000047 mov edx,dword ptr ds:[001A9E7Ch]
0000004d mov dword ptr [ebp-40h],eax
00000050 mov dword ptr [ebp-3Ch],edx
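To make the failure mode concrete, here is a minimal sketch of what a torn 64-bit read looks like when the two 32-bit halves come from different writes. It does not reproduce the race itself; it only combines the halves by hand. The class and method names (TornReadDemo, Tear) are hypothetical and exist only for illustration.

```csharp
using System;

static class TornReadDemo
{
    // Hypothetical helper: returns the value a reader would observe if the low
    // 32 bits came from one written value and the high 32 bits from another.
    public static long Tear(long lowSource, long highSource)
    {
        long low = lowSource & 0xFFFFFFFFL;
        long high = highSource & unchecked((long)0xFFFFFFFF00000000UL);
        return high | low;
    }

    public static void Main()
    {
        // The writer only ever stores 0 or long.MaxValue.
        long tornA = Tear(0L, long.MaxValue); // low half of 0, high half of MaxValue
        long tornB = Tear(long.MaxValue, 0L); // low half of MaxValue, high half of 0

        Console.WriteLine("0x{0:X16}", tornA);
        Console.WriteLine("0x{0:X16}", tornB);

        // Neither mixed value is one of the two values that were written,
        // which is exactly the condition the reader loop checks for.
        Console.WriteLine(tornA != 0L && tornA != long.MaxValue);
        Console.WriteLine(tornB != 0L && tornB != long.MaxValue);
    }
}
```

With two separate 32-bit mov instructions, the writer can run between the first and second load, so the reader can end up with either of these mixed values.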
UPDATE:
I was curious if this would fail if I forced non-8byte alignment of the double field in memory, so I put together this code:
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Threading;

class Program
{
    // Overlapping fields: _d2 starts 4 bytes into the struct, so it is not 8-byte aligned within it.
    [StructLayout(LayoutKind.Explicit)]
    private struct Test
    {
        [FieldOffset(0)]
        public double _d1;
        [FieldOffset(4)]
        public double _d2;
    }

    private static Test _test;

    [STAThread]
    static void Main()
    {
        new Thread(KeepMutating).Start();
        KeepReading();
    }

    private static void KeepReading()
    {
        while (true)
        {
            double dummy = _test._d1;
            double dCopy = _test._d2;
            // In release: if (...) throw ...
            Debug.Assert(dCopy == 0D || dCopy == double.MaxValue); // Never fails
        }
    }

    private static void KeepMutating()
    {
        Random rand = new Random();
        while (true)
        {
            _test._d2 = rand.Next(2) == 0 ? 0D : double.MaxValue;
        }
    }
}
It does not fail and the generated x86 instructions are essentially the same as before:
double dummy = _test._d1;
0000003e mov eax,dword ptr ds:[03A75B20h]
00000043 fld qword ptr [eax+4]
00000046 fstp qword ptr [ebp-40h]
double dCopy = _test._d2;
00000049 mov eax,dword ptr ds:[03A75B20h]
0000004e fld qword ptr [eax+8]
00000051 fstp qword ptr [ebp-48h]
I experimented with swapping _d1 and _d2 for usage with dCopy/set and also tried a FieldOffset of 2. All generated the same basic instructions (with different offsets above) and none failed after several seconds (likely billions of attempts). I'm cautiously confident, given these results, that at least the Intel x86 CPUs provide atomicity of double load/store operations, regardless of alignment.
The compiler is allowed to optimize away the repeated reads of _d. As far as it knows just statically analyzing your loop, _d never changes. This means it can cache the value and never re-read the field.
To prevent this you either need to synchronize access to _d (i.e. surround it with a lock statement), or mark _d as volatile. Making it volatile tells the compiler that its value could change at any time and so it should never cache the value.
Unfortunately (or fortunately), you cannot mark a double field as volatile, precisely because of the point you are trying to test: doubles cannot be accessed atomically! Synchronizing access to _d forces the compiler to re-read the value, but that also breaks the test. Oh well!
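For completeness, here is a minimal sketch of one way to force a fresh read of a double field without the volatile keyword. It assumes .NET Framework 4.5 or later, where System.Threading.Volatile provides Read and Write overloads for double; that is newer than the runtimes mentioned in the question. The class and method names (SharedDouble, Publish, ReadFresh) are hypothetical. The sketch only addresses the caching concern and makes no claim about whether the access is atomic on a given platform.

```csharp
using System;
using System.Threading;

static class SharedDouble
{
    private static double _d;

    // Hypothetical helper: writes the field so the store is not deferred or elided.
    public static void Publish(double value)
    {
        Volatile.Write(ref _d, value);
    }

    // Hypothetical helper: reads the field from memory on every call,
    // so a loop calling this cannot have the read hoisted out of it.
    public static double ReadFresh()
    {
        return Volatile.Read(ref _d);
    }

    public static void Main()
    {
        Publish(double.MaxValue);
        Console.WriteLine(ReadFresh() == double.MaxValue);
    }
}
```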
You might try getting rid of the 'dCopy = _d' and simply use _d in your assert.
That way two threads are reading/writing to the same variable at the same time.
Your current version makes a copy of _d which creates a new instance, all in the same thread, which is a thread safe operation:
http://msdn.microsoft.com/en-us/library/system.double.aspx
All members of this type are thread safe. Members that appear to modify instance state actually return a new instance initialized with the new value. As with any other type, reading and writing to a shared variable that contains an instance of this type must be protected by a lock to guarantee thread safety.
However if both threads are reading/writing to the same variable instance then:
http://msdn.microsoft.com/en-us/library/system.double.aspx
Assigning an instance of this type is not thread safe on all hardware platforms because the binary representation of that instance might be too large to assign in a single atomic operation.
Thus if both threads are reading/writing to the same variable instance, you would need a lock to protect it (or Interlocked.Read/Increment/Exchange; I'm not sure whether those work on doubles).
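On the Interlocked question: as far as I know, Interlocked.Exchange and Interlocked.CompareExchange do have double overloads, while Interlocked.Read and Interlocked.Increment do not. A minimal sketch of lock-free access to a shared double using those two calls follows; the class and method names (AtomicDouble, Write, Read) are hypothetical.

```csharp
using System;
using System.Threading;

static class AtomicDouble
{
    private static double _d;

    // Atomically replaces the stored value.
    public static void Write(double value)
    {
        Interlocked.Exchange(ref _d, value);
    }

    // Atomically reads the stored value. The exchange value and the comparand
    // are both 0, so the call either stores 0 over 0 or stores nothing; in both
    // cases the field is unchanged and the original value is returned.
    public static double Read()
    {
        return Interlocked.CompareExchange(ref _d, 0D, 0D);
    }

    public static void Main()
    {
        Write(double.MaxValue);
        Console.WriteLine(Read() == double.MaxValue);
    }
}
```

The CompareExchange trick plays the role that Interlocked.Read plays for Int64. It is heavier than a plain load, so it only makes sense where a torn read is a real possibility.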
Edit
As pointed out by others, on an Intel CPU reading/writing a double is an atomic operation. However, if the program is compiled for x86 and uses a 64-bit integer data type, then the operation is not atomic, as the following program demonstrates. Replace the Int64 with double and it appears to work.
Public Const ThreadCount As Integer = 2
Public thrdsWrite() As Threading.Thread = New Threading.Thread(ThreadCount - 1) {}
Public thrdsRead() As Threading.Thread = New Threading.Thread(ThreadCount - 1) {}
Public d As Int64

<STAThread()> _
Sub Main()
    For i As Integer = 0 To thrdsWrite.Length - 1
        thrdsWrite(i) = New Threading.Thread(AddressOf Write)
        thrdsWrite(i).SetApartmentState(Threading.ApartmentState.STA)
        thrdsWrite(i).IsBackground = True
        thrdsWrite(i).Start()
        thrdsRead(i) = New Threading.Thread(AddressOf Read)
        thrdsRead(i).SetApartmentState(Threading.ApartmentState.STA)
        thrdsRead(i).IsBackground = True
        thrdsRead(i).Start()
    Next
    Console.ReadKey()
End Sub

Public Sub Write()
    Dim rnd As New Random(DateTime.Now.Millisecond)
    While True
        d = If(rnd.Next(2) = 0, 0, Int64.MaxValue)
    End While
End Sub

Public Sub Read()
    While True
        Dim dc As Int64 = d
        If (dc <> 0) And (dc <> Int64.MaxValue) Then
            Console.WriteLine(dc)
        End If
    End While
End Sub
IMO the correct answer is #5 (your OS/hardware provides stronger guarantees than the CLR).
double is 8 bytes long.
The memory interface is 64 bits = 8 bytes per module per clock (i.e. it becomes 16 bytes for dual-channel memory).
There are also CPU caches. On my machine, the cache line is 64 bytes, and on all CPUs it's a multiple of 8.
As the comments above say, even when the CPU is running in 32-bit mode, double variables are loaded and stored with just one instruction.
That's why as long as your double variable is aligned (I suspect the common language runtime virtual machine does alignment for you), the double reads and writes are atomic.