Just playing around with concurrency in my spare time, and wanted to try preventing torn reads without using locks on the reader side so concurrent readers don't interfere with each other.
The idea is to serialize writes via a lock, but use only a memory barrier on the read side. Here's a reusable abstraction that encapsulate the approach I came up with:
public struct Sync<T>
where T : struct
{
object write;
T value;
int version; // incremented with each write
public static Sync<T> Create()
{
return new Sync<T> { write = new object() };
}
public T Read()
{
// if version after read == version before read, no concurrent write
T x;
int old;
do
{
// loop until version number is even = no write in progress
do
{
old = version;
if (0 == (old & 0x01)) break;
Thread.MemoryBarrier();
} while (true);
x = value;
// barrier ensures read of 'version' avoids cached value
Thread.MemoryBarrier();
} while (version != old);
return x;
}
public void Write(T value)
{
// locks are full barriers
lock (write)
{
++version; // ++version odd: write in progress
this.value = value;
// ensure writes complete before last increment
Thread.MemoryBarrier();
++version; // ++version even: write complete
}
}
}
Don't worry about overflow on the version variable, I avoid that another way. So is my understanding and application of Thread.MemoryBarrier correct in the above? Are any of the barriers unnecessary?
It seems like you're interesting in lock-free /wait-free implementation. Let start from this discussion, for example: Lock-free multi-threading is for real threading experts
I took a long hard look at your code and it does appear correct to me. One thing that immediately jumped out at me was that you used an established pattern for performing the low-lock operation. I can see that you are using
version
as a kind of virtual lock. Even numbers are released and odd numbers are acquired. And since you are using a monotonically increasing value for the virtual lock you are also avoiding the ABA problem. The most important thing, however, is that you continue to loop while attempting to read until the virtual lock value is observed to be the same before the read starts as compared to after it completes. Otherwise, you consider this a failed read and try it all over again. So yeah, job well done on the core logic.So what about the placement of the memory barrier generators? Well, this all looks pretty good as well. All of the
Thread.MemoryBarrier
calls are required. If I had to nit-pick I would say you need one additional one in theWrite
method so that it looks like this.The added call here ensures that
++version
andthis.value = value
do not get swapped. Now, the ECMA specification technically allows that kind of instruction reordering. However, Microsoft's implementation of the CLI and the x86 hardware both already have volatile semantics on writes so it would not really be needed in most cases. But, who knows, maybe it would be necessary on the Mono runtime targeting the ARM cpu.On the
Read
side of things I can find no faults. In fact, the placement of the calls you have is exactly where I would have put them. Some people may wonder why you do not need one before the initial read ofversion
. The reason is because the outer loop will catch the case when the first read was cached because of theThread.MemoryBarrier
further down.So this brings me to a discussion about performance. Is this really faster than taking a hard lock in the
Read
method? Well, I did some pretty extensive testing of your code to help answer that. The answer is a definitive yes! This is quite a bit faster than taking a hard lock. I tested using aGuid
as the value type because it is 128 bits and so that is bigger than the native word size of my machine (64 bits). I also used several different variations on the number of writers and readers. Your low lock technique consistently and significantly outperformed the hard lock technique. I even tried a few variations usingInterlocked.CompareExchange
to do the guarded read and they were all slower as well. In fact, in some situations it was actually slower than taking the hard lock. I have to be honest. I was not at all surprised by this.I also did some pretty significant validity testing. I created tests that would run for quite some time and not once did I see a torn read. And then as a control test I would tweak the
Read
method in such a manner that I knew it would be incorrect and I ran the test again. This time, as expected, torn reads started to appear randomly. I switched the code back to what you have and the torn reads disappeared; again, as expected. This seemed to confirm what I already expected. That is, your code looks correct. I do not have a wide variety of runtime and hardware environments to test with (nor do I have the time) so I am not willing to give it a 100% seal of approval, but I do think I can give your implementation two thumbs up for now.Finally, with all that said, I would still avoid putting this in production. Yeah, it may be correct, but the next guy who has to maintain the code is probably not going to understand it. Someone may change the code and break it because they do not understand the consequences of their changes. You have to admit that this code is pretty brittle. Even the slightest change could break it.