How to show that the double-checked-lock pattern w

2019-01-17 12:14发布

Recently I've seen some C# projects that use a double-checked-lock pattern on a Dictionary. Something like this:

private static readonly object _lock = new object();
private static volatile IDictionary<string, object> _cache = 
    new Dictionary<string, object>();

public static object Create(string key)
{
    object val;
    if (!_cache.TryGetValue(key, out val))
    {
        lock (_lock)
        {
            if (!_cache.TryGetValue(key, out val))
            {
                val = new object(); // factory construction based on key here.
                _cache.Add(key, val);
            }
        }
    }
    return val;
}

This code is incorrect, since the Dictionary can be "growing" the collection in _cache.Add() while _cache.TryGetValue (outside the lock) is iterating over the collection. It might be extremely unlikely in many situations, but is still wrong.

Is there a simple program to demonstrate that this code fails?

Does it make sense to incorporate this into a unit test? And if so, how?

5条回答
相关推荐>>
2楼-- · 2019-01-17 12:43

In this example, exception #1 is thrown almost instantly on my machine:

var dict = new Dictionary<int, string>() { { 1234, "OK" } };

new Thread(() =>
{
    for (; ; )
    {
        string s;
        if (!dict.TryGetValue(1234, out s))
        {
            throw new Exception();  // #1
        }
        else if (s != "OK")
        {
            throw new Exception();  // #2
        }
    }
}).Start();

Thread.Sleep(1000);
Random r = new Random();
for (; ; )
{
    int k;
    do { k = r.Next(); } while (k == 1234);
    Debug.Assert(k != 1234);
    dict[k] = "FAIL";
}

However, the exact behaviour of code that is not designed to be thread-safe is unpredictable.
You cannot rely on it. So the double-checking code is indeed broken.

I'm not sure if I'd unit test this, though, as testing concurrent code (and getting it right) is much more complicated than writing the concurrent code in the first place.

查看更多
Lonely孤独者°
3楼-- · 2019-01-17 12:44

Clearly the code is not threadsafe. What we have here is a clear case of the hazards of premature optimization.

Remember, the purpose of the double-checked locking pattern is to improve the performance of code by eliminating the cost of the lock. If the lock is uncontested it is incredibly cheap already. Therefore, the double-checked locking pattern is justified only in the cases (1) where the lock is going to be heavily contested, or (2) where the code is so incredibly performance-sensitive that the cost of an unconstested lock is still too high.

Clearly we are not in the second case. You're using a dictionary for heaven's sake. Even without the lock it will be doing lookups and comparisons that will be hundreds or thousands of times more expensive than the savings of avoiding an uncontested lock.

If we are in the first case then figure out what is causing the contention and eliminate that. If you're doing a lot of waiting around on a lock then figure out why that is and replace the locking with a slim reader-writer-lock or restructure the application so that not so many threads are banging on the same lock at the same time.

In either case there is no justification for doing dangerous, implementation-sensitive low-lock techniques. You should only be using low-lock techniques in those incredibly rare cases where you really, truly cannot take the cost of an uncontested lock.

查看更多
Anthone
4楼-- · 2019-01-17 12:58

The reason I guess this question comes up again and again:

Pre-2.0, Before Generics (B.G.), Hashtable was the primary associative container in .NET, which indeed provides some threading guarantees. From MSDN:
"Hashtable is thread safe for use by multiple reader threads and a single writing thread. It is thread safe for multi-thread use when only one of the threads perform write (update) operations, which allows for lock-free reads provided that the writers are serialized to the Hashtable."

Before anyone gets extremely excited, there are some limitations.
See e.g. this post from Brad Abrams, who owns Hashtable.
Some more historical background on Hashtable can be found here (...near the end: "After this lengthy diversion - What about Hashtable?").

Why Dictionary<TKey, TValue> fails in the above case:

To prove that it fails, it is enough to find one example, so I'll try just that.
A resize happens as the table grows.
On resize, a rehash happens and one sees this as the last two lines:

this.buckets = newBuckets;
//One of the problems here.
this.entries = newEntries;

The buckets array holds indexes into the entries array. Let's say we have 10 entries so far and right now we are adding a new.
Let's further pretend for the sake of simplicity that we did not and will not get collisions.
In the old buckets, we had indexes running from 0 to 9 - if we had no collisions.
Now the indexes in the new buckets array run from 0 to 10(!).
We now change the private buckets field to point to the new buckets.
If there is a reader doing TryGetValue() at this moment, it uses the new buckets to get the index, but then uses the new index to read into the old entries array, since the entries field still points to the old entries.
One of the things one can get - besides false reads - is a friendly IndexOutOfRangeException.
Another "great" way to get this is in @Aaronaught's explanation. (...and both can happen, e.g. as in dtb's example).

This is really just one example, Dictonary was not designed and never meant to be thread-safe. It was designed to be fast, however - that means that the lock will not be held for long.

查看更多
乱世女痞
5楼-- · 2019-01-17 13:05

I don't really think that you need to prove this, you just need to refer people to the documentation for Dictionary<TKey, TValue>:

A Dictionary can support multiple readers concurrently, as long as the collection is not modified. Even so, enumerating through a collection is intrinsically not a thread-safe procedure. In the rare case where an enumeration contends with write accesses, the collection must be locked during the entire enumeration. To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.

It's actually a well-known fact (or should be) that you cannot read from a dictionary while another thread is writing to it. I've seen a few "bizarre multi-threading issue" kinds of questions here on SO where it turned out that the author didn't realize that this wasn't safe.

The problem isn't specifically related to double-checked locking, it's just that the dictionary is not a thread-safe class, not even for a single-writer/single-reader scenario.


I'll go one step further and show you why, in Reflector, this isn't thread-safe:

private int FindEntry(TKey key)
{
    // Snip a bunch of code
    for (int i = this.buckets[num % this.buckets.Length]; i >= 0;
        i = this.entries[i].next)
    // Snip a bunch more code
}

private void Resize()
{
    int prime = HashHelpers.GetPrime(this.count * 2);
    int[] numArray = new int[prime];
    // Snip a whole lot of code
    this.buckets = numArray;
}

Look at what can happen if the Resize method happens to be running while even one reader calls FindEntry:

  1. Thread A: Adds an element, resulting in a dynamic resize;
  2. Thread B: Calculates the bucket offset as (hash code % bucket count);
  3. Thread A: Changes the buckets to have a different (prime) size;
  4. Thread B: Chooses an element index from the new bucket array at the old bucket index;
  5. Thread B's pointer is no longer valid.

And this is exactly what fails in dtb's example. Thread A searches for a key that is known in advance to be in the dictionary, and yet it isn't found. Why? Because the FindValue method picked what it thought was the correct bucket, but before it even had a chance to look inside, Thread B changed the buckets, and now Thread A is looking in some totally random bucket that does not contain or even lead to the right entry.

Moral of the story: TryGetValue is not an atomic operation, and Dictionary<TKey, TValue> is not a thread-safe class. It's not just concurrent writes you need to worry about; you can't have concurrent read-writes either.

In reality the problem actually runs a lot deeper than this, due to instruction reordering by the jitter and CPU, stale caches, etc. - there are no memory barriers whatsoever being used here - but this should prove beyond a doubt that there's an obvious race condition if you have an Add invocation running at the same time as a TryGetValue invocation.

查看更多
霸刀☆藐视天下
6楼-- · 2019-01-17 13:06

Including the code in the question, you can test it with the following code.

//using System.Collections.Generic;
//using System.Threading;

private static volatile int numRunning = 2;
private static volatile int spinLock = 0;

static void Main(string[] args)
{
    new Thread(TryWrite).Start();
    new Thread(TryWrite).Start();
}

static void TryWrite()
{
    while(true) 
    {
        for (int i = 0; i < 1000000; i++ )
        {
            Create(i.ToString());
        }

        Interlocked.Decrement(ref numRunning);
        while (numRunning > 0) { } // make sure every thread has passed the previous line before proceeding (call this barrier 1)

        while (Interlocked.CompareExchange(ref spinLock, 1, 0) != 0){Thread.Sleep(0);} // Aquire lock (spin lock)
        // only one thread can be here at a time...

        if (numRunning == 0) // only the first thread to get here executes this...
        {
            numRunning = 2; // resets barrier 1
            // since the other thread is beyond the barrier, but is waiting on the spin lock,
            //  nobody is accessing the cache, so we can clear it...
            _cache = new Dictionary<string, object>(); // clear the cache... 
        }

        spinLock = 0; // release lock...
    }

}

This program just tries to get Create to traverse the collection as it is being "grown". It should be run on a machine with at least two cores (or two processors), and will most likely fail after a while with this exception.

System.Collections.Generic.Dictionary`2.FindEntry(TKey key)

Adding this test is difficult since it is a probabilistic test, and you don't know how long it will take to fail (if ever). I guess you could pick a value like 10 seconds and let it run for that long. If it doesn't fail in that amount of time, then the test passes. Not the best, but something. You should also verify that Environment.ProcessorCount > 1 before running the test, otherwise the likelihood of it failing is minuscule.

查看更多
登录 后发表回答