How to do proper Parallel.ForEach, locking and pro

2019-01-25 11:01发布

问题:

I'm trying to implement the Parallel.ForEach pattern and track progress, but I'm missing something regarding locking. The following example counts to 1000 when the threadCount = 1, but not when the threadCount > 1. What is the correct way to do this?

class Program
{
   static void Main()
   {
      var progress = new Progress();
      var ids = Enumerable.Range(1, 10000);
      var threadCount = 2;

      Parallel.ForEach(ids, new ParallelOptions { MaxDegreeOfParallelism = threadCount }, id => { progress.CurrentCount++; });

      Console.WriteLine("Threads: {0}, Count: {1}", threadCount, progress.CurrentCount);
      Console.ReadKey();
   }
}

internal class Progress
{
   private Object _lock = new Object();
   private int _currentCount;
   public int CurrentCount
   {
      get
      {
         lock (_lock)
         {
            return _currentCount;
         }
      }
      set
      {
         lock (_lock)
         {
            _currentCount = value;
         }
      }
   }
}

回答1:

The usual problem with calling something like count++ from multiple threads (which share the count variable) is that this sequence of events can happen:

  1. Thread A reads the value of count.
  2. Thread B reads the value of count.
  3. Thread A increments its local copy.
  4. Thread B increments its local copy.
  5. Thread A writes the incremented value back to count.
  6. Thread B writes the incremented value back to count.

This way, the value written by thread A is overwritten by thread B, so the value is actually incremented only once.

Your code adds locks around operations 1, 2 (get) and 5, 6 (set), but that does nothing to prevent the problematic sequence of events.

What you need to do is to lock the whole operation, so that while thread A is incrementing the value, thread B can't access it at all:

lock (progressLock)
{
    progress.CurrentCount++;
}

If you know that you will only need incrementing, you could create a method on Progress that encapsulates this.



回答2:

Old question, but I think there is a better answer.

You can report progress using Interlocked.Increment(ref progress) that way you do not have to worry about locking the write operation to progress.



回答3:

The easiest solution would actually have been to replace the property with a field, and

lock { ++progress.CurrentCount; }

(I personally prefer the look of the preincrement over the postincrement, as the "++." thing clashes in my mind! But the postincrement would of course work the same.)

This would have the additional benefit of decreasing overhead and contention, since updating a field is faster than calling a method that updates a field.

Of course, encapsulating it as a property can have advantages too. IMO, since field and property syntax is identical, the ONLY advantage of using a property over a field, when the property is autoimplemented or equivalent, is when you have a scenario where you may want to deploy one assembly without having to build and deploy dependent assemblies anew. Otherwise, you may as well use faster fields! If the need arises to check a value or add a side effect, you simply convert the field to a property and build again. Therefore, in many practical cases, there is no penalty to using a field.

However, we are living in a time where many development teams operate dogmatically, and use tools like StyleCop to enforce their dogmatism. Such tools, unlike coders, are not smart enough to judge when using a field is acceptable, so invariably the "rule that is simple enough for even StyleCop to check" becomes "encapsulate fields as properties", "don't use public fields" et cetera...



回答4:

Remove lock statements from properties and modify Main body:

 object sync = new object();
        Parallel.ForEach(ids, new ParallelOptions {MaxDegreeOfParallelism = threadCount},
                         id =>
                             {
                                 lock(sync)
                                 progress.CurrentCount++;
                             });


回答5:

The issue here is that ++ is not atomic - one thread can read and increment the value between another thread reading the value and it storing the (now incorrect) incremented value. This is probably compounded by the fact there's a property wrapping your int.

e.g.

Thread 1        Thread 2
reads 5         .
.               reads 5
.               writes 6
writes 6!       .

The locks around the setter and getter don't help this, as there's nothing to stop the lock blocks themseves being called out of order.

Ordinarily, I'd suggest using Interlocked.Increment, but you can't use this with a property.

Instead, you could expose _lock and have the lock block be around the progress.CurrentCount++; call.



回答6:

It is better to store any database or file system operation in a local buffer variable instead of locking it. locking reduces performance.