I'm trying to implement the Parallel.ForEach
pattern and track progress, but I'm missing something regarding locking. The following example counts to 1000 when the threadCount = 1
, but not when the threadCount
> 1. What is the correct way to do this?
class Program
{
static void Main()
{
var progress = new Progress();
var ids = Enumerable.Range(1, 10000);
var threadCount = 2;
Parallel.ForEach(ids, new ParallelOptions { MaxDegreeOfParallelism = threadCount }, id => { progress.CurrentCount++; });
Console.WriteLine("Threads: {0}, Count: {1}", threadCount, progress.CurrentCount);
Console.ReadKey();
}
}
internal class Progress
{
private Object _lock = new Object();
private int _currentCount;
public int CurrentCount
{
get
{
lock (_lock)
{
return _currentCount;
}
}
set
{
lock (_lock)
{
_currentCount = value;
}
}
}
}
The usual problem with calling something like count++
from multiple threads (which share the count
variable) is that this sequence of events can happen:
- Thread A reads the value of
count
.
- Thread B reads the value of
count
.
- Thread A increments its local copy.
- Thread B increments its local copy.
- Thread A writes the incremented value back to
count
.
- Thread B writes the incremented value back to
count
.
This way, the value written by thread A is overwritten by thread B, so the value is actually incremented only once.
Your code adds locks around operations 1, 2 (get
) and 5, 6 (set
), but that does nothing to prevent the problematic sequence of events.
What you need to do is to lock the whole operation, so that while thread A is incrementing the value, thread B can't access it at all:
lock (progressLock)
{
progress.CurrentCount++;
}
If you know that you will only need incrementing, you could create a method on Progress
that encapsulates this.
Old question, but I think there is a better answer.
You can report progress using Interlocked.Increment(ref progress)
that way you do not have to worry about locking the write operation to progress.
The easiest solution would actually have been to replace the property with a field, and
lock { ++progress.CurrentCount; }
(I personally prefer the look of the preincrement over the postincrement, as the "++." thing clashes in my mind! But the postincrement would of course work the same.)
This would have the additional benefit of decreasing overhead and contention, since updating a field is faster than calling a method that updates a field.
Of course, encapsulating it as a property can have advantages too. IMO, since field and property syntax is identical, the ONLY advantage of using a property over a field, when the property is autoimplemented or equivalent, is when you have a scenario where you may want to deploy one assembly without having to build and deploy dependent assemblies anew. Otherwise, you may as well use faster fields! If the need arises to check a value or add a side effect, you simply convert the field to a property and build again. Therefore, in many practical cases, there is no penalty to using a field.
However, we are living in a time where many development teams operate dogmatically, and use tools like StyleCop to enforce their dogmatism. Such tools, unlike coders, are not smart enough to judge when using a field is acceptable, so invariably the "rule that is simple enough for even StyleCop to check" becomes "encapsulate fields as properties", "don't use public fields" et cetera...
Remove lock statements from properties and modify Main body:
object sync = new object();
Parallel.ForEach(ids, new ParallelOptions {MaxDegreeOfParallelism = threadCount},
id =>
{
lock(sync)
progress.CurrentCount++;
});
The issue here is that ++
is not atomic - one thread can read and increment the value between another thread reading the value and it storing the (now incorrect) incremented value. This is probably compounded by the fact there's a property wrapping your int
.
e.g.
Thread 1 Thread 2
reads 5 .
. reads 5
. writes 6
writes 6! .
The locks around the setter and getter don't help this, as there's nothing to stop the lock
blocks themseves being called out of order.
Ordinarily, I'd suggest using Interlocked.Increment
, but you can't use this with a property.
Instead, you could expose _lock
and have the lock
block be around the progress.CurrentCount++;
call.
It is better to store any database or file system operation in a local buffer variable instead of locking it. locking reduces performance.