Parallel.ForEach Ordered Execution


Question:

I am trying to execute parallel functions on a list of objects using the new C# 4.0 Parallel.ForEach function. This is a very long maintenance process. I would like it to execute in the order of the list so that I can stop and continue execution at the previous point. How do I do this?

Here is an example. I have a list of objects: a1 to a100. This is the current order:

a1, a51, a2, a52, a3, a53...

I want this order:

a1, a2, a3, a4...

I am OK with some objects being run out of order, as long as I can find a point in the list where I can say that all objects before this point have been run. I read the parallel programming C# whitepaper and didn't see anything about this, and there isn't a setting for it in the ParallelOptions class.

Answer 1:

If you use Break to terminate the loop, then you are guaranteed that all indices below the returned value will have been executed. This is about as close as you can get. The example here uses For, but ForEach has similar overloads.

int n = ...
var result = new double[n];

var loopResult = Parallel.For(0, n, (i, loopState) =>
{
   if (/* break condition is true */)
   {
      loopState.Break();
      return;
   }
   result[i] = DoWork(i);
});

if (!loopResult.IsCompleted && 
        loopResult.LowestBreakIteration.HasValue)
{
   Console.WriteLine("Loop encountered a break at {0}", 
                      loopResult.LowestBreakIteration.Value);
}

In a ForEach loop, an iteration index is generated internally for each element in each partition. Execution takes place out of order, but after a break you know that all iterations with an index lower than LowestBreakIteration will have been completed.

Taken from "Parallel Programming with Microsoft .NET" http://parallelpatterns.codeplex.com/

Available on MSDN. See http://msdn.microsoft.com/en-us/library/ff963552.aspx. The section "Breaking out of loops early" covers this scenario.

See also: http://msdn.microsoft.com/en-us/library/dd460721.aspx
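
For reference, here is a rough sketch of the same pattern using the ForEach overload that exposes an element index. The names items, ShouldStop and DoWork are placeholders of my own, not from the book:

var loopResult = Parallel.ForEach(items, (item, loopState, index) =>
{
   if (ShouldStop(item))
   {
      loopState.Break();   // request an orderly stop at this index
      return;
   }
   DoWork(item);
});

if (!loopResult.IsCompleted && loopResult.LowestBreakIteration.HasValue)
{
   // Every element with an index below this value is guaranteed to have run.
   Console.WriteLine("Loop encountered a break at {0}",
                     loopResult.LowestBreakIteration.Value);
}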



Answer 2:

Do something like this:

int current = 0;
object lockCurrent = new object();

Parallel.For(0, list.Count, 
             new ParallelOptions { MaxDegreeOfParallelism = MaxThreads },
             (ii, loopState) => {
                    // Parallel.For chunks the task list up, with each thread getting a range to work on,
                    // e.g. [1-1,000], [1,001-2,000], [2,001-3,000], etc.
                    // We have prioritized our job queue so that more important tasks come first, so we don't
                    // want the list broken up that way; we want it run in roughly the same order we started
                    // with. So we ignore the passed-in loop variable and just increment our own counter.
                    int thisCurrent = 0;
                    lock (lockCurrent) {
                        thisCurrent = current;
                        current++;
                    }
                    dothework(list[thisCurrent]);
                 });

You can see how, when you break out of the parallel loop, you will know the last list item that was executed, assuming you let all threads finish before breaking. I'm not a big fan of PLINQ or LINQ; I honestly don't see how they lead to maintainable or readable source code. Parallel.For is a much better solution.
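
The same shared-counter idea can also be sketched with Interlocked.Increment instead of a lock, assuming the same list, MaxThreads and dothework names from above:

int current = -1;   // Interlocked.Increment returns the new value, so start at -1

Parallel.For(0, list.Count,
             new ParallelOptions { MaxDegreeOfParallelism = MaxThreads },
             (ii, loopState) =>
             {
                 // Atomically claim the next index in submission order,
                 // ignoring the partitioned loop variable as before.
                 int thisCurrent = Interlocked.Increment(ref current);
                 dothework(list[thisCurrent]);
             });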



Answer 3:

As an alternative suggestion, you could record which objects have been run and then filter the list when you resume execution to exclude the objects that have already run.

If this needs to be persistent across application restarts, you can store the IDs of the already-executed objects (I assume here the objects have some unique identifier).
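
A minimal sketch of that idea, assuming each object exposes an Id property and hypothetical LoadProcessedIds/SaveProcessedId persistence helpers (none of these names are from the question):

var processedIds = new HashSet<int>(LoadProcessedIds());   // hypothetical loader
var remaining = list.Where(item => !processedIds.Contains(item.Id)).ToList();

Parallel.ForEach(remaining, item =>
{
    DoWork(item);                        // placeholder for the real maintenance work
    lock (processedIds)
    {
        processedIds.Add(item.Id);       // record completion for the next run
        SaveProcessedId(item.Id);        // hypothetical persistence call
    }
});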



Answer 4:

For anybody looking for a simple solution, I have posted two extension methods (one using PLINQ and one using Parallel.ForEach) as part of an answer to the following question:

Ordered PLINQ ForAll



Answer 5:

For anyone else who comes across this question: if you're looping over an array or list (rather than an IEnumerable), you can use the overload of Parallel.ForEach that gives you the element index to maintain the original order, too.

string[] MyArray;   // array of stuff to do parallel tasks on
string[] ProcessedArray = new string[MyArray.Length];

Parallel.ForEach(MyArray, (ArrayItem, loopstate, ArrayElementIndex) =>
{
    string ProcessedArrayItem = TaskToDo(ArrayItem);
    ProcessedArray[ArrayElementIndex] = ProcessedArrayItem;
});


Answer 6:

Not sure if the question was altered, as my earlier comment seems wrong. Here is an improved answer. The basic reminder is that parallel jobs run in an order outside your control; e.g. printing 10 numbers might produce 1, 4, 6, 7, 2, 3, 9, 0.

If you want to stop your program and continue later, problems like this usually end up as batched workloads, with some logging of what was done. Say you had to check 10,000 numbers for primality: you could loop in batches of size 100 and keep a log per batch (log1 = 0..99, log2 = 100..199, and so on). Be sure to set some marker so you know whether a batch job was finished.

It's a general approach, since the question isn't that exact either.
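
A minimal sketch of the batching idea, assuming the same kind of list and a placeholder DoWork method; the record of completed batches is what lets you resume:

const int batchSize = 100;
var completedBatches = new List<int>();   // persist this marker list to resume later

for (int batch = 0; batch * batchSize < list.Count; batch++)
{
    int start = batch * batchSize;
    int end = Math.Min(start + batchSize, list.Count);

    // Items inside a batch run in parallel, in no particular order...
    Parallel.For(start, end, i => DoWork(list[i]));

    // ...but the marker is only written once the whole batch has finished,
    // so on restart you can safely skip every batch recorded here.
    completedBatches.Add(batch);
}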