-->

Stop Parallel.ForEach immediately

Posted 2019-07-11 18:12

Question:

I have a problem stopping a Parallel.ForEach loop.

I am iterating over a set of about 40,000 DataRows retrieved from a table, and I need to stop my loop immediately when I have 100 items in my result set. The problem is that when I call the Stop method on the ParallelLoopState, the iteration does not stop immediately, which leaves my result set inconsistent (either too few or too many items).

Is there no way to make sure that I kill all threads as soon as I call Stop?

  List<DataRow> rows = new List<DataRow>(dataTable.Select());
  ConcurrentDictionary<string, object> resultSet = new ConcurrentDictionary<string, object>();

  rows.EachParallel(delegate (DataRow row, ParallelLoopState state)
  {
    if (!state.IsStopped)
    {
      using (SqlConnection sqlConnection = new SqlConnection(Global.ConnStr))
      {
        sqlConnection.Open();

        //{
        // Do some processing.......
        //}       

        var sourceKey = "key retrieved from processing";
        if (!resultSet.ContainsKey(sourceKey))
        {
          object myCustomObj = new object();

          resultSet.AddOrUpdate(
          sourceKey,
          myCustomObj,
          (key, oldValue) => myCustomObj);
        }

        if (resultSet.Values.Count == 100)
          state.Stop();
      }
    }
  });

Answer 1:

The documentation page of ParallelLoopState.Stop explains that calling Stop() will prevent new iterations from starting. It won't abort any existing iterations.

Stop() also sets the IsStopped property to true. Long running iterations can check the value of IsStopped and exit prematurely if required.
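As a rough sketch of how this could be applied to the question's goal of exactly 100 results: the names below (`StopDemo`, `Collect`, the `"key-" + item` placeholder for the real processing) are hypothetical and not from the original post. Because iterations already in flight when `Stop()` is called still run to completion, the sketch checks the count and adds the item under a single lock instead of relying on `Stop()` to enforce the limit.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class StopDemo
{
    // Hypothetical helper: collects at most `limit` distinct keys from `source`.
    public static Dictionary<string, object> Collect(IEnumerable<int> source, int limit)
    {
        var results = new Dictionary<string, object>();
        var gate = new object();

        Parallel.ForEach(source, (item, state) =>
        {
            // Skip the expensive work if another iteration already called Stop().
            if (state.IsStopped) return;

            // Placeholder for the real per-row processing (assumption).
            string key = "key-" + item;

            lock (gate)
            {
                // Check and add atomically, so iterations that were already
                // running when Stop() was called cannot overshoot the limit.
                if (results.Count < limit && !results.ContainsKey(key))
                {
                    results.Add(key, new object());
                }

                if (results.Count >= limit)
                {
                    // No new iterations start; running ones still finish.
                    state.Stop();
                }
            }
        });

        return results;
    }
}
```

Calling `StopDemo.Collect(Enumerable.Range(0, 40000), 100)` yields a dictionary with 100 entries, although which 100 keys end up in it depends on scheduling. The lock serializes only the short check-and-add step, so the expensive processing still runs in parallel.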

This is called cooperative cancellation, which is far better than aborting threads. Aborting a thread is expensive and makes cleanup difficult. Imagine what would happen if a ThreadAbortException was thrown just when you wanted to commit your work.

Cooperative cancellation, on the other hand, allows a task to exit gracefully after committing or aborting transactions as necessary, closing connections, and cleaning up other state, files, etc.

Furthermore, Parallel uses tasks, not dedicated threads, to process chunks of data. One of the threads running those tasks is the original thread that started the parallel operation. Aborting wouldn't just waste threadpool threads, it would also kill the main thread.
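A minimal sketch of that graceful exit, assuming a `CancellationToken` passed through `ParallelOptions` is used as the signal. The names `CancelDemo` and `Run` and the counters are hypothetical, and the counter increment in the `finally` block stands in for real cleanup such as disposing a connection. Note that this sketch shows clean shutdown only; like `Stop()`, cancellation does not by itself cap the number of completed iterations at exactly `limit`.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class CancelDemo
{
    // Hypothetical helper: requests cancellation once `limit` iterations have
    // done their work. Returns how many iterations started; `cleanedUp`
    // reports how many of them ran their cleanup step.
    public static int Run(IEnumerable<int> source, int limit, out int cleanedUp)
    {
        int started = 0, completed = 0, cleaned = 0;

        using (var cts = new CancellationTokenSource())
        {
            CancellationToken token = cts.Token;
            var options = new ParallelOptions { CancellationToken = token };

            try
            {
                Parallel.ForEach(source, options, item =>
                {
                    Interlocked.Increment(ref started);
                    try
                    {
                        // Long-running work should poll the token and leave early.
                        if (token.IsCancellationRequested) return;

                        // Placeholder for the real processing (assumption).
                        if (Interlocked.Increment(ref completed) >= limit)
                        {
                            cts.Cancel(); // ask the loop to stop scheduling new work
                        }
                    }
                    finally
                    {
                        // Cleanup always runs, even when the iteration exits early.
                        Interlocked.Increment(ref cleaned);
                    }
                });
            }
            catch (OperationCanceledException)
            {
                // Expected: thrown by Parallel.ForEach after running iterations finish.
            }
        }

        cleanedUp = cleaned;
        return started;
    }
}
```

Every iteration that started also reaches its `finally` block, which is the property that aborting threads cannot guarantee.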

This is not a bug either - Parallel is meant to solve data parallelism problems, not asynchronous execution. In this scenario, one wants the system to use as many tasks as appropriate to process the data and continue once that processing is complete.