Using ThreadPool within loops in C#

Published 2020-07-18 02:43

Question:

I'm not too clued up on threading, but is the following code acceptable? I'm mostly worried about using the thread pool within a loop:

    string[] filePaths = GetFilePaths();

    foreach (string filePath in filePaths)
    {
        ThreadPool.QueueUserWorkItem(DoStuff, filePath);
    }

Is there any other way that this can be done?

EDIT:

N.B. Each execution of DoStuff will in turn create multiple sub-threads (about 200). These sub-threads simulate users of the system and are only responsible for receiving and sending information via TCP.

Answer 1:

Your current code could go wrong when both of the following hold:

  • there are very many files
  • processing of a file (DoStuff) takes a significant amount of time

The ThreadPool has limited load-balancing capabilities and would keep creating more and more threads, well beyond the optimal number.

If you can use Fx4 (.NET 4), use the TPL (Task Parallel Library).

For earlier versions, rewrite your code to use fewer threads.


Edit, since you are using Fx4:

Your biggest gain could come from using System.IO.Directory.EnumerateFiles() in place of Directory.GetFiles().

Sketch:

var files = System.IO.Directory.EnumerateFiles(...);  // deferred execution

Parallel.ForEach(files, f => DoStuff(f));  // maybe use MaxDegreeOfParallelism or a CancellationToken

// all files done here

You can also wrap this .ForEach in a (single) try/catch etc.
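
For example, one way that sketch might be filled in. This is only a minimal sketch: the someDirectory variable is a hypothetical placeholder, and the concurrency bound, cancellation and error handling are just the options hinted at above.

// Requires the System.Threading and System.Threading.Tasks namespaces.
var cts = new CancellationTokenSource();
var options = new ParallelOptions
{
    MaxDegreeOfParallelism = Environment.ProcessorCount,  // bound the concurrency
    CancellationToken = cts.Token                         // lets another thread cancel the loop
};

try
{
    var files = System.IO.Directory.EnumerateFiles(someDirectory);  // someDirectory is a placeholder
    Parallel.ForEach(files, options, f => DoStuff(f));
}
catch (OperationCanceledException)
{
    // cts.Cancel() was called before all files were processed
}
catch (AggregateException ex)
{
    // exceptions thrown inside DoStuff are collected here
    foreach (var inner in ex.InnerExceptions)
        Console.WriteLine(inner.Message);
}

Cancelling cts stops the loop between iterations; exceptions thrown inside DoStuff surface as a single AggregateException once the loop ends.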

And if DoStuff() needs parallelism internally, you should use the TPL there as well, maybe passing along a CancellationToken etc. That would put all the parallelism under the control of a single scheduler.

You might have to assist with the fine-tuning, but even that will be a lot easier with the TPL than without it.



Answer 2:

It depends on what you are trying to accomplish: if you want the operations to be performed on another thread but don't mind them being done in strict sequence, you can simply do

string[] filePaths = GetFilePaths();
ThreadPool.QueueUserWorkItem(DoStuff, filePaths);

and put the foreach inside DoStuff. This might be an acceptable solution depending on the values you expect filePaths to have (e.g. if all paths are on the same device, trying to do it all at once will not be faster; it might even be slower), and it definitely is the simplest.
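
A minimal sketch of that variant, using a lambda here instead of changing DoStuff's signature; this is just one way to move the loop onto a single pool thread:

string[] filePaths = GetFilePaths();

ThreadPool.QueueUserWorkItem(state =>
{
    // One pool thread processes every path, strictly in order.
    foreach (string filePath in (string[])state)
    {
        DoStuff(filePath);
    }
}, filePaths);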

If you definitely want to do them in parallel, then you should look into the Task Parallel Library (.NET 4 only) and specifically Parallel.ForEach. Since limiting the maximum number of concurrent tasks is a good idea, here's an example that shows how you might do that:

var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };
Parallel.ForEach(filePaths, options, i => {
    DoStuff(i);
});


Answer 3:

You are correct to be worried. Depending on how many items are in the loop, this could put too much pressure on the Thread Pool. To correct myself slightly: in this situation the items will simply be queued, but that may still result in most of the Thread Pool being utilised, which isn't necessarily going to give the best performance.

As of .NET 4, there are some very easy alternatives to using the Thread Pool.

Parallel.ForEach would be relevant for you here. Here is a link to the new parallelism features in .NET 4:

http://galratner.com/blogs/net/archive/2010/04/24/a-quick-lap-around-net-4-0-s-parallel-features.aspx

Update: regarding your edit mentioning 200 sub-threads. I would point out that OS threads are not light-weight objects. They have overhead associated with them and can quickly cancel out any gains from parallelism. Performing some task in parallel takes some consideration as to how much work occurs, what the goal is (to free the UI, utilise all cores, etc.), and whether the eventual work is CPU intensive or IO bound. There are other factors, but I consider these quite important. I would create another SO question describing what you are attempting to achieve with this level of parallelism, so that you can get design advice more specific to your problem.



Answer 4:

Why don't you encapsulate your calls in an Action<T> or a delegate and put them in a thread-safe queue in your loop? Then you can start a (number of) thread(s) that work until all actions in the queue have been executed. That way you control the number of threads used and you don't have to worry about spawning too many of them.
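
A minimal sketch of that pattern, assuming .NET 4's ConcurrentQueue<T> (System.Collections.Concurrent) and the question's GetFilePaths/DoStuff methods; the worker count used here is an arbitrary choice:

// Fill a thread-safe queue with one action per file.
var queue = new ConcurrentQueue<Action>();
foreach (string filePath in GetFilePaths())
{
    string path = filePath;            // copy for the closure (pre-C# 5 foreach capture)
    queue.Enqueue(() => DoStuff(path));
}

// Start a fixed number of worker threads that drain the queue.
int workerCount = Environment.ProcessorCount;
var workers = new Thread[workerCount];
for (int i = 0; i < workerCount; i++)
{
    workers[i] = new Thread(() =>
    {
        Action action;
        while (queue.TryDequeue(out action))
        {
            action();
        }
    });
    workers[i].Start();
}

// Wait until every queued action has been executed.
foreach (Thread worker in workers)
{
    worker.Join();
}

The key point is that the number of concurrent workers is fixed up front, regardless of how many files there are.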