When working with tasks, a rule of thumb appears to be that the thread pool - typically used by e.g. invoking Task.Run()
, or Parallel.Invoke()
- should be used for relatively short operations. When working with long running operations, we are supposed to use the TaskCreationOptions.LongRunning
flag in order to - as far as I understand it - avoid clogging the thread pool queue, i.e. to push work to a newly-created thread.
But what exactly is a long running operation? How long is long, in terms of time? Are there other factors besides the expected task duration to be considered when deciding whether or not to use the LongRunning
, like the anticipated CPU architecture (frequency, the number of cores, ...) or the number of tasks that will be attempted to be run at once from the programmer's perspective?
For example, suppose I have 500 tasks to process in a dedicated application, each taking 10-20 seconds to complete. Should I just start all 500 tasks using Task.Run (e.g. in a loop) and then await them all, perhaps as LongRunning
, while leaving the default max level of concurrency? Then again, if I set LongRunning
in such case, wouldn't this create 500 new threads and actually cause a lot of overhead and higher memory usage (due to extra threads being allocated) as compared to omitting LongRunning
? This is assuming that no new tasks will be scheduled for execution while these 500 are being awaited.
I would guess that the decision to set LongRunning
depends on the number of requests made to the thread pool in a given time interval, and that LongRunning
should only be used for tasks that are expected to take significantly longer that the majority of the thread pool-placed tasks - by definition, at most a small percentage of all tasks. In other words, this appears to be a queuing and thread pool utilization optimization problem that should likely be solved case-by-case through testing, if at all. Am I correct?
It kind of doesn't matter. The problem isn't really about time, it's about what your code is doing. If you're doing asynchronous I/O, you're only using the thread for the short amount of time between individual requests. If you're doing CPU work... well, you're using the CPU. There's no "thread-pool starvation", because the CPUs are fully utilized.
The real problem is when you're doing blocking work that doesn't use the CPU. In case like that, thread-pool starvation leads to CPU-underutilization - you said "I need the CPU for my work" and then you don't actually use it.
If you're not using blocking APIs, there's no point in using Task.Run
with LongRunning
. If you have to run some legacy blocking code asynchronously, using LongRunning
may be a good idea. Total work time isn't as important as "how often you are doing this". If you spin up one thread based on a user clicking on a GUI, the cost is tiny compared to all the latencies already included in the act of clicking a button in the first place, and you can use LongRunning
just fine to avoid the thread-pool. If you're running a loop that spawns lots of blocking tasks... stop doing that. It's a bad idea :D
For example, imagine there is no asynchronous API alternative File.Exists
. So if you see that this is giving you trouble (e.g. over a faulty network connection), you'd fire it up using Task.Run
- and since you're not doing CPU work, you'd use LongRunning
.
In contrast, if you need to do some image manipulation that's basically 100% CPU work, it doesn't matter how long the operation takes - it's not a LongRunning
thing.
And finally, the most common scenario for using LongRunning
is when your "work" is actually the old-school "loop and periodically check if something should be done, do it and then loop again". Long running, but 99% of the time just blocking on some wait handle or something like that. Again, this is only useful when dealing with code that isn't CPU-bound, but that doesn't have proper asynchronous APIs. You might find something like this if you ever need to write your own SynchronizationContext
, for example.
Now, how do we apply this to your example? Well, we can't, not without more information. If your code is CPU-bound, Parallel.For
and friends are what you want - those ensure you only use enough threads to sature the CPUs, and it's fine to use the thread-pool for that. If it's not CPU bound... you don't really have any option besides using LongRunning
if you want to run the tasks in parallel. Ideally, such work would consist of asynchronous calls you can safely invoke and await Task.WhenAll(...)
from your own thread.
When working with tasks, a rule of thumb appears to be that the thread pool - typically used by e.g. invoking Task.Run(), or Parallel.Invoke() - should be used for relatively short operations. When working with long running operations, we are supposed to set the TaskCreationOptions.LongRunning to true in order to - as far as I understand it - avoid clogging the thread pool queue, i.e. to push work to a newly-created thread.
The vast majority of the time, you don't need to use LongRunning
at all, because the thread pool will adjust to "losing" a thread to a long-running operation after 2 seconds.
The main problem with LongRunning
is that it forces you to use the very dangerous StartNew
API.
In other words, this appears to be a queuing and thread pool utilization optimization problem that should likely be solved case-by-case through testing, if at all. Am I correct?
Yes. You should never set LongRunning
when first writing code. If you are seeing delays due to the thread pool injection rate, then you can carefully add LongRunning
.
You should not use TaskCreationOptions.LongRunning
in your case. I would use Parallel.For.
The LongRunning
option is not to be used if you're going to create a lot of tasks, just like in your case. It is to be used for creating couple of tasks that will be running for a Long Time.
By the way, i never used this option in any similar scenario.
As you point out, TaskCreationOptions.LongRunning
's purpose is
to allow the ThreadPool to continue to process work items even though one task is running for an extended period of time
As for when to use it:
It's not a specific length per se...You'd typically only use LongRunning if you found through performance testing that not using it was causing long delays in the processing of other work.
Source