I have been learning F# recently, being particularly interested in its ease of exploiting data parallelism. The data |> Array.map |> Async.Parallel |> Async.RunSynchronously idiom seems very easy to understand, straightforward to use, and a quick way to get real value.
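For concreteness, here is a minimal sketch of that idiom. The names `square` and `data` are placeholders invented for illustration; each element is wrapped in its own async computation, and the computations are then run in parallel and awaited.

```fsharp
// Hypothetical CPU-bound function and input data, for illustration only.
let square (x: int) = x * x
let data = [| 1 .. 10 |]

let results =
    data
    |> Array.map (fun x -> async { return square x }) // one async per element
    |> Async.Parallel                                  // combine into a single Async<int[]>
    |> Async.RunSynchronously                          // block until all have finished
```

Async.Parallel returns the results in the same order as the input computations, so `results` lines up with `data` index by index.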
So why is it that async is not really intended for this? Donald Syme himself says that PLINQ and Futures are probably a better choice, and other answers I've read here agree, as well as recommending the TPL. (PLINQ doesn't seem too different from the built-in functions above, as long as you're using the F# PowerPack to get the PSeq functions.)
F# and functional languages make a lot of sense for this, and some applications have achieved great success with async parallelism.
So why shouldn't I use async to execute parallel data processes? What am I going to lose by writing parallel async code instead of using PLINQ or TPL?
If you have a tiny number of completely independent non-async tasks and lots of cores, then there is nothing wrong with using async to achieve parallelism. However, if your tasks are dependent in any way, or you have more tasks than cores, or you push the use of async too far into the code, then you will be leaving a lot of performance on the table and could do a lot better by choosing a more appropriate foundation for parallel programming.
Note that your example can be written even more elegantly using the TPL from F# though:
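A minimal sketch of what such a TPL-based version might look like, assuming it means Array.Parallel.map from FSharp.Core (which is implemented on top of the TPL's parallel loops). The function `f` and the array `xs` are hypothetical stand-ins for the real work and data.

```fsharp
// Hypothetical work function and input, for illustration only.
let f (x: int) = x * x
let xs = [| 1 .. 5 |]

// A single call replaces the Array.map / Async.Parallel / Async.RunSynchronously pipeline.
// The result array has the same length and ordering as xs.
let ys = Array.Parallel.map f xs
```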
You lose the ability to write cache-oblivious code. As a consequence you will suffer many cache misses, with every core stalling while it waits on shared memory, and that means poor scalability on a multicore machine.
The TPL is built upon the idea that child tasks should execute on the same core as their parent with a high probability and, therefore, will benefit from reusing the same data because it will be hot in the local CPU cache. There is no such assurance with async.
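As a rough illustration of the parent/child task shape this refers to, here is a sketch of a recursive fork-join sum using TPL tasks. The function name `parallelSum` and the cutoff of 1024 elements are assumptions made for this example, and the code only shows the structure; it does not measure or guarantee any cache behaviour.

```fsharp
open System
open System.Threading.Tasks

// Hypothetical divide-and-conquer sum over xs.[lo .. hi-1].
let rec parallelSum (xs: int[]) (lo: int) (hi: int) : int64 =
    if hi - lo <= 1024 then
        // Small range: sum sequentially on the current thread.
        let mutable acc = 0L
        for i in lo .. hi - 1 do
            acc <- acc + int64 xs.[i]
        acc
    else
        let mid = lo + (hi - lo) / 2
        // Fork the left half as a child task of the current computation.
        let left = Task.Factory.StartNew(Func<int64>(fun () -> parallelSum xs lo mid))
        // Keep working on the right half on the current thread.
        let right = parallelSum xs mid hi
        // Join: wait for the child and combine the two partial sums.
        left.Result + right

let total = parallelSum [| 1 .. 100000 |] 0 100000
```

Each parent works on a range that contains its children's ranges, which is the situation where running a child close to its parent lets it reuse data the parent has just touched.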
I always figured the difference is what TPL, PLINQ, etc. give you over and above what Async does. (Cancellation mechanisms are the one that comes to mind.) This question has some better answers.
This article hints at a slight performance advantage to TPL, but probably not enough to be significant.
I wrote an article that re-implements one C# TPL sample using both Task and Async, which also has some comments on the difference between the two. You can find it here and there is also a more advanced async-based version.
Here is a quote from the first article that compares the two options: