I understand that a BlockingCollection is best suited for a producer/consumer pattern. However, when should I use an ActionBlock from the TPL Dataflow library?
My initial understanding is that for IO operations I should keep the BlockingCollection, while CPU-intensive operations are best suited to an ActionBlock. But I feel like this isn't the whole story... Any additional insight?
TPL Dataflow is better suited for an actor-based design. That means that if you want to chain producers and consumers, it's much easier with TDF.
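To make the chaining point concrete, here is a minimal sketch of a two-stage pipeline. The block names (parse, collect) and the sample inputs are made up for illustration, and it assumes C# 9 or later top-level statements; depending on your target framework you may need the System.Threading.Tasks.Dataflow NuGet package.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks.Dataflow;

// Stage 1 (hypothetical name): a propagator block that turns each string into an int.
var parse = new TransformBlock<string, int>(text => int.Parse(text));

// Stage 2 (hypothetical name): a target block that consumes the parsed values.
// With the default MaxDegreeOfParallelism of 1, items are handled one at a time,
// so adding to a plain List<int> is safe here.
var results = new List<int>();
var collect = new ActionBlock<int>(n => results.Add(n * n));

// Link the stages; PropagateCompletion lets Complete() flow downstream.
parse.LinkTo(collect, new DataflowLinkOptions { PropagateCompletion = true });

foreach (var text in new[] { "1", "2", "3" })
    parse.Post(text);

parse.Complete();          // no more input for the first stage
await collect.Completion;  // wait until the last stage has drained

Console.WriteLine(string.Join(", ", results)); // prints "1, 4, 9"
```

Doing the same with BlockingCollection would mean one collection per stage plus your own worker tasks and shutdown logic between them.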
Another big plus for TPL Dataflow is that it was built with async in mind. You can both produce and consume in a synchronous way and in an async way (and both at the same time), which is very useful.
(I mostly produce in a synchronous way and consume in a non-blocking async way.)
You can also very easily set a bounded capacity and degree of parallelism.
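As a rough sketch of those options, the snippet below configures an ActionBlock with a bounded capacity and a degree of parallelism, consumes with an async delegate, and produces with SendAsync. The numbers, the worker name, and the Task.Delay stand-in for real work are assumptions for illustration only; it also assumes C# 9 or later top-level statements.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

int processed = 0;

var worker = new ActionBlock<int>(
    async item =>
    {
        await Task.Delay(10);                 // placeholder for real async work
        Interlocked.Increment(ref processed); // delegates may run concurrently here
    },
    new ExecutionDataflowBlockOptions
    {
        BoundedCapacity = 5,        // the block holds at most 5 items at a time
        MaxDegreeOfParallelism = 2  // up to 2 items are processed concurrently
    });

for (int i = 0; i < 20; i++)
{
    // SendAsync waits asynchronously while the block is full.
    // Post(i) is the synchronous alternative, but on a full bounded block
    // it returns false immediately instead of waiting.
    await worker.SendAsync(i);
}

worker.Complete();
await worker.Completion;

Console.WriteLine(processed); // prints 20
```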
TL;DR: BlockingCollection is a simple and general tool. TPL Dataflow is much more robust, but can be overkill or a bad fit for specific problems.
Not sure if the repeated use of the word Block is causing confusion here. They are very different things.
You're right, a BlockingCollection is well suited to a producer-consumer situation, in that it will block an attempt to read from it until data is available (and, when given a bounded capacity, block an attempt to add to it while it is full). However, BlockingCollection is not a part of TPL Dataflow. It was introduced in .NET 4.0 as one of the new thread-safe collection types.
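For comparison, a bare-bones producer-consumer with BlockingCollection might look something like this. The capacity of 2 and the five sample items are arbitrary choices for the sketch, and it assumes C# 9 or later top-level statements.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Bounded to 2 items so the producer has to wait for the consumer.
using var queue = new BlockingCollection<int>(boundedCapacity: 2);

var producer = Task.Run(() =>
{
    for (int i = 1; i <= 5; i++)
        queue.Add(i);        // blocks the producer thread while the collection is full
    queue.CompleteAdding();  // signals that no more items are coming
});

int sum = 0;
var consumer = Task.Run(() =>
{
    // Blocks the consumer thread until an item is available; the loop ends
    // once adding is complete and the collection is empty.
    foreach (var item in queue.GetConsumingEnumerable())
        sum += item;
});

await Task.WhenAll(producer, consumer);

Console.WriteLine(sum); // prints 15
```

Note that both sides block a thread while they wait, which is the main practical difference from the async-friendly Dataflow blocks.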
An ActionBlock, however, is a type of 'Block' defined by TPL Dataflow, and can be used to perform an action. Block, in this sense, refers more to its use as a part of a data flow.
Data flows, as defined in TPL Dataflow, are made up of blocks, and there are three main types. From the documentation:
The TPL Dataflow Library consists of dataflow blocks, which are data structures that buffer and process data. The TPL defines three kinds of dataflow blocks: source blocks, target blocks, and propagator blocks. A source block acts as a source of data and can be read from. A target block acts as a receiver of data and can be written to. A propagator block acts as both a source block and a target block, and can be read from and written to. The TPL defines the System.Threading.Tasks.Dataflow.ISourceBlock<TOutput> interface to represent sources, System.Threading.Tasks.Dataflow.ITargetBlock<TInput> to represent targets, and System.Threading.Tasks.Dataflow.IPropagatorBlock<TInput,TOutput> to represent propagators. IPropagatorBlock<TInput,TOutput> inherits from both ISourceBlock<TOutput>, and ITargetBlock<TInput>.
The TPL Dataflow Library provides several predefined dataflow block types that implement the ISourceBlock<TOutput>, ITargetBlock<TInput>, and IPropagatorBlock<TInput,TOutput> interfaces. These dataflow block types are described in this document in the section Predefined Dataflow Block Types.
An ActionBlock is a type of ITargetBlock: it invokes a delegate for each item it receives and produces no output, so it typically sits at the end of a data flow.
To answer your first question, I would use a BlockingCollection when your process is simple, and TPL Dataflow when your process is more complicated; in that case, you probably wouldn't need a BlockingCollection.
There are examples of the Producer-Consumer problem using BlockingCollection here:
http://blogs.msdn.com/b/csharpfaq/archive/2010/08/12/blocking-collection-and-the-producer-consumer-problem.aspx?Redirected=true
and here:
http://programmerfindings.blogspot.co.uk/2012/07/producer-consumer-problem-using-tpl-and.html
Neither of these uses Dataflow. There is an example of one using Dataflow here:
http://msdn.microsoft.com/en-us/library/hh228601(v=vs.110).aspx
Plus, I would strongly suggest reading the TPL Dataflow documentation here:
http://msdn.microsoft.com/en-us/library/hh228601(v=vs.110).aspx
if you are implementing anything complex.