I have a constant flow of items that I need to process in parallel, so I'm using TPL Dataflow. The catch is that items sharing the same key (similar to a Dictionary) must be processed in FIFO order relative to each other and must not run in parallel with each other (they may run in parallel with items that have different keys).
The work being done is very CPU-bound with minimal asynchronous locks, so my solution was to create an array of ActionBlock<T>s sized to Environment.ProcessorCount, each with no parallelism, and to post items to them according to the key's GetHashCode value.
Creation:
_actionBlocks = new ActionBlock<Item>[Environment.ProcessorCount];
for (int i = 0; i < _actionBlocks.Length; i++)
{
    // MaxDegreeOfParallelism defaults to 1, so each block processes its
    // items one at a time, in the order they were posted.
    _actionBlocks[i] = new ActionBlock<Item>(item => ProcessItemAsync(item));
}
Usage:
bool ProcessItem(Key key, Item item)
{
    // Cast to uint so a negative hash code can't produce a negative index.
    var actionBlock = _actionBlocks[(uint)key.GetHashCode() % _actionBlocks.Length];
    return actionBlock.Post(item);
}
So, my question is, is this the best solution to my problem? Am I hurting performance/scalability? Am I missing something?
I think your approach is reasonable, assuming you know the hash codes will be distributed well.
If you want better protection against bad distributions, you could use a larger number of ActionBlocks while limiting their total concurrency level by having them all share a single custom TaskScheduler. You can find such a scheduler (LimitedConcurrencyLevelTaskScheduler) in ParallelExtensionsExtras or on MSDN.
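To make that concrete, here is a minimal sketch of the idea. Instead of the ParallelExtensionsExtras scheduler, it uses the built-in ConcurrentExclusiveSchedulerPair to cap total concurrency; the KeyedProcessor class name, the blockCount parameter, and the Key/Item/ProcessItemAsync stubs are placeholders standing in for the question's types:

using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public class Key { }
public class Item { }

public class KeyedProcessor
{
    private readonly ActionBlock<Item>[] _actionBlocks;

    public KeyedProcessor(int blockCount)
    {
        // At most ProcessorCount blocks execute at any moment,
        // regardless of how many blocks exist.
        var schedulerPair = new ConcurrentExclusiveSchedulerPair(
            TaskScheduler.Default, Environment.ProcessorCount);

        var options = new ExecutionDataflowBlockOptions
        {
            TaskScheduler = schedulerPair.ConcurrentScheduler,
            MaxDegreeOfParallelism = 1 // preserves per-key FIFO ordering
        };

        // Using many more buckets than cores softens the impact of an
        // uneven hash distribution: a hot key collides with fewer others.
        _actionBlocks = new ActionBlock<Item>[blockCount];
        for (int i = 0; i < _actionBlocks.Length; i++)
        {
            _actionBlocks[i] =
                new ActionBlock<Item>(item => ProcessItemAsync(item), options);
        }
    }

    public bool ProcessItem(Key key, Item item)
    {
        var actionBlock = _actionBlocks[(uint)key.GetHashCode() % _actionBlocks.Length];
        return actionBlock.Post(item);
    }

    private Task ProcessItemAsync(Item item)
    {
        // The CPU-bound work for a single item goes here.
        return Task.CompletedTask;
    }
}

For example, new KeyedProcessor(Environment.ProcessorCount * 8) spreads keys across eight times as many buckets while still executing at most ProcessorCount items at once.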