Summary: I switched from `System.Threading.Tasks.Parallel.ForEach` with a concurrent data structure to a simple PLINQ (Parallel LINQ) query. The speedup was amazing.
So is PLINQ inherently faster than `Parallel.ForEach`, or is the difference specific to this task?
```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Original code
// concurrent dictionary to store results
var resultDict = new ConcurrentDictionary<string, MyResultType>();
Parallel.ForEach(items, item =>
{
    resultDict.TryAdd(item.Name, PerformWork(item));
});

// New code
var results =
    items
        .AsParallel()
        .Select(item => new { item.Name, queryResult = PerformWork(item) })
        .ToDictionary(kv => kv.Name, kv => kv.queryResult);
```
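To experiment with the two versions side by side, here is a minimal self-contained sketch. `Item`, `MyResultType`, and the body of `PerformWork` are hypothetical stand-ins (the real `PerformWork` is not shown); the sketch only illustrates that both patterns yield the same mapping when every name is unique.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-ins for the question's item and result types.
public record Item(string Name);
public record MyResultType(int Value);

public static class Comparison
{
    // Hypothetical pure PerformWork: depends only on its input and mutates no shared state.
    public static MyResultType PerformWork(Item item) => new MyResultType(item.Name.Length);

    // Original pattern: each iteration writes its result into a thread-safe dictionary.
    public static ConcurrentDictionary<string, MyResultType> WithParallelForEach(IEnumerable<Item> items)
    {
        var resultDict = new ConcurrentDictionary<string, MyResultType>();
        Parallel.ForEach(items, item =>
        {
            resultDict.TryAdd(item.Name, PerformWork(item));
        });
        return resultDict;
    }

    // New pattern: PerformWork runs inside Select, the parallel part of the query;
    // ToDictionary then collects the projected results into an ordinary Dictionary.
    public static Dictionary<string, MyResultType> WithPlinq(IEnumerable<Item> items) =>
        items
            .AsParallel()
            .Select(item => new { item.Name, QueryResult = PerformWork(item) })
            .ToDictionary(kv => kv.Name, kv => kv.QueryResult);
}
```

Keeping `PerformWork` inside `Select`, as the question's code does, keeps the expensive call in the parallel part of the query rather than in the selectors passed to `ToDictionary`.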
Notes: Each task (`PerformWork`) now takes between 0 and 200 ms. It used to take longer before I optimized it, which is why I was using the Task Parallel Library in the first place. So I went from 2 seconds total to roughly 100-200 ms total, performing roughly the same work, just with different methods. (Wow, LINQ and PLINQ are awesome!)
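With totals this small, one-time costs such as JIT compilation or thread-pool warm-up could make up a noticeable share of a single measurement, so it may help to time both versions under identical conditions after a warm-up run. A minimal sketch, assuming a hypothetical `PerformWork` that simulates work with `Thread.Sleep(5)` and plain string keys in place of the question's item type:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class TimingHarness
{
    // Hypothetical workload: sleeps briefly to simulate a call that takes a few milliseconds.
    static int PerformWork(string name)
    {
        Thread.Sleep(5);
        return name.Length;
    }

    // Runs the action once as a warm-up (JIT, thread-pool ramp-up), then times a second run.
    public static TimeSpan Measure(Action action)
    {
        action();
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed;
    }

    public static (TimeSpan ForEach, TimeSpan Plinq) Compare(IReadOnlyList<string> names)
    {
        var forEachTime = Measure(() =>
        {
            var dict = new ConcurrentDictionary<string, int>();
            Parallel.ForEach(names, n => dict.TryAdd(n, PerformWork(n)));
        });

        var plinqTime = Measure(() =>
        {
            // The work happens in Select; ToDictionary only collects the results.
            _ = names.AsParallel()
                     .Select(n => (Name: n, Result: PerformWork(n)))
                     .ToDictionary(t => t.Name, t => t.Result);
        });

        return (forEachTime, plinqTime);
    }
}
```

If the two timings come out close under these conditions, the original 2 s figure may have included costs unrelated to the choice of API.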
Questions:
- Is the speedup due to using PLINQ instead of `Parallel.ForEach`?
- Or is it simply the removal of the concurrent data structure (`ConcurrentDictionary`), since the new code no longer needs to synchronize threads?
- Based on the answer to this related question:

  > Whereas PLINQ is largely based on a functional style of programming with no side-effects, side-effects are precisely what the TPL is for. If you want to actually do work in parallel as opposed to just searching/selecting things in parallel, you use the TPL.

  Can I assume that, because my pattern is basically functional (given inputs, produce new outputs without mutation), PLINQ is the correct technology to use?
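One way to separate the two factors raised in the questions above is to keep the TPL loop but drop the concurrent collection. The sketch below is an illustration under assumptions (a hypothetical pure `PerformWork` and string keys), not the question's code: each `Parallel.For` iteration writes only to its own array slot, and the dictionary is built afterwards on a single thread. If this version times like the PLINQ query, the `ConcurrentDictionary` was the likely cost; if it times like the original, the difference lies elsewhere.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class NoConcurrentCollection
{
    // Hypothetical pure function standing in for PerformWork.
    static int PerformWork(string name) => name.Length;

    public static Dictionary<string, int> Run(IReadOnlyList<string> names)
    {
        // Each iteration writes only to its own slot, so iterations never
        // touch shared state and need no synchronization.
        var results = new int[names.Count];
        Parallel.For(0, names.Count, i =>
        {
            results[i] = PerformWork(names[i]);
        });

        // Parallel.For has returned, so all writes are complete; build the
        // dictionary on one thread (indexer assignment: a duplicate name keeps the last value).
        var dict = new Dictionary<string, int>(names.Count);
        for (int i = 0; i < names.Count; i++)
        {
            dict[names[i]] = results[i];
        }
        return dict;
    }
}
```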
I'm looking for validation that my assumptions are correct, or an indication that I'm missing something.
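One behavioral difference between the two snippets that is easy to miss: they treat duplicate names differently. `ConcurrentDictionary.TryAdd` returns `false` for a key that is already present, so duplicates are dropped silently, while `ToDictionary` rejects a duplicate key with an exception. PLINQ may report that exception wrapped in an `AggregateException`, so this minimal sketch (with hypothetical string inputs) accepts either form.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class DuplicateKeys
{
    // TPL + ConcurrentDictionary: the first successful TryAdd for a key wins;
    // later adds for the same key return false and are ignored.
    public static int CountWithTryAdd(string[] names)
    {
        var dict = new ConcurrentDictionary<string, int>();
        Parallel.ForEach(names, n => dict.TryAdd(n, n.Length));
        return dict.Count;
    }

    // PLINQ + ToDictionary: a duplicate key makes the query fail.
    public static bool PlinqThrowsOnDuplicate(string[] names)
    {
        try
        {
            _ = names.AsParallel()
                     .Select(n => new { Name = n, Length = n.Length })
                     .ToDictionary(x => x.Name, x => x.Length);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
        catch (AggregateException ae) when (ae.Flatten().InnerExceptions.Any(e => e is ArgumentException))
        {
            return true;
        }
    }
}
```

If `items` can contain repeated names, the two versions are therefore not interchangeable as written.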