I have the following code (copied here from LINQPad). Either I don't understand how the TPL works or the code is garbage: why does the parallel version run slower than its non-parallel counterpart?
for (int i = 0; i < 100; i++)
{
    ParallelOptions ops = new ParallelOptions();
    ops.MaxDegreeOfParallelism = Environment.ProcessorCount;
    var watch = Stopwatch.StartNew();
    Parallel.ForEach<int>(Enumerable.Range(1, 10000000), ops, x => { int y = x + 1; });
    watch.Stop();
    Console.WriteLine("Parallel: {0}", watch.Elapsed.TotalSeconds);
    watch = Stopwatch.StartNew();
    foreach (var x in Enumerable.Range(1, 10000000))
    {
        int y = x + 1;
    }
    watch.Stop();
    Console.WriteLine("Non-parallel: {0}\n", watch.Elapsed.TotalSeconds);
}
First 10 results:
Parallel: 0.1991644
Non-parallel: 0.0466178
Parallel: 0.1723428
Non-parallel: 0.0447134
Parallel: 0.1141791
Non-parallel: 0.0444557
Parallel: 0.1758878
Non-parallel: 0.0444636
Parallel: 0.1687637
Non-parallel: 0.0444338
Parallel: 0.1677679
Non-parallel: 0.0445771
Parallel: 0.1191462
Non-parallel: 0.0446116
Parallel: 0.1702483
Non-parallel: 0.0454863
Parallel: 0.1143605
Non-parallel: 0.0451731
Parallel: 0.2155218
Non-parallel: 0.0450392
The best answer you can get is to run a profiler and measure what is actually going on in your code. My educated guess, though, is that the parallel version is slower because the work per element is so small: the cost of starting threads, switching between them, and invoking a delegate for every element outweighs anything gained by computing in parallel.
Give each iteration some substantial computation and the parallel version will eventually come out ahead. As written, the loop body is too trivial to keep the cores of a modern CPU usefully busy.
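If the per-element work really has to stay this small, one option is to hand each worker a contiguous range of indices instead of a single element, so the delegate runs once per chunk rather than once per item. The following is only a sketch of that idea: `ChunkedSum` and `Run` are made-up names, and the body accumulates a sum (rather than discarding `y`) purely so that the result can be checked.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class ChunkedSum
{
    // Hypothetical helper: sums (x + 1) for x in [1, n] using range partitioning.
    public static long Run(int n)
    {
        long total = 0;

        // Partitioner.Create(from, to) yields Tuple<int, int> ranges [Item1, Item2).
        var ranges = Partitioner.Create(1, n + 1);

        Parallel.ForEach(
            ranges,
            () => 0L,                                  // per-worker running sum
            (range, state, local) =>
            {
                // Plain sequential loop inside each chunk: no delegate call per element.
                for (int x = range.Item1; x < range.Item2; x++)
                    local += x + 1;
                return local;
            },
            local => Interlocked.Add(ref total, local)); // merge once per worker

        return total;
    }

    public static void Main()
    {
        var watch = Stopwatch.StartNew();
        long sum = Run(10000000);
        watch.Stop();
        Console.WriteLine("Chunked parallel: {0} (sum = {1})", watch.Elapsed.TotalSeconds, sum);
    }
}
```

Whether this beats the plain `foreach` still depends on the machine and on how cheap the body is, so it is worth timing both.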
Since I cannot add this as a comment, I am adding another answer to post the modified code. What @ixSci said in his answer seems to be correct. The parallel loop body was a trivial operation that executed very quickly, so most of the time apparently went to overhead such as context switching between threads. When I changed the code to sleep for a short time instead of incrementing the int value by 1, the parallel code was roughly 4 times faster than the non-parallel version (4 being the number of cores in my CPU).
for (int i = 0; i < 100; i++)
{
    ParallelOptions ops = new ParallelOptions();
    ops.MaxDegreeOfParallelism = Environment.ProcessorCount;
    var partitioner = Partitioner.Create<int>(Enumerable.Range(1, 5000));
    var watch = Stopwatch.StartNew();
    Parallel.ForEach<int>(partitioner, ops, x => { Thread.Sleep(1); });
    watch.Stop();
    Console.WriteLine("Parallel: {0}", watch.Elapsed.TotalSeconds);
    watch = Stopwatch.StartNew();
    foreach (var x in Enumerable.Range(1, 5000))
    {
        Thread.Sleep(1);
    }
    watch.Stop();
    Console.WriteLine("Non-parallel: {0}\n", watch.Elapsed.TotalSeconds);
}
First 11 results:
Parallel: 1.2887589
Non-parallel: 5.0020569
Parallel: 1.277047
Non-parallel: 5.0011116
Parallel: 1.2790631
Non-parallel: 5.0001498
Parallel: 1.2770644
Non-parallel: 5.0052016
Parallel: 1.2770013
Non-parallel: 5.0021479
Parallel: 1.2770031
Non-parallel: 5.0001927
Parallel: 1.2799937
Non-parallel: 5.0062141
Parallel: 1.2819909
Non-parallel: 5.0171945
Parallel: 1.2780496
Non-parallel: 5.0071667
Parallel: 1.2821714
Non-parallel: 5.0082108
Parallel: 1.2777875
Non-parallel: 5.0152099
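`Thread.Sleep` only makes each iteration wait; it does not give the cores any computation to do. A variant of the same comparison with a CPU-bound body might look like the sketch below. The `CpuBoundDemo` class and its trial-division `IsPrime` helper are my own illustrative names, chosen only because primality testing is an easy way to make each element cost something.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class CpuBoundDemo
{
    // Hypothetical helper: deliberately naive trial-division primality test.
    public static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; (long)d * d <= n; d++)
            if (n % d == 0) return false;
        return true;
    }

    // Counts primes in [from, from + count) on a single thread.
    public static int CountSequential(int from, int count)
    {
        int primes = 0;
        foreach (var x in Enumerable.Range(from, count))
            if (IsPrime(x)) primes++;
        return primes;
    }

    // Same count, with the per-element test spread across worker threads.
    public static int CountParallel(int from, int count)
    {
        int primes = 0;
        Parallel.ForEach(Enumerable.Range(from, count), x =>
        {
            if (IsPrime(x)) Interlocked.Increment(ref primes);
        });
        return primes;
    }

    public static void Main()
    {
        var watch = Stopwatch.StartNew();
        int parallel = CountParallel(1, 2000000);
        watch.Stop();
        Console.WriteLine("Parallel: {0} ({1} primes)", watch.Elapsed.TotalSeconds, parallel);

        watch = Stopwatch.StartNew();
        int sequential = CountSequential(1, 2000000);
        watch.Stop();
        Console.WriteLine("Non-parallel: {0} ({1} primes)", watch.Elapsed.TotalSeconds, sequential);
    }
}
```

On a multi-core machine I would expect the parallel count to finish sooner here, but the ratio depends on the hardware and on how expensive the body is, so treat any numbers as specific to the machine they were measured on.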