First off, I am running this on a dual-core 2.66 GHz machine. I am not sure whether I have the .AsParallel() call in the correct spot. I also tried calling it directly on the range variable, and that was still slower. I don't understand why...
Here are my results:
Process non-parallel 1000 took 146 milliseconds
Process parallel 1000 took 156 milliseconds
Process non-parallel 5000 took 5187 milliseconds
Process parallel 5000 took 5300 milliseconds
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

namespace DemoConsoleApp
{
    internal class Program
    {
        private static void Main()
        {
            ReportOnTimedProcess(
                () => GetIntegerCombinations(),
                "non-parallel 1000");
            ReportOnTimedProcess(
                () => GetIntegerCombinations(runAsParallel: true),
                "parallel 1000");
            ReportOnTimedProcess(
                () => GetIntegerCombinations(5000),
                "non-parallel 5000");
            ReportOnTimedProcess(
                () => GetIntegerCombinations(5000, true),
                "parallel 5000");
            Console.Read();
        }

        private static List<Tuple<int, int>> GetIntegerCombinations(
            int iterationCount = 1000, bool runAsParallel = false)
        {
            IEnumerable<int> range = Enumerable.Range(1, iterationCount);
            IEnumerable<Tuple<int, int>> integerCombinations =
                from x in range
                from y in range
                select new Tuple<int, int>(x, y);
            return runAsParallel
                ? integerCombinations.AsParallel().ToList()
                : integerCombinations.ToList();
        }

        private static void ReportOnTimedProcess(
            Action process, string processName)
        {
            var stopwatch = new Stopwatch();
            stopwatch.Start();
            process();
            stopwatch.Stop();
            Console.WriteLine("Process {0} took {1} milliseconds",
                processName, stopwatch.ElapsedMilliseconds);
        }
    }
}
The majority of your execution time here is likely spent actually creating the list via the ToList() method. That has to perform several memory allocations, resize the list as it grows, and so on. You're also not gaining much from parallelizing here because the final operation has to be synchronized (you're building a single list as the output).
Try doing something significantly more complex/expensive in the parallel segment, like prime factorization, and increase the number of iterations to the hundreds of thousands (5000 is a very small number to use when profiling). You should start to see the difference then.
Also make sure that you're profiling in release mode; all too often I see attempts to profile in debug mode, and the results from that will not be accurate.
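As a rough sketch of that suggestion, the program below gives each element a noticeably more expensive job (counting prime factors by trial division) and reduces to a single number instead of building a list. The names ParallelWorkDemo, CountPrimeFactors, and TotalFactors, as well as the input size and starting value, are made up for illustration and are not from the question. The timings will vary by machine, so none are shown.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class ParallelWorkDemo
{
    // Hypothetical CPU-heavy step: count prime factors (with multiplicity)
    // by trial division.
    public static int CountPrimeFactors(long n)
    {
        int count = 0;
        for (long d = 2; d * d <= n; d++)
        {
            while (n % d == 0)
            {
                count++;
                n /= d;
            }
        }
        if (n > 1) count++; // whatever remains is itself prime
        return count;
    }

    // Sums the factor counts, optionally through PLINQ. The result is a
    // single number, so no shared output list has to be built.
    public static long TotalFactors(long[] inputs, bool runAsParallel)
    {
        return runAsParallel
            ? inputs.AsParallel().Sum(n => (long)CountPrimeFactors(n))
            : inputs.Sum(n => (long)CountPrimeFactors(n));
    }

    private static void Time(string name, Func<long> work)
    {
        var stopwatch = Stopwatch.StartNew();
        long total = work();
        stopwatch.Stop();
        Console.WriteLine("{0}: total {1}, {2} ms",
            name, total, stopwatch.ElapsedMilliseconds);
    }

    public static void Main()
    {
        // Assumed workload: 200,000 consecutive numbers starting at 10,000,000.
        long[] inputs = Enumerable.Range(0, 200000)
            .Select(i => 10000000L + i)
            .ToArray();

        // Small warm-up run so JIT and thread-pool start-up are not timed.
        TotalFactors(inputs.Take(1000).ToArray(), true);

        Time("sequential", () => TotalFactors(inputs, false));
        Time("parallel", () => TotalFactors(inputs, true));
    }
}
```

Build it in release mode and run it outside the debugger before comparing the two numbers.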
It's slightly slower because PLINQ has a certain overhead (threads, scheduling, etc.), so you have to choose carefully what you parallelize. This particular code isn't really worth parallelizing. You have to parallelize tasks with a significant load; otherwise the overhead will outweigh the benefits of parallelization.
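On the question of where .AsParallel() belongs, here is a minimal sketch of the two placements. In the question's code the call comes after the query is composed, so the nested from/from/select is still an ordinary sequential LINQ to Objects query and PLINQ only sees its output. Calling it on the source lets PLINQ run the query itself. The class and method names are hypothetical, and since the per-element work is just one tuple allocation, neither version should be expected to beat the sequential one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AsParallelPlacementDemo
{
    // AsParallel() applied after the query is built: the cross join is
    // produced sequentially and PLINQ merely consumes the finished sequence.
    public static List<Tuple<int, int>> ParallelAtEnd(int n)
    {
        IEnumerable<int> range = Enumerable.Range(1, n);
        IEnumerable<Tuple<int, int>> query =
            from x in range
            from y in range
            select Tuple.Create(x, y);
        return query.AsParallel().ToList();
    }

    // AsParallel() applied to the source: the outer range is partitioned
    // across worker threads and the query operators run as PLINQ operators.
    public static List<Tuple<int, int>> ParallelAtSource(int n)
    {
        IEnumerable<int> range = Enumerable.Range(1, n);
        ParallelQuery<Tuple<int, int>> query =
            from x in range.AsParallel()
            from y in range
            select Tuple.Create(x, y);
        return query.ToList();
    }

    public static void Main()
    {
        List<Tuple<int, int>> atEnd = ParallelAtEnd(100);
        List<Tuple<int, int>> atSource = ParallelAtSource(100);

        // Element order is not guaranteed without AsOrdered(), so compare
        // the results as sets rather than position by position.
        bool sameElements =
            new HashSet<Tuple<int, int>>(atEnd).SetEquals(atSource);
        Console.WriteLine("Counts: {0} and {1}, same elements: {2}",
            atEnd.Count, atSource.Count, sameElements);
    }
}
```

If the order of the combinations matters, AsOrdered() can be added after AsParallel(), at the cost of some extra overhead.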