PLINQ Performs Worse Than Usual LINQ

Posted 2019-01-17 22:06

Amazingly, using PLINQ did not yield benefits on a small test case I created; in fact, it was even worse than usual LINQ.

Here's the test code:

    int repeatedCount = 10000000;
    private void button1_Click(object sender, EventArgs e)
    {
        var currTime = DateTime.Now;
        var strList = Enumerable.Repeat(10, repeatedCount);
        var result = strList.AsParallel().Sum();

        var currTime2 = DateTime.Now;
        textBox1.Text = (currTime2.Ticks-currTime.Ticks).ToString();

    }

    private void button2_Click(object sender, EventArgs e)
    {
        var currTime = DateTime.Now;
        var strList = Enumerable.Repeat(10, repeatedCount);
        var result = strList.Sum();

        var currTime2 = DateTime.Now;
        textBox2.Text = (currTime2.Ticks - currTime.Ticks).ToString();
    }

The result?

textbox1: 3437500
textbox2: 781250

So, LINQ is taking less time than PLINQ to complete a similar operation!

What am I doing wrong? Or is there a twist that I don't know about?

Edit: I've updated my code to use Stopwatch, and the same behavior persisted. To discount the effect of JIT, I clicked both button1 and button2 several times, in no particular order. Although the timings varied, the qualitative behavior remained: PLINQ was indeed slower in this case.

Tags: c# c#-4.0 plinq
9 answers
Animai°情兽
#2 · 2019-01-17 22:44

Is it possible you are not taking into account JIT time? You should run your test twice and discard the first set of results.

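Also, you shouldn't use DateTime to get performance timing; use the Stopwatch class instead:

    // Stopwatch lives in System.Diagnostics; StartNew is a static factory method.
    var swatch = Stopwatch.StartNew();

    var strList = Enumerable.Repeat(10, repeatedCount);
    var result = strList.AsParallel().Sum();

    swatch.Stop();
    // Elapsed is a TimeSpan, so convert it before assigning to Text.
    textBox1.Text = swatch.Elapsed.ToString();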

PLINQ does add some overhead to the processing of a sequence, but the magnitude of the difference in your case seems excessive. PLINQ makes sense when the overhead cost is outweighed by the benefit of running the logic on multiple cores/CPUs. If you don't have multiple cores, running processing in parallel offers no real advantage - and PLINQ should detect such a case and perform the processing sequentially.

EDIT: When creating embedded performance tests of this kind, you should make sure that you are not running them under the debugger, or with IntelliTrace enabled, as those can significantly skew performance timings.
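The advice above (discard the first run, time with Stopwatch) can be sketched as a small console program. This is only an illustration: the `Measure` helper and the `SumBenchmark` class are hypothetical names, not part of the original question, and the numbers it prints will vary by machine and build configuration.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class SumBenchmark
{
    // Hypothetical helper: runs the action once untimed (JIT warm-up),
    // then returns the fastest of `runs` timed executions.
    public static TimeSpan Measure(Action action, int runs = 5)
    {
        action(); // warm-up run, timing discarded

        var best = TimeSpan.MaxValue;
        for (int i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            if (sw.Elapsed < best) best = sw.Elapsed;
        }
        return best;
    }

    public static void Main()
    {
        const int repeatedCount = 10000000;
        var source = Enumerable.Repeat(10, repeatedCount);

        var sequential = Measure(() => source.Sum());
        var parallel = Measure(() => source.AsParallel().Sum());

        Console.WriteLine("LINQ:  {0:F1} ms", sequential.TotalMilliseconds);
        Console.WriteLine("PLINQ: {0:F1} ms", parallel.TotalMilliseconds);
    }
}
```

Build it in Release mode and run it without the debugger attached, for the reasons given in the EDIT above.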

姐就是有狂的资本
#3 · 2019-01-17 22:51

That may indeed be the case, because you are increasing the number of context switches and you are not performing any action that would benefit from having threads wait on something like I/O completion. This is going to be even worse if you are running on a single-CPU box.

走好不送
#4 · 2019-01-17 22:51

Please read the Side Effects section of this article.

http://msdn.microsoft.com/en-us/magazine/cc163329.aspx

I think you can run into many conditions where PLINQ has additional data processing patterns you must understand before you assume that it will always have faster response times.

一夜七次
#5 · 2019-01-17 22:54

I'd recommend using the Stopwatch class for timing metrics. In your case it's a better measure of the interval.

我欲成王,谁敢阻挡
#6 · 2019-01-17 22:58

This is a classic mistake -- thinking, "I'll run a simple test to compare the performance of this single-threaded code with this multi-threaded code."

A simple test is the worst kind of test you can run to measure multi-threaded performance.

Typically, parallelizing some operation yields a performance benefit when the steps you're parallelizing require substantial work. When the steps are simple -- as in, quick* -- the overhead of parallelizing your work ends up dwarfing the minuscule performance gain you would have otherwise gotten.


Consider this analogy.

You're constructing a building. If you have one worker, he has to lay bricks one by one until he's made one wall, then do the same for the next wall, and so on until all walls are built and connected. This is a slow and laborious task that could benefit from parallelization.

The right way to do this would be to parallelize the wall building -- hire, say, 3 more workers, and have each worker construct his own wall so that 4 walls can be built simultaneously. The time it takes to find the 3 extra workers and assign them their tasks is insignificant in comparison to the savings you get by getting 4 walls up in the amount of time it would have previously taken to build 1.

The wrong way to do it would be to parallelize the brick laying -- hire about a thousand more workers and have each worker responsible for laying a single brick at a time. You may think, "If one worker can lay 2 bricks per minute, then a thousand workers should be able to lay 2000 bricks per minute, so I'll finish this job in no time!" But the reality is that by parallelizing your workload at such a microscopic level, you're wasting a tremendous amount of energy gathering and coordinating all of your workers, assigning tasks to them ("lay this brick right there"), making sure no one's work is interfering with anyone else's, etc.

So the moral of this analogy is: in general, use parallelization to split up the substantial units of work (like walls), but leave the insubstantial units (like bricks) to be handled in the usual sequential manner.


*For this reason, you can actually make a pretty good approximation of the performance gain of parallelization in a more work-intensive context by taking any fast-executing code and adding Thread.Sleep(100) (or some other random number) to the end of it. Suddenly sequential executions of this code will be slowed down by 100 ms per iteration, while parallel executions will be slowed significantly less.
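The footnote's experiment can be sketched as follows. `SlowSquare` and `SleepDemo` are hypothetical names, and the 100 ms sleep merely stands in for a substantial unit of work (a "wall" rather than a "brick"); the actual speed-up depends on how many worker threads PLINQ uses on your machine.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;

public static class SleepDemo
{
    // Stand-in for expensive per-item work: sleeps 100 ms, then squares x.
    public static int SlowSquare(int x)
    {
        Thread.Sleep(100);
        return x * x;
    }

    public static void Main()
    {
        int[] items = Enumerable.Range(1, 16).ToArray();

        // Sequential: the sleeps run one after another.
        var sw = Stopwatch.StartNew();
        int sequential = items.Select(x => SlowSquare(x)).Sum();
        Console.WriteLine("LINQ:  sum={0} in {1} ms", sequential, sw.ElapsedMilliseconds);

        // Parallel: the sleeps overlap across PLINQ's worker threads.
        sw.Restart();
        int parallel = items.AsParallel().Select(x => SlowSquare(x)).Sum();
        Console.WriteLine("PLINQ: sum={0} in {1} ms", parallel, sw.ElapsedMilliseconds);
    }
}
```

Both queries compute the same sum; only the elapsed time should differ, because each item now costs far more than the overhead of handing it to another thread.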

做个烂人
#7 · 2019-01-17 22:59

Something more important that I didn't see mentioned is that .AsParallel will have different performance depending on the collection used.

In my tests, PLINQ is faster than LINQ when it is NOT used on a plain IEnumerable (Enumerable.Repeat):

  29ms  PLINQ  ParralelQuery    
  30ms   LINQ  ParralelQuery    
  30ms  PLINQ  Array
  38ms  PLINQ  List    
 163ms   LINQ  IEnumerable
 211ms   LINQ  Array
 213ms   LINQ  List
 273ms  PLINQ  IEnumerable
4 processors

The code is in VB, but it is provided to show that using .ToArray made the PLINQ version a few times faster:

    Dim test = Function(LINQ As Action, PLINQ As Action, type As String)
                   Dim sw1 = Stopwatch.StartNew : LINQ() : Dim ts1 = sw1.ElapsedMilliseconds
                   Dim sw2 = Stopwatch.StartNew : PLINQ() : Dim ts2 = sw2.ElapsedMilliseconds
                   Return {String.Format("{0,4}ms   LINQ  {1}", ts1, type), String.Format("{0,4}ms  PLINQ  {1}", ts2, type)}
               End Function

    Dim results = New List(Of String) From {Environment.ProcessorCount & " processors"}
    Dim count = 12345678, iList = Enumerable.Repeat(1, count)

    With iList : results.AddRange(test(Sub() .Sum(), Sub() .AsParallel.Sum(), "IEnumerable")) : End With
    With iList.ToArray : results.AddRange(test(Sub() .Sum(), Sub() .AsParallel.Sum(), "Array")) : End With
    With iList.ToList : results.AddRange(test(Sub() .Sum(), Sub() .AsParallel.Sum(), "List")) : End With
    With ParallelEnumerable.Repeat(1, count) : results.AddRange(test(Sub() .Sum(), Sub() .AsParallel.Sum(), "ParralelQuery")) : End With

    MessageBox.Show(String.join(Environment.NewLine, From l In results Order By l))

Running the tests in a different order gives slightly different results, so having each test on one line makes moving them up and down a bit easier for me.
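For readers following the question's C# code, here is a rough C# counterpart of the IEnumerable-versus-array comparison. It is a sketch, not a translation of the VB above: `TimeMs` and `SourceShapeDemo` are hypothetical names, and the timings will differ from the table. A likely explanation for the gap is that PLINQ can split an indexable source such as an array into ranges up front, whereas a lazy IEnumerable has to be pulled through a single shared enumerator.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class SourceShapeDemo
{
    // Hypothetical helper: elapsed milliseconds for a single run of the action.
    public static long TimeMs(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        const int count = 12345678;
        var lazy = Enumerable.Repeat(1, count); // lazily generated IEnumerable<int>
        int[] array = lazy.ToArray();           // materialized, indexable copy

        // Warm-up so JIT compilation is not included in the timings below.
        lazy.AsParallel().Sum();
        array.AsParallel().Sum();

        Console.WriteLine("{0,4}ms   LINQ  IEnumerable", TimeMs(() => lazy.Sum()));
        Console.WriteLine("{0,4}ms  PLINQ  IEnumerable", TimeMs(() => lazy.AsParallel().Sum()));
        Console.WriteLine("{0,4}ms   LINQ  Array", TimeMs(() => array.Sum()));
        Console.WriteLine("{0,4}ms  PLINQ  Array", TimeMs(() => array.AsParallel().Sum()));
    }
}
```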
