Difference linq and plinq

2楼-- · 2019-01-20 09:50

PLinq is the parallel version of Linq. Some queries can be executed on multiple threads and then PLinq gives a performance increase.

However other queries can´t be executed in parallel or will give wrong results if done so. So when to use PLinq is something you should decide on for each query and make sure the performance actually increases.

MSDN has a lot of documentation on it.

0人赞添加讨论(0) 举报

霸刀☆藐视天下

3楼-- · 2019-01-20 09:51

Given that AsParallel transparently parallelizes LINQ queries, the question arises, “Why didn’t Microsoft simply parallelize the standard query operators and make PLINQ the default?”

There are a number of reasons for the opt-in approach. First, for PLINQ to be useful there has to be a reasonable amount of computationally intensive work for it to farm out to worker threads. Most LINQ to Objects queries execute very quickly, and not only would parallelization be unnecessary, but the overhead of partitioning, collating, and coordinating the extra threads may actually slow things down.

Additionally:

The output of a PLINQ query (by default) may differ from a LINQ query with respect to element ordering.

The following query operators prevent a query from being parallelized, unless the source elements are in their original indexing position:

Take, TakeWhile, Skip, and SkipWhile The indexed versions of Select, SelectMany, and ElementAt Most query operators change the indexing position of elements (including those that remove elements, such as Where). This means that if you want to use the preceding operators, they’ll usually need to be at the start of the query.

The following query operators are parallelizable, but use an expensive partitioning strategy that can sometimes be slower than sequential processing:

Join, GroupBy, GroupJoin, Distinct, Union, Intersect, and Except The Aggregate operator’s seeded overloads in their standard incarnations are not parallelizable — PLINQ provides special overloads to deal with this.

When to Use PLINQ It’s tempting to search your existing applications for LINQ queries and experiment with parallelizing them. This is usually unproductive, because most problems for which LINQ is obviously the best solution tend to execute very quickly and so don’t benefit from parallelization. A better approach is to find a CPU-intensive bottleneck and then consider, “Can this be expressed as a LINQ query?” (A welcome side effect of such restructuring is that LINQ typically makes code smaller and more readable.)

PLINQ is well suited to embarrassingly parallel problems. It also works well for structured blocking tasks, such as calling several web services at once (see Calling Blocking or I/O-Intensive Functions).

PLINQ can be a poor choice for imaging, because collating millions of pixels into an output sequence creates a bottleneck. Instead, it’s better to write pixels directly to an array or unmanaged memory block and use the Parallel class or task parallelism to manage the multithreading. (It is possible, however, to defeat result collation using ForAll. Doing so makes sense if the image processing algorithm naturally lends itself to LINQ.)

0人赞添加讨论(0) 举报

Melony?

4楼-- · 2019-01-20 10:03

I also wanted to know when to use PLINQ instead of LINQ so I ran some tests.

Summary: There are two questions to answer when deciding whether to use LINQ or PLINQ to run a query.

How many iterations are involved in running the query (how many objects are in the collection)?
How much work is involved in an iteration?

Use LINQ unless PLINQ is more performant. PLINQ can be more performant than LINQ if querying the collection involves too many iterations AND/OR each iteration involves too much work.

But then two difficult questions arise:

How many iterations are too many iterations?
How much work is too much work?

My advice is to test your query. Test once using LINQ and once using PLINQ and then compare the two results.

Test 1: Increasing the number of iterations in the query by increasing the number of objects in the collection.

The overhead of initialising PLINQ takes around 20ms. If PLINQ's strengths aren't utilised, this is wasted time because LINQ has 0ms overhead.

The work involved in each iteration is always the same for each test. The work is kept minimal.

Definition of work: Multiplying the int (object in the collection) by 10.

When iterating 1 million objects where each iteration involves minimal work, PLINQ is faster than LINQ. Although in a professional environment, I've never queried or even initialised a collection of 10 million objects in memory so this might be an unlikely scenario where PLINQ happens to be superior to LINQ.

╔═══════════╦═══════════╦════════════╗
║ # Objects ║ LINQ (ms) ║ PLINQ (ms) ║
╠═══════════╬═══════════╬════════════╣
║ 1         ║         1 ║         20 ║
║ 10        ║         0 ║         18 ║
║ 100       ║         0 ║         20 ║
║ 1k        ║         0 ║         23 ║
║ 10k       ║         1 ║         17 ║
║ 100k      ║         4 ║         37 ║
║ 1m        ║        36 ║         76 ║
║ 10m       ║       392 ║        285 ║
║ 100m      ║      3834 ║       2596 ║
╚═══════════╩═══════════╩════════════╝

Test 2: Increasing the work involved in a iteration

I set the number of objects in the collection to always be 10 so the query involves a low number of iterations. For each test, I increased the work involved to process each iteration.

Definition of work: Multiplying the int (object in the collection) by 10.

Definition of increasing the work: Increasing the number of iterations to multiply the int by 10.

PLINQ was faster at querying the collection as the work was significantly increased when the number of iterations inside a work iteration was increased to 10 million and I concluded that PLINQ is superior to LINQ when a single iteration involves this amount of work.

"# Iterations" in this table means the number of iterations inside a work iteration. See Test 2 code below.

╔══════════════╦═══════════╦════════════╗
║ # Iterations ║ LINQ (ms) ║ PLINQ (ms) ║
╠══════════════╬═══════════╬════════════╣
║ 1            ║         1 ║         22 ║
║ 10           ║         1 ║         32 ║
║ 100          ║         0 ║         25 ║
║ 1k           ║         1 ║         18 ║
║ 10k          ║         0 ║         21 ║
║ 100k         ║         3 ║         30 ║
║ 1m           ║        27 ║         52 ║
║ 10m          ║       263 ║        107 ║
║ 100m         ║      2624 ║        728 ║
║ 1b           ║     26300 ║       6774 ║
╚══════════════╩═══════════╩════════════╝

Test 1 code:

class Program
{
    private static IEnumerable<int> _numbers;

    static void Main(string[] args)
    {
        const int numberOfObjectsInCollection = 1000000000;

        _numbers = Enumerable.Range(0, numberOfObjectsInCollection);

        var watch = new Stopwatch();

        watch.Start();

        var parallelTask = Task.Run(() => ParallelTask());

        parallelTask.Wait();

        watch.Stop();

        Console.WriteLine($"Parallel: {watch.ElapsedMilliseconds}ms");

        watch.Reset();

        watch.Start();

        var sequentialTask = Task.Run(() => SequentialTask());

        sequentialTask.Wait();

        watch.Stop();

        Console.WriteLine($"Sequential: {watch.ElapsedMilliseconds}ms");

        Console.ReadKey();
    }

    private static void ParallelTask()
    {
        _numbers
            .AsParallel()
            .Select(x => DoWork(x))
            .ToArray();
    }

    private static void SequentialTask()
    {
        _numbers
            .Select(x => DoWork(x))
            .ToArray();
    }

    private static int DoWork(int @int)
    {
        return @int * 10;
    }
}

Test 2 code:

class Program
{
    private static IEnumerable<int> _numbers;

    static void Main(string[] args)
    {
        _numbers = Enumerable.Range(0, 10);

        var watch = new Stopwatch();

        watch.Start();

        var parallelTask = Task.Run(() => ParallelTask());

        parallelTask.Wait();

        watch.Stop();

        Console.WriteLine($"Parallel: {watch.ElapsedMilliseconds}ms");

        watch.Reset();

        watch.Start();

        var sequentialTask = Task.Run(() => SequentialTask());

        sequentialTask.Wait();

        watch.Stop();

        Console.WriteLine($"Sequential: {watch.ElapsedMilliseconds}ms");

        Console.ReadKey();
    }

    private static void ParallelTask()
    {
        _numbers
            .AsParallel()
            .Select(x => DoWork(x))
            .ToArray();
    }

    private static void SequentialTask()
    {
        _numbers
            .Select(x => DoWork(x))
            .ToArray();
    }

    private static int DoWork(int @int)
    {
        const int numberOfIterations = 1000000000;

        for (int i = 0; i < numberOfIterations; i++)
        {
            @int = @int * 10;
        }

        return @int;
    }
}

0人赞添加讨论(0) 举报

家丑人穷心不美

5楼-- · 2019-01-20 10:04

Linq is a collection of technologies that work together to solve a similar family of problems - in all of them you have a source of data (xml file or files, database contents, collection of objects in memory) and you want to retrieve some or all of this data and act on it in some way. Linq works on the commonality of that set of problems such that:

var brithdays = from user in users where
  user.dob.Date == DateTime.Today && user.ReceiveMails
  select new{user.Firstname, user.Lastname, user.Email};
foreach(bdUser in birthdays)
  SendBirthdayMail(bdUser.Firstname, bdUser.Lastname, bdUser.Email);

And the equivalent (explicit use of Linq-related classes and methods with a traditional C# syntax):

var birthdays = users
  .Where(user => user.dob.Date == DateTime.Today)
  .Select(user => new{user.Firstname, user.Lastname, user.Email});
foreach(bdUser in birthdays)
  SendBirthdayMail(bdUser.Firstname, bdUser.Lastname, bdUser.Email);

Are both examples of code that could work regardless of whether it's going to be turned into database calls, parsing of xml documents, or a search through an array of objects.

The only difference is what sort of object users is. If it was a list, array, or other enumerable collection, it would be linq-to-objects, if it was a System.Data.Linq.Table it would be linq to sql. The former would result in in-memory operations, the latter in a SQL query that would then be deserialised to in-memory objects as late as possible.

If it was a ParallelQuery - produced by calling .AsParallel on an in-memory enumerable collection - then the query will be performed in-memroy, parallelised (most of the time) so as to performed by multiple threads - ideally keeping each core busy moving the work forward.

Obviously the idea here is to be faster. When it works well, it does.

There are some downsides though.

First, there's always some overhead to getting the parallelisation going, even in cases where it ends up not being possible to parallelise. If there isn't enough work being done on data, this overhead will out-weigh any potential gains.

Second, the benefits of parallel processing depends on the cores available. With a query that doesn't end up blocking on resources on a 4-core machine, you theoretically get a 4-times speed up (4 hyper-threaded might give you more or even less, but probably not 8-times since hyper-threading's doubling of some parts of the CPU doesn't give a clear two-times increase). With the same query on a single-core, or with processor affinity meaning only one core is available (e.g. a webserver in "web-garden" mode), then there's no speed-up. There could still be a gain if there's blocking on resources, but the benefit depends on the machine then.

Third, if there's any shared resource (maybe an collection results are being output to) is used in a non-threadsafe way, it can go pretty badly wrong with incorrect results, crashes, etc.

Fourth, if there's a shared resource being used in a threadsafe way, and that threadsafety comes from locking, there could be enough contention to become a bottleneck that undoes all the benefits from the parallelisation.

Fifth, if you've a four-core machine working on more or less the same algorithm on four different threads (perhaps in a client-server situation due to four clients, or on a desktop situation from a set of similar tasks higher in the process), then they're alreay making the best use of those cores. Splitting the work in the algorithm up so as to be handled across all four cores means you've moved from four threads using one core each to 16 threads fighting over four cores. At best it'll be the same, and likely overheads will make it slightly worse.

It can still be a major win in a lot of cases, but the above should make it clear that it won't always.

0人赞添加讨论(0) 举报

爱情/是我丢掉的垃圾

6楼-- · 2019-01-20 10:05

Consider avoiding anonymous types while working with PLINQ because according to Threading in C#, by Joe Albahari:

anonymous types (being classes and therefore reference types) incur the cost of heap-based allocation and subsequent garbage collection.

(...)

stack-based allocation is highly parallelizable (as each thread has its own stack), whereas all threads must compete for the same heap — managed by a single memory manager and garbage collector.

0人赞添加讨论(0) 举报

Difference linq and plinq

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间