I'm trying to understand when the use of parallelism
will increase performance.
I tested it with simple code that iterated over 100,000 items in a List<Person>
and set each one's name to string.Empty.
The parallel version took twice as long as the regular version. (Yes, I tested with more than one core...)
I saw this answer saying that parallelism is not always good for performance.
This caution is also repeated on every page of the parallel examples in the MSDN tutorial:
> These examples are primarily intended to demonstrate usage, and may or
> may not run faster than the equivalent sequential LINQ to Objects
> queries.
I need some rules and tips for when parallelism will increase my code's performance and when it will not.
The obvious answer, "Test your code; if the parallel loop is faster, use it," is absolutely right, but I doubt anyone runs a performance analysis on every loop they write.
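For reference, my test was roughly like the following sketch (Person here is just a placeholder class with a Name property; the real one has more fields):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

class Person { public string Name { get; set; } }

class Program
{
    static void Main()
    {
        var people = Enumerable.Range(0, 100000)
                               .Select(i => new Person { Name = "Name" + i })
                               .ToList();

        // Sequential version
        var sw = Stopwatch.StartNew();
        foreach (var p in people)
            p.Name = string.Empty;
        sw.Stop();
        Console.WriteLine($"Sequential: {sw.ElapsedMilliseconds} ms");

        // Parallel version of the same trivial work
        sw.Restart();
        Parallel.ForEach(people, p => p.Name = string.Empty);
        sw.Stop();
        Console.WriteLine($"Parallel:   {sw.ElapsedMilliseconds} ms");
    }
}
```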
Think about when it is worthwhile to parallelize something in real life. When is it better to just sit down and do a job yourself from start to finish, and when is it better to hire twenty guys?
Is the work inherently parallelizable or inherently serial? Some jobs are not parallelizable at all: nine women can't work together to make one baby in a month. Some jobs are parallelizable but give lousy results: you could hire twenty guys and assign each of them fifty pages of War and Peace to read for you, and then have each of them write one twentieth of an essay, glue all the essay fragments together and submit the paper; that's unlikely to result in a good grade. Some jobs are very parallelizable: twenty guys with shovels can dig a hole much faster than one guy.
If the work is inherently parallelizable, does parallelization actually save time? You can cook a pot of spaghetti with a hundred noodles in it, or you can cook twenty pots of spaghetti with five noodles in each and pour the results together at the end. I guarantee you that parallelizing the task of cooking spaghetti does not result in getting your dinner any faster.
If the work is inherently parallelizable, and there is a possible time savings, does the cost of hiring those guys pay for the savings in time? If it's faster to just do the job yourself than it is to hire the guys, parallelization is not a win. Hiring twenty guys to do a job that takes you five seconds, and hoping that they'll get it done in a quarter second is not a savings if it takes you a day to find the guys.
Parallelization tends to be a win when the work is enormous and parallelizable. Setting a hundred thousand pointers to null is something a computer can do in a tiny fraction of a second; there's no enormous cost, so there's no savings. Try doing something non-trivial; say, write a compiler and do semantic analysis of method bodies in parallel. You'll be more likely to get a win there.
If you are iterating over a collection and doing something computationally intensive to each element (especially if the "something" is not also I/O intensive), then you are likely to see some benefit from parallelizing the loop. Setting a property to string.Empty
is not computationally expensive, which is probably why you didn't see an improvement.
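To illustrate (my own example, not a benchmark claim): a deliberately naive primality test gives each element enough CPU work that spreading it across cores has something to amortize. With PLINQ this is a one-method change:

```csharp
using System;
using System.Linq;

class PrimeDemo
{
    // Deliberately naive O(sqrt(n)) primality test: enough CPU work
    // per element for parallelism to have something to amortize.
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int i = 2; (long)i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }

    static void Main()
    {
        int[] numbers = Enumerable.Range(1000000, 200000).ToArray();

        // Sequential count
        int seqCount = numbers.Count(IsPrime);

        // Parallel count (PLINQ): same work, spread across cores
        int parCount = numbers.AsParallel().Count(IsPrime);

        Console.WriteLine($"{seqCount} primes sequentially, {parCount} in parallel");
    }
}
```

Time both halves and you should see the parallel one win on a multi-core machine, precisely because the per-element work dwarfs the coordination overhead.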
A loop will benefit from parallelism when the computation performed in parallel is greater than the overhead of using parallelism (thread startup, thread switching, communication, thread contention, etc.). Your test seems to assume that parallelism should benefit trivial calculations; it doesn't. What it shows you is that there is overhead to parallelism. The amount of work must be greater (and usually significantly greater) than the overhead for you to see any benefit.
You also seem to dismiss testing. Testing is the only way you will know whether parallelism is buying you anything. You don't need to performance-test every loop, just the performance-critical ones. If the loop isn't performance critical, why even bother making it parallel? And if it is critical enough to spend time parallelizing, you had better have a test in place to ensure you are getting a benefit from your labor, and regression tests to ensure some clever programmer later doesn't destroy your work.
To me, there are a couple of rules when you should think about parallelizing your code (and even then, you should still test to see if it is faster):
- The code you want to parallelize is computationally intensive. Just waiting for I/O typically won't net you much benefit. It has to be something that genuinely uses a lot of CPU time (like rendering an image).
- The code you want to parallelize is complex enough that the overhead of parallelization is less than the savings you get from distributing the work (i.e., setting a string to string.Empty is incredibly simple and fast; you would need something much more complex per item to make it worth it).
- The code you want to parallelize is independent and has no dependencies on other items.
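A sketch of a loop meeting all three rules (a made-up per-element transform, chosen only because it burns CPU; not from the question): each iteration is CPU-bound, does non-trivial math, and writes only to its own slot, so there are no cross-item dependencies:

```csharp
using System;
using System.Threading.Tasks;

class IndependentWorkDemo
{
    static void Main()
    {
        double[] data = new double[1000000];
        var rng = new Random(42);
        for (int i = 0; i < data.Length; i++)
            data[i] = rng.NextDouble();

        double[] result = new double[data.Length];

        // Each iteration reads data[i] and writes result[i] only:
        // no shared mutable state, no ordering dependency between items.
        Parallel.For(0, data.Length, i =>
        {
            result[i] = Math.Pow(Math.Sin(data[i]) + 1.0, 2.5);
        });
    }
}
```

If the body instead accumulated into a shared variable or read a neighbor's `result[i - 1]`, the third rule would be violated and you'd need locking or a redesign, which eats into (or erases) the gains.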
Parallelism helps performance only to the extent that it lets you get all your hardware cranking in a useful direction.
Two CPU-bound threads will not be faster than one if they have to share a single core.
In fact, they will be slower.
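One practical corollary (my illustration of the point above, not a general prescription): for CPU-bound work there is rarely a reason to run more workers than cores, and ParallelOptions lets you cap the degree of parallelism at the hardware's core count:

```csharp
using System;
using System.Threading.Tasks;

class CoreCountDemo
{
    static void Main()
    {
        // Oversubscribing cores with CPU-bound threads buys nothing;
        // capping parallelism at the logical core count avoids that.
        Console.WriteLine($"Logical cores: {Environment.ProcessorCount}");

        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        Parallel.For(0, 100, options, i =>
        {
            // CPU-bound body goes here
        });
    }
}
```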
There are other reasons than performance for using multiple threads.
For example, web applications that have to interact with many simultaneous users can be written as a single thread that just responds to interrupts.
However, it simplifies the code enormously if it can be written with threads.
That doesn't make the code any faster.
It makes it easier to write.