I'm trying to understand why we need all parts of the standard sample code:
a `par` b `pseq` a+b
Why won't the following be sufficient?
a `par` b `par` a+b
The above expression seems very descriptive: try to evaluate both a and b in parallel, and return the result a+b. Is the reason only that of efficiency: the second version would spark off twice instead of once?
How about the following, more succinct version?
a `par` a+b
Why would we need to make sure b is evaluated before a+b, as in the original, standard code?
a `par` b `par` a+b
creates sparks for both a and b, but a+b is reached immediately, so one of the sparks will fizzle (i.e., it is evaluated in the main thread). The problem with this is efficiency, as we created an unnecessary spark. If you're using this to implement parallel divide & conquer, then the overhead will limit your speedup.
a `par` a+b
seems better because it only creates a single spark. However, attempting to evaluate a before b will fizzle the spark for a, and as b does not have a spark, this will result in sequential evaluation of a+b. Switching the order to b+a would solve this problem, but as code this doesn't enforce ordering, and Haskell could still evaluate that as a+b.
So, we do
a `par` b `pseq` a+b
to force evaluation of b in the main thread before we attempt to evaluate a+b. This gives the spark for a a chance to materialise before we try evaluating a+b, and we haven't created any unnecessary sparks.
a `par` b `par` a+b
will evaluate a and b in parallel and return a+b, yes.
However, the pseq in the standard version ensures that b is evaluated before a+b is, which leaves time for the spark for a to be picked up.
See this link for more details on that topic.
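To make the comparison concrete, here is a minimal sketch of the three variants from the question as ordinary functions. The names viaPseq, viaTwoPars and viaOnePar are made up for illustration. par and pseq are imported from GHC.Conc (part of base) so that no extra package is needed; Control.Parallel from the parallel package exports the same two functions. The parentheses only spell out how the expressions in the question are meant to group.

```haskell
import GHC.Conc (par, pseq)

-- Standard form: spark a, evaluate b on the current thread, then add.
viaPseq :: Int -> Int -> Int
viaPseq a b = a `par` (b `pseq` (a + b))

-- Two sparks: a+b is demanded right away, so (per the answers above)
-- one of the sparks is expected to fizzle.
viaTwoPars :: Int -> Int -> Int
viaTwoPars a b = a `par` (b `par` (a + b))

-- One spark, no ordering: if (+) happens to demand a first,
-- the spark for a fizzles and the sum is computed sequentially.
viaOnePar :: Int -> Int -> Int
viaOnePar a b = a `par` (a + b)
```

All three return the same value for the same arguments; they differ only in how the work may be scheduled. To observe the difference you would pass expensive unevaluated expressions for a and b, compile with -threaded, and run with +RTS -N -s, which reports how many sparks were converted and how many fizzled.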
Ok. I think the following paper answers my question: http://community.haskell.org/~simonmar/papers/threadscope.pdf
In summary, the problem with
a `par` b `par` a+b
and
a `par` a+b
is the lack of ordering of evaluation. In both versions, the main thread gets to work on a (or sometimes b) immediately, causing the sparks to "fizzle" away, since there is no more need to start a thread to evaluate what the main thread has already started evaluating.
The original version
a `par` b `pseq` a+b
ensures the main thread works on b before a+b (or else it would have started evaluating a instead), thus giving the spark for a a chance to materialize into a thread for parallel evaluation.
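The divide & conquer case mentioned earlier in the thread is where this pattern usually shows up. Below is a rough sketch using a naive Fibonacci function. The names fib, parFib and cutoff are illustrative, and the cutoff value of 20 is an arbitrary assumption rather than a tuned number.

```haskell
import GHC.Conc (par, pseq)

-- Plain sequential Fibonacci, used below the cutoff.
fib :: Int -> Integer
fib n
  | n < 2     = toInteger n
  | otherwise = fib (n - 1) + fib (n - 2)

-- Assumed threshold: below it, creating a spark is taken to cost
-- more than it saves, so we fall back to the sequential version.
cutoff :: Int
cutoff = 20

-- Spark one branch, evaluate the other on the current thread,
-- and only then combine the two results.
parFib :: Int -> Integer
parFib n
  | n < cutoff = fib n
  | otherwise  = x `par` (y `pseq` (x + y))
  where
    x = parFib (n - 1)
    y = parFib (n - 2)
```

Each recursive step creates exactly one spark, and the pseq keeps the current thread busy with y so that the spark for x has a chance to be picked up by another core. Any actual speedup depends on compiling with -threaded and running with more than one capability (for example +RTS -N).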