I'm trying to learn how to use the Control.Parallel module, but I don't think I've got it right. I'm trying to run the following code (fibs.hs):
import Control.Parallel

fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = p `par` (q `pseq` (p + q))
  where
    p = fib (n-1)
    q = fib (n-2)

main = print $ fib 30
I compiled this with:
ghc -O2 --make -threaded fibs.hs
And then I get the following results when executing this program (the output of a Python script that runs each program 100 times and reports the average and standard deviation of the execution time):
./fibs +RTS -N1 -> avg= 0.060203 s, deviation = 0.004112 s
./fibs +RTS -N2 -> avg= 0.052335 s, deviation = 0.006713 s
./fibs +RTS -N3 -> avg= 0.052935 s, deviation = 0.006183 s
./fibs +RTS -N4 -> avg= 0.053976 s, deviation = 0.007106 s
./fibs +RTS -N5 -> avg= 0.055227 s, deviation = 0.008598 s
./fibs +RTS -N6 -> avg= 0.055703 s, deviation = 0.006537 s
./fibs +RTS -N7 -> avg= 0.058327 s, deviation = 0.007526 s
My questions are:
1. What exactly is happening when I evaluate a `par` (b `pseq` (a + b))? I understand that a `par` b is supposed to hint to the compiler that a can be calculated in parallel with b, and to return b. OK. But what does pseq do?

2. Why do I see such a small performance increase? I'm running this on an Intel Core 2 Quad machine. I'd expect that running with -N5 or -N6 wouldn't make a real difference in performance, or that the program would actually start to perform very badly. But why do I see no improvement from -N2 to -N3, and why is the initial improvement so small?
As Don explained, the problem is that you are creating too many sparks. Here's how you might rewrite it to get a good speedup, with a demonstration below.
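A sketch of that rewrite, assuming a depth cutoff (the cutoff of 25, the helper name sfib, and the argument 35 in main are illustrative choices of mine, not code from the original answer):

import Control.Parallel (par, pseq)

-- Plain sequential fib, used below the cutoff.
sfib :: Int -> Int
sfib 0 = 0
sfib 1 = 1
sfib n = sfib (n-1) + sfib (n-2)

-- Parallel fib that only sparks work for sufficiently large arguments,
-- so the total number of sparks stays small.
pfib :: Int -> Int
pfib n
  | n < 25    = sfib n
  | otherwise = p `par` (q `pseq` (p + q))
  where
    p = pfib (n-1)
    q = pfib (n-2)

main :: IO ()
main = print (pfib 35)

Below the cutoff the sequential sfib runs with no sparking overhead; above it only a modest number of coarse-grained sparks are created, which is what you want.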
You're creating an exponential number of sparks (think of how many recursive calls you're making here). To actually get good parallelism, you need to create fewer, larger pieces of parallel work in this case, since your hardware can't handle that many threads (and so GHC doesn't make them).
The solution is to use a cutoff strategy, as described in this talk: http://donsbot.wordpress.com/2009/09/05/defun-2009-multicore-programming-in-haskell-now/
Basically, switch over to the straight-line (sequential) version once you reach a certain depth, and use +RTS -sstderr to see how many sparks are being converted, so you can determine whether you're wasting work or not.
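For example (assuming a GHC recent enough to require -rtsopts in order to pass RTS options at run time; older versions accept +RTS without it):

ghc -O2 --make -threaded -rtsopts fibs.hs
./fibs +RTS -N4 -sstderr

The statistics printed on exit include a SPARKS line showing how many sparks were created and how many were actually converted into useful parallel work.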
Since nobody gave a definitive answer about pseq, here's the gist of the official description: pseq is semantically identical to seq, but with an operational difference. seq is strict in both arguments, so the compiler is free to evaluate them in either order, whereas pseq is treated as strict only in its first argument, which guarantees that a `pseq` b evaluates a before b.

Re (1): par allows a to be computed in another thread. I am guessing here, but I think pseq behaves much like seq: it forces the first result to be computed first (well, seq isn't guaranteed to do this, but in practice on GHC it does). So in this case, the computation of a is sparked off so that another thread can pick it up, while the current thread computes b and then sums a and b.

Re (2): This is a pretty trivial computation to be forking off to other threads; it's probably just as fast for the CPU to calculate it itself. I'm betting the overhead of the threads is hurting almost as much as it's helping for such a simple computation.
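To make the Re (1) point concrete, here's a minimal sketch (the name parSum and the example inputs are mine, introduced for illustration, not code from the question or the answers):

import Control.Parallel (par, pseq)

-- The pattern under discussion, written as a standalone function.
-- `par` sparks a so that another capability may evaluate it;
-- `pseq` forces b on the current thread before the addition, so this
-- thread doesn't just evaluate a itself and waste the spark.
parSum :: Int -> Int -> Int
parSum a b = a `par` (b `pseq` (a + b))

main :: IO ()
main = print (parSum (sum [1 .. 10000000]) (sum [1 .. 5000000]))

With a plain seq in place of pseq, GHC would be free to evaluate a on the current thread first, which can leave the spark for a with nothing useful to do.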