How to write parallel code with Haskell vectors?

2019-02-09 10:05发布

问题:

One one hand, in Haskell Vector a seems to be the preferred type to use as an array of numbers. There is even an (incomplete) Vector Tutorial.

On the other hand, Control.Parallel.Strategies are defined mostly in terms of Traversable. Vector library doesn't provide these instances.

The minimal complete definition of Traversable t should also define Foldable and

traverse :: Applicative f => (a -> f b) -> t a -> f (t b)
sequenceA :: Applicative f => t (f a) -> f (t a)

I don't see how sequenceA can be defined for Data.Vector.Unboxed.Vector. So, what is the best approach to writing parallel code with unboxed vectors? Defining some new ad hoc strategies like evalVector or using par and pseq explicitly or using plain Data.Array instead of vectors?

P.S. Plain Arrays are parallelizable without problems: https://gist.github.com/701888

回答1:

It's a hack job for parVector but this worked for me:

import qualified Data.Vector as V
import Control.Parallel.Strategies
import Control.Parallel
import Control.DeepSeq

ack :: Int -> Int -> Int
ack 0 n = n+1
ack m 0 = ack (m-1) 1
ack m n = ack (m-1) (ack m (n-1))

main = do
  let vec = V.enumFromN 1 1000
  let res = (V.map (ack 2) vec) `using` parVector
  print res

parVector :: NFData a => Strategy (V.Vector a)
parVector vec = eval vec `seq` Done vec
  where
  chunkSize = 1
  eval v
    | vLen == 0 = ()
    | vLen <= chunkSize = rnf (v V.! 0) -- FIX this to handle chunks > 1
    | otherwise = eval (V.take half v) `par` eval (V.drop half v)
    where vLen = V.length v
          half = vLen `div` 2

And running this code:

[tommd@Mavlo Test]$ ghc --make -O2 -threaded t.hs
... dumb warning ...
[tommd@Mavlo Test]$ time ./t +RTS -N1 >/dev/null
real    0m1.962s user    0m1.951s sys     0m0.009s
[tommd@Mavlo Test]$ time ./t +RTS -N2 >/dev/null
real    0m1.119s user    0m2.221s sys 0m0.005s

When I run the code with Integer instead of Int in the type signature:

[tommd@Mavlo Test]$ time ./t +RTS -N2 >/dev/null

real    0m4.754s
user    0m9.435s
sys     0m0.028s
[tommd@Mavlo Test]$ time ./t +RTS -N1 >/dev/null

real    0m9.008s
user    0m8.952s
sys     0m0.029s

Rock!

EDIT: And a solution that is a bit closer to your earlier attempt is cleaner (it doesn't use functions from three separate modules) and works great:

parVector :: NFData a => Strategy (V.Vector a)
parVector vec =
  let vLen = V.length vec
      half = vLen `div` 2
      minChunk = 10
  in  if vLen > minChunk
      then do
        let v1 = V.unsafeSlice 0 half vec
            v2 = V.unsafeSlice half (vLen - half) vec
        parVector v1
        parVector v2
        return vec
      else
        evalChunk (vLen-1) >>
        return vec
  where
  evalChunk 0 = rpar (rdeepseq (vec V.! 0)) >> return vec
  evalChunk i = rpar (rdeepseq (vec V.! i)) >> evalChunk (i-1)

Things to learn from this solution:

  1. It uses the Eval monad, which is strict so we're sure to spark everything (compared to wrapping things in let and remembering to use bang patterns).
  2. Contrary to your proposed implementation it (a) doesn't construct a new vector, which is costly (b) evalChunk forces evaluation of each element using rpar and rdeepseq (I don't believe rpar vec forces any of the vector's elements).
  3. Contrary to my belief, slice takes a start index and length, not a start and end index. Oops!
  4. We still need to import Control.DeepSeq (NFData), but I've e-mailed the libraries list to try and fix that issue.

Performance seems similar to the first parVector solution in this answer, so I won't post numbers.



回答2:

1) As you probably know, vector is a product of the DPH work that has proven harder than the researchers initially expected.

2) Unboxed vectors can't divide up the work for individual elements across multiple CPUs.

3) I'd be a lot more hopeful for boxed vectors. Something like:

using (map (rnf . (vec !)) [0..V.length vec - 1]) (parList rdeepseq)

Or maybe you can avoid constructing the list and using parlist. I think just assigning parts of the array is sufficient. The below code is likely broken, but the concept of making your own parVector using rnf and dividing the vector in half until it is a single element (or some tunable chunk size of elements) should work.

parVector :: Strategy (Vector a)
parVector = let !_ = eval vec in Done vec
  where
  chunkSize = 1
  eval v
    | vLen == 0 = ()
    | vLen <= chunkSize = rnf (v ! 0) -- FIX this to handle chunks > 1
    | otherwise = eval (V.take half v) `par` eval (V.drop half v)
    where vLen = V.length v
          half = vLen `div` 2