In high-performance computing, sums, products, etc. are often calculated using a "parallel reduction" that takes n elements and completes in O(log n) time (given enough parallelism). In Haskell, we usually use a fold for this kind of calculation, but a fold's evaluation time is always linear in the length of the list.
Data Parallel Haskell has some of this built in, but what about in the common framework of a list? Can we do it with `Control.Parallel.Strategies`?
So, assuming `f` is associative, how do we write

```haskell
parFold :: (a -> a -> a) -> [a] -> a
```

so that `parFold f xs` only needs time logarithmic in `length xs`?
Not sure what your `parFold` function is supposed to do. If it is intended to be a parallel version of `foldr` or `foldl`, I think its definition is wrong. Fold applies the same function to each element of the list and accumulates the result of each application. Coming up with a parallel version of it, I guess, would require that the function applications to the elements be done in parallel, a bit like what `parList` does.

I don't think a list is the right data type for this. Because it's just a linked list, the data will necessarily be accessed sequentially. Although you can evaluate the items in parallel, you won't gain much in the reduction step. If you really need a list, I think the best function would be just a strict fold over a list whose elements are evaluated in parallel.
or maybe
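a variant that caps how many sparks are outstanding at once; a sketch using `parBuffer` (same imports as above, plus `parBuffer`; the buffer size of 100 is arbitrary):

```haskell
-- Like parFold above, but only keeps a bounded window of sparks alive,
-- which avoids flooding the runtime with sparks on very long lists.
parFoldBuf :: (a -> a -> a) -> [a] -> a
parFoldBuf f = foldl1' f . withStrategy (parBuffer 100 rseq)
```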
If the reduction step is complex, you might get a gain by subdividing the list like this:
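A sketch of that subdivided version; the chunk size of 1000, the `chunkList` helper, and the `Monoid`/`mempty` assumption follow the notes below, while `rseq` as the per-chunk strategy is a guess:

```haskell
import Control.Parallel.Strategies (parMap, rseq)
import Data.List (foldl')

-- Split a list into chunks of at most n elements.
chunkList :: Int -> [a] -> [[a]]
chunkList _ [] = []
chunkList n xs = let (a, b) = splitAt n xs in a : chunkList n b

-- Reduce each 1000-element chunk in parallel, then fold the chunk results.
parFoldChunked :: Monoid a => (a -> a -> a) -> [a] -> a
parFoldChunked f = foldl' f mempty . parMap rseq (foldl' f mempty) . chunkList 1000
```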
I've taken the liberty of assuming your data is a `Monoid` for `mempty`; if that isn't possible, you can either replace `mempty` with your own empty value, or in the worst case use `foldl1'`.
There are two operators from `Control.Parallel.Strategies` in use here. `parList` evaluates all the items of the list in parallel. `chunkList` divides the list into chunks of 1000 elements, and each of those chunks is then reduced in parallel by `parMap`.

You might also try
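the same chunked pipeline with a much larger chunk size, reusing `chunkList` from above (the value of 100000 is only illustrative):

```haskell
-- Fewer, larger chunks: each spark carries more work relative to the
-- scheduling overhead, at the cost of coarser load balancing.
parFoldChunked' :: Monoid a => (a -> a -> a) -> [a] -> a
parFoldChunked' f = foldl' f mempty . parMap rseq (foldl' f mempty) . chunkList 100000
```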
Depending on exactly how the work is distributed, one of these may be more efficient than the others.
If you can use a data structure that has good support for indexing though (Array, Vector, Map, etc.), then you can do binary subdivisions for the reduction step, which will probably be better overall.
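For example, with `Data.Vector` (the names, the cutoff of 1024, and the `rpar`/`rseq` usage below are only illustrative):

```haskell
import qualified Data.Vector as V
import Control.Parallel.Strategies (runEval, rpar, rseq)

-- Binary subdivision over a Vector: split in half, reduce both halves in
-- parallel, and combine. The recursion depth is logarithmic in the length.
-- Assumes a non-empty vector; 1024 is an arbitrary sequential cutoff.
parReduceVec :: (a -> a -> a) -> V.Vector a -> a
parReduceVec f v
  | V.length v <= 1024 = V.foldl1' f v
  | otherwise = runEval $ do
      let half = V.length v `div` 2
      l <- rpar (parReduceVec f (V.take half v))   -- spark the left half
      r <- rseq (parReduceVec f (V.drop half v))   -- evaluate the right half here
      _ <- rseq l                                  -- make sure the left half is done
      return (f l r)
```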
This seems like a good start:
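Roughly, a strict fold over a list whose elements are sparked with `parList` (`rseq` as the element strategy is an assumption here):

```haskell
import Control.Parallel.Strategies (parList, rseq, using)
import Data.List (foldl1')

-- Spark every element, then fold the evaluated list sequentially.
parFold :: (a -> a -> a) -> [a] -> a
parFold f xs = foldl1' f (xs `using` parList rseq)
```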
It works, but the parallelism is not so great. Replacing `parList` with something like `parListChunk 1000 rseq` helps a bit, but the speedup is still limited to under 1.5x on an 8-core machine.