Using Clojure, I have a very large amount of data in a sequence and I want to process it in parallel with a relatively small number of cores (4 to 8).
The easiest thing to do is to use pmap instead of map to apply my processing function across the sequence of data. But in my case the coordination overhead results in a net loss.
I think the reason is that pmap assumes the function mapped across the data is very costly. Looking at pmap's source, it appears to construct a future for each element of the sequence in turn, so each invocation of the function occurs on a separate thread (cycling over the number of available cores).
Here is the relevant piece of pmap's source:
(defn pmap
  "Like map, except f is applied in parallel. Semi-lazy in that the
  parallel computation stays ahead of the consumption, but doesn't
  realize the entire result unless required. Only useful for
  computationally intensive functions where the time of f dominates
  the coordination overhead."
  ([f coll]
   (let [n (+ 2 (.. Runtime getRuntime availableProcessors))
         rets (map #(future (f %)) coll)
         step (fn step [[x & xs :as vs] fs]
                (lazy-seq
                 (if-let [s (seq fs)]
                   (cons (deref x) (step xs (rest s)))
                   (map deref vs))))]
     (step rets (drop n rets))))
  ;; multi-collection form of pmap elided
In my case the mapped function is not that expensive, but the sequence is huge (millions of records). I suspect the cost of creating and dereferencing that many futures is where the parallel gain is lost to overhead.
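As a rough illustration (a minimal sketch with a deliberately cheap function, not my real workload), a comparison like this shows the effect I'm describing:

(time (dorun (map inc (range 1000000))))   ; sequential baseline
(time (dorun (pmap inc (range 1000000))))  ; one future per element; often slower here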
Is my understanding of pmap correct?
Is there a better pattern in Clojure than pmap for this sort of lower-cost but massively repeated processing? I am considering chunking the data sequence somehow and then running threads on larger chunks. Is this a reasonable approach, and what Clojure idioms would work?
Sadly not a valid answer yet, but something to watch for in the future is Rich's work with the fork/join library coming in Java 7. If you look at his Par branch on GitHub, he's done some work with it, and the last I saw the early returns were amazing.
Here is an example of Rich trying it out: http://paste.lisp.org/display/84027
You can use some sort of map/reduce implemented by hand. Also take a look at the swarmiji framework:
"A distributed computing system that helps writing and running Clojure code in parallel - across cores and processors"
The fork/join work mentioned in earlier answers on this and similar threads eventually bore fruit as the reducers library, which is probably worth a look.
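For example, a minimal sketch (assuming the data fits in a vector, which is what fold needs in order to partition the work, and where process and data stand in for your real function and collection):

(require '[clojure.core.reducers :as r])

;; r/fold splits the vector into groups (512 elements by default) and
;; reduces them in parallel on a fork/join pool; r/map fuses with the
;; fold, so no intermediate sequence is built.
(r/fold + (r/map process (vec data)))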
This question, how-to-efficiently-apply-a-medium-weight-function-in-parallel, also addresses the same problem in a very similar context.
The current best answer is to use partition to break the data into chunks, then pmap a map of your function onto each chunk, and finally recombine the results, map-reduce style, as in the sketch below.
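A minimal sketch of that idiom, using partition-all (a variant of partition that keeps the final undersized chunk rather than dropping it) and a hypothetical per-record function process:

(defn chunked-pmap [f chunk-size coll]
  (->> coll
       (partition-all chunk-size)                 ; lazily split into chunks
       (pmap (fn [chunk] (doall (map f chunk))))  ; one future per chunk
       (apply concat)))                           ; flatten back into one seq

;; Amortize the coordination cost over, say, 2048 records per future:
;; (chunked-pmap process 2048 data)

The doall is important: it forces the inner map to be realized inside the future's thread rather than lazily on the consuming thread, which would defeat the parallelism.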