I have a program I'm trying to parallelize (full paste with runnable code here).
I've profiled it and found that the majority of the time is spent in findNearest, which is essentially a simple foldr over a large Data.Map.
findNearest :: RGB -> M.Map k RGB -> (k, Word32)
findNearest rgb m0 =
    M.foldrWithKey' minDistance (k0, distance rgb r0) m0
  where
    (k0, r0) = M.findMin m0
    minDistance k r x@(_, d1) =
      -- Euclidean distance in RGB-space
      let d0 = distance rgb r
      in  if d0 < d1 then (k, d0) else x
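For context, RGB and distance are defined in the full paste; a minimal stand-in for what the snippet assumes (the real definitions may differ) could look like this:

import Data.Word (Word8, Word32)

-- Hypothetical stand-ins for the definitions in the linked paste.
data RGB = RGB !Word8 !Word8 !Word8

-- (Squared) Euclidean distance in RGB-space; skipping the sqrt
-- doesn't change which colour is nearest.
distance :: RGB -> RGB -> Word32
distance (RGB r1 g1 b1) (RGB r2 g2 b2) =
    sq r1 r2 + sq g1 g2 + sq b1 b2
  where
    sq a b = let d = fromIntegral (max a b - min a b) :: Word32
             in  d * d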
parFindNearest is supposed to execute findNearest in parallel over subtrees of the larger Map.
parFindNearest :: NFData k => RGB -> M.Map k RGB -> (k, Word32)
parFindNearest rgb = minimumBy (comparing snd)
                   . parMap rdeepseq (findNearest rgb)
                   . M.splitRoot
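For completeness, these snippets rely on imports along these lines (the paste may differ; Data.Map.Strict is an assumption here, though plain Data.Map exports the same functions):

import qualified Data.Map.Strict as M
import Control.DeepSeq (NFData)
import Control.Parallel.Strategies (parMap, rdeepseq)
import Data.List (minimumBy)
import Data.Ord (comparing)
import Data.Word (Word32)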
Unfortunately, GHC garbage-collects most of my sparks before they are converted into useful parallelism.
Here's the result of compiling with ghc -O2 -threaded and running with +RTS -s -N2:
839,892,616 bytes allocated in the heap
123,999,464 bytes copied during GC
5,320,184 bytes maximum residency (19 sample(s))
3,214,200 bytes maximum slop
16 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 1550 colls, 1550 par 0.23s 0.11s 0.0001s 0.0004s
Gen 1 19 colls, 18 par 0.11s 0.06s 0.0030s 0.0052s
Parallel GC work balance: 16.48% (serial 0%, perfect 100%)
TASKS: 6 (1 bound, 5 peak workers (5 total), using -N2)
SPARKS: 215623 (1318 converted, 0 overflowed, 0 dud, 198111 GC'd, 16194 fizzled)
INIT time 0.00s ( 0.00s elapsed)
MUT time 3.72s ( 3.66s elapsed)
GC time 0.34s ( 0.17s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 4.07s ( 3.84s elapsed)
Alloc rate 225,726,318 bytes per MUT second
Productivity 91.6% of total user, 97.1% of total elapsed
gc_alloc_block_sync: 9862
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 2103
As you can see, the majority of sparks are GC'd or fizzle before being converted. I've experimented with strictness (having findNearest return a custom strict pair data type instead of a tuple, and using rdeepseq from Control.Parallel.Strategies), but my sparks are still GC'd.
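For reference, here is a sketch of the kind of strict-pair variation I mean (names like P and findNearestP are illustrative, not the exact code from the paste):

-- Illustrative strict pair: both fields are forced as soon as the
-- constructor is evaluated, so no distance thunks accumulate.
data P k = P !k !Word32

instance NFData k => NFData (P k) where
    rnf (P k d) = rnf k `seq` rnf d

findNearestP :: RGB -> M.Map k RGB -> P k
findNearestP rgb m0 =
    M.foldrWithKey' minDistance (P k0 (distance rgb r0)) m0
  where
    (k0, r0) = M.findMin m0
    minDistance k r x@(P _ d1) =
      let d0 = distance rgb r
      in  if d0 < d1 then P k d0 else x

A matching parallel driver would replace comparing snd with comparing (\(P _ d) -> d), but as noted, my sparks were still GC'd either way.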
I'd like to know:
- Why are my sparks being GC'd before they are converted?
- How can I change my program to take advantage of parallelism?