The mapAndSum
function in the code block all the way below combines map
and sum
(never mind that another sum
is applied in the main function, it just serves to make the output compact). The map
is computed lazily, while the sum
is computed using an accumulating parameter. The idea is that the result of map
can be consumed without ever having the complete list in memory, and (only) afterwards the sum
is available "for free". The main function indicates that we had a problem with irrefutable patterns when calling mapAndSum
. Let me explain this problem.
According to the Haskell standard, the irrefutable pattern example let (xs, s) = mapAndSum ... in print xs >> print s
is translated into
(\ v -> print (case v of { (xs, s) -> xs })
>> print (case v of { (xs, s) -> s }))
$ mapAndSum ...
And hence, both print
calls carry a reference to the whole pair, which leads to the whole list being kept in memory.
We (my colleague Toni Dietze and I) solved this using an explicit case
statement (compare "bad" vs "good2"). BTW, finding this out took us a considerable amount of time..!
Now, what we do not understand is two-fold:
Why does
mapAndSum
work in the first place? It also uses an irrefutable pattern, so it should also keep the whole list in memory, but it obviously doesn't. And converting thelet
into acase
would make the function behave completely unlazy (to the point that the stack overflows; no pun intended).We had a look at the "core" code generated by GHC, but as far as we could interpret it, it actually contained the same translation of
let
as the one above. So no clue here, and more confusion instead.Why does "bad?" work when optimization is turned off, but not when it is turned on?
One remark concerning our actual application: we want to achieve stream processing (format conversion) of a large text file while also accumulating some value that is then written to a separate file. As indicated, we were successful, but the two questions remain, and answers may improve our understanding of GHC for upcoming tasks and problems.
Thank you!
{-# LANGUAGE BangPatterns #-}
-- Tested with The Glorious Glasgow Haskell Compilation System, version 7.4.2.
module Main where
import Control.Arrow (first)
import Data.List (foldl')
import System.Environment (getArgs)
mapAndSum :: Num a => (a -> b) -> [a] -> ([b], a)
mapAndSum f = go 0
where go !s (x : xs) = let (xs', s') = go (s + x) xs in (f x : xs', s')
-- ^ I have no idea why it works here. (TD)
go !s [] = ([], s)
main :: IO ()
main = do
let foo = mapAndSum (^ (2 :: Integer)) [1 .. 1000000 :: Integer]
let sum' = foldl' (+) 0
args <- getArgs
case args of
["bad" ] -> let (xs, s) = foo in print (sum xs) >> print s
["bad?"] -> print $ first sum' $ foo
-- ^ Without ghc's optimizer, this version is as memory
-- efficient as the “good” versions
-- With optimization “bad?” is as bad as “bad”. Why? (TD)
["good1"] -> print $ first' sum' $ foo
where first' g (x, y) = (g x, y)
["good2"] -> case foo of
(xs, s) -> print (sum' xs) >> print s
["good3"] -> (\ (xs, s) -> print (sum' xs) >> print s) $ foo
_ -> error "Sorry, I do not understand."
Let me first answer why
mapAndSome
can work good at all: What you see is (very likely) the effect of an optimization described by Philip Wadler in “Fixing some space leaks with a garbage collector”. Short summary: If the garbage collector sees a thunk of the formfst x
andx
is already evaluated to the tuple constructor, e.g.(y,z)
, it will replacefst x
byy
, possibly freeing upz
if it is not referenced anywhere else.In your code, the
s'
will, once the result ofgo
is evaluated to a tuple and after one round of GCing, not keep a reference to the tuple but will be replaced by the accumulated parameter.Now let’s look a the other patterns by investigating core. The
foo
binding is compiled to:Here is the code in the
"bad"
case (lvl18_r2fd
is "bad"):We can see that two values at the module level are printed,
lvl17_r2fc
andlvl16_r2fb
, here is their code:Why are they bound at the module level, and not inside the expression? This is an effect of lazy lifting, another optimization that increases sharing and hence sometime has an adverse effect on space performance. See GHC ticket 719 for another occurrence of this effect.
So what happens is that the evaluation of
lvl17_r2fc
causesfoo
to be evaluated, and the left entry lazily printed. Unfortunately,lvl16_r2fb
is still live and retains the full tuple. And because the garbage collector (seems) to fail to see that this is an selector thunk, Wadler’s optimization does not kick in.In contrast, here is the code for
"good1"
a.k.a.lvl8_r2f1
:where the printed value is this string:
As you can see the tuple is taken apart only once, and both values are used. So nothing refers to the tuple as a whole and it can be garbage collected. Similar for
"good2"
and"good3"
.Now to
"bad?"
: In the unoptimized case, we get this code:The implementation of
first
via***
uses refutable patterns, so the kind of selector thunks that the garbage collector handles well are generated.In the optimized case, things are a bit scattered, but anyways here is the relevant code (the last value is the one that is being printed):
The use of
first
has been inlined. We see two calls tocase foo_r2eT
, so this is prone for a space leak, despitew1_rf24
looking like a selector (so I’d expect the runtime to apply Wadler’s optimization). Maybe it does not work well for static thunks? Indeed the commentary, if up to date, only talks about dynamic allocated selector thunks. So if yourfoo
were not a module-level-value (or rather lazy-liftable into one) but rather dependent on some input,w1_rf24
might be dynamically allocated and hence eligible for the special treatment. But then the code might look very different anyways.