The mapAndSum
function in the code block all the way below combines map
and sum
(never mind that another sum
is applied in the main function, it just serves to make the output compact). The map
is computed lazily, while the sum
is computed using an accumulating parameter. The idea is that the result of map
can be consumed without ever having the complete list in memory, and (only) afterwards the sum
is available "for free". The main function indicates that we had a problem with irrefutable patterns when calling mapAndSum
. Let me explain this problem.
According to the Haskell standard, the irrefutable pattern example let (xs, s) = mapAndSum ... in print xs >> print s
is translated into
(\ v -> print (case v of { (xs, s) -> xs })
>> print (case v of { (xs, s) -> s }))
$ mapAndSum ...
And hence, both print
calls carry a reference to the whole pair, which leads to the whole list being kept in memory.
We (my colleague Toni Dietze and I) solved this using an explicit case
statement (compare "bad" vs "good2"). BTW, finding this out took us a considerable amount of time..!
Now, what we do not understand is two-fold:
Why does mapAndSum
work in the first place? It also uses an irrefutable pattern, so it should also keep the whole list in memory, but it obviously doesn't. And converting the let
into a case
would make the function behave completely unlazy (to the point that the stack overflows; no pun intended).
We had a look at the "core" code generated by GHC, but as far as we could interpret it, it actually contained the same translation of let
as the one above. So no clue here, and more confusion instead.
Why does "bad?" work when optimization is turned off, but not when it is turned on?
One remark concerning our actual application: we want to achieve stream processing (format conversion) of a large text file while also accumulating some value that is then written to a separate file. As indicated, we were successful, but the two questions remain, and answers may improve our understanding of GHC for upcoming tasks and problems.
Thank you!
{-# LANGUAGE BangPatterns #-}
-- Tested with The Glorious Glasgow Haskell Compilation System, version 7.4.2.
module Main where
import Control.Arrow (first)
import Data.List (foldl')
import System.Environment (getArgs)
mapAndSum :: Num a => (a -> b) -> [a] -> ([b], a)
mapAndSum f = go 0
where go !s (x : xs) = let (xs', s') = go (s + x) xs in (f x : xs', s')
-- ^ I have no idea why it works here. (TD)
go !s [] = ([], s)
main :: IO ()
main = do
let foo = mapAndSum (^ (2 :: Integer)) [1 .. 1000000 :: Integer]
let sum' = foldl' (+) 0
args <- getArgs
case args of
["bad" ] -> let (xs, s) = foo in print (sum xs) >> print s
["bad?"] -> print $ first sum' $ foo
-- ^ Without ghc's optimizer, this version is as memory
-- efficient as the “good” versions
-- With optimization “bad?” is as bad as “bad”. Why? (TD)
["good1"] -> print $ first' sum' $ foo
where first' g (x, y) = (g x, y)
["good2"] -> case foo of
(xs, s) -> print (sum' xs) >> print s
["good3"] -> (\ (xs, s) -> print (sum' xs) >> print s) $ foo
_ -> error "Sorry, I do not understand."
Let me first answer why mapAndSome
can work good at all: What you see is (very likely) the effect of an optimization described by Philip Wadler in “Fixing some space leaks with a garbage collector”. Short summary: If the garbage collector sees a thunk of the form fst x
and x
is already evaluated to the tuple constructor, e.g. (y,z)
, it will replace fst x
by y
, possibly freeing up z
if it is not referenced anywhere else.
In your code, the s'
will, once the result of go
is evaluated to a tuple and after one round of GCing, not keep a reference to the tuple but will be replaced by the accumulated parameter.
Now let’s look a the other patterns by investigating core. The foo
binding is compiled to:
foo_r2eT :: ([Type.Integer], Type.Integer)
foo_r2eT =
case $wgo_r2eP mapAndSum1 lvl2_r2eS
of _ { (# ww1_s2d7, ww2_s2d8 #) ->
(ww1_s2d7, ww2_s2d8)
}
Here is the code in the "bad"
case (lvl18_r2fd
is "bad"):
case eqString ds_dyA lvl18_r2fd of _ {
False -> $wa_s2da new_s_a14o;
True ->
case ds1_dyB of _ {
[] ->
case Handle.Text.hPutStr2
Handle.FD.stdout lvl17_r2fc True new_s_a14o
of _ { (# new_s1_X15h, _ #) ->
Handle.Text.hPutStr2
Handle.FD.stdout lvl16_r2fb True new_s1_X15h
};
: ipv_sIs ipv1_sIt -> $wa_s2da new_s_a14o
}
We can see that two values at the module level are printed, lvl17_r2fc
and lvl16_r2fb
, here is their code:
lvl17_r2fc :: String
[GblId]
lvl17_r2fc =
case foo_r2eT of _ { (xs_Xqp, s_Xq9) ->
$w$cshowsPrec
0
(Data.List.sum_sum' xs_Xqp Data.List.genericDrop2)
([] @ Char)
}
lvl16_r2fb :: String
[GblId]
lvl16_r2fb =
case foo_r2eT of _ { (xs_apS, s_Xqp) ->
$w$cshowsPrec 0 s_Xqp ([] @ Char)
}
Why are they bound at the module level, and not inside the expression? This is an effect of lazy lifting, another optimization that increases sharing and hence sometime has an adverse effect on space performance. See GHC ticket 719 for another occurrence of this effect.
So what happens is that the evaluation of lvl17_r2fc
causes foo
to be evaluated, and the left entry lazily printed. Unfortunately, lvl16_r2fb
is still live and retains the full tuple. And because the garbage collector (seems) to fail to see that this is an selector thunk, Wadler’s optimization does not kick in.
In contrast, here is the code for "good1"
a.k.a. lvl8_r2f1
:
case eqString ds_dyA lvl8_r2f1 of _ {
False -> $wa2_s2dI w3_s2cF;
True ->
case ds1_dyB of _ {
[] ->
Handle.Text.hPutStr2
Handle.FD.stdout lvl7_r2f0 True w3_s2cF;
: ipv_sHg ipv1_sHh -> $wa2_s2dI w3_s2cF
}
} } in
where the printed value is this string:
lvl7_r2f0 :: String
[GblId]
lvl7_r2f0 =
case foo_r2eT of _ { (x_af6, y_af7) ->
show_tuple
(:
@ ShowS
(let {
w2_a2bY [Dmd=Just L] :: Type.Integer
w2_a2bY = lgo_r2eU mapAndSum1 x_af6 } in
\ (w3_a2bZ :: String) ->
$w$cshowsPrec 0 w2_a2bY w3_a2bZ)
(:
@ ShowS
(\ (w2_a2bZ :: String) ->
$w$cshowsPrec 0 y_af7 w2_a2bZ)
([] @ ShowS)))
([] @ Char)
}
As you can see the tuple is taken apart only once, and both values are used. So nothing refers to the tuple as a whole and it can be garbage collected. Similar for "good2"
and "good3"
.
Now to "bad?"
: In the unoptimized case, we get this code:
case eqString ds_dyA (unpackCString# "bad?")
of _ {
False -> fail2_dyN realWorld#;
True ->
case ds1_dyB of _ {
[] ->
$
@ (Type.Integer, Type.Integer)
@ (IO ())
(System.IO.print
@ (Type.Integer, Type.Integer) $dShow_rzk)
($
@ ([Type.Integer], Type.Integer)
@ (Type.Integer, Type.Integer)
(Control.Arrow.first
@ (->)
Control.Arrow.$fArrow(->)
@ [Type.Integer]
@ Type.Integer
@ Type.Integer
sum'_rzm)
foo_rzl);
: ipv_szd ipv1_sze -> fail2_dyN realWorld#
}
} } in
The implementation of first
via ***
uses refutable patterns, so the kind of selector thunks that the garbage collector handles well are generated.
In the optimized case, things are a bit scattered, but anyways here is the relevant code (the last value is the one that is being printed):
w_r2f2 :: Type.Integer
w_r2f2 =
case foo_r2eT of _ { (x_aI1, y_aI2) ->
lgo_r2eU mapAndSum1 x_aI1
}
lvl9_r2f3 :: String -> String
[GblId, Arity=1]
lvl9_r2f3 =
\ (w2_a2bZ :: String) ->
$w$cshowsPrec 0 w_r2f2 w2_a2bZ
w1_r2f4 :: Type.Integer
w1_r2f4 = case foo_r2eT of _ { (x_aI6, y_aI7) -> y_aI7 }
lvl10_r2f5 :: String -> String
[GblId, Arity=1]
lvl10_r2f5 =
\ (w2_a2bZ :: String) ->
$w$cshowsPrec 0 w1_r2f4 w2_a2bZ
lvl11_r2f6 :: [ShowS]
[GblId]
lvl11_r2f6 =
:
@ ShowS lvl10_r2f5 ([] @ ShowS)
lvl12_r2f7 :: [ShowS]
[GblId]
lvl12_r2f7 = : @ ShowS lvl9_r2f3 lvl11_r2f6
lvl13_r2f8 :: ShowS
[GblId]
lvl13_r2f8 = show_tuple lvl12_r2f7
lvl14_r2f9 :: String
[GblId]
lvl14_r2f9 = lvl13_r2f8 ([] @ Char)
The use of first
has been inlined. We see two calls to case foo_r2eT
, so this is prone for a space leak, despite w1_rf24
looking like a selector (so I’d expect the runtime to apply Wadler’s optimization). Maybe it does not work well for static thunks? Indeed the commentary, if up to date, only talks about dynamic allocated selector thunks. So if your foo
were not a module-level-value (or rather lazy-liftable into one) but rather dependent on some input, w1_rf24
might be dynamically allocated and hence eligible for the special treatment. But then the code might look very different anyways.