Consider the following program. It runs forever and does nothing useful, but the memory consumption in ghci is constant :
--NoExplode.hs
module Main (main) where
test :: [Int] -> IO()
test lst = do
print "test"
rList lst
rList :: [Int] -> IO ()
rList [] = return ()
rList (x:xs) = do
rList xs
main = do
test [1..]
Now consider the following trivially modified version of the above. When this program is run in ghci the memory explodes. The only difference is that print "test"
is now assigned to x
in the do
block of test
.
--Explode.hs
module Main (main) where
test :: [Int] -> IO()
test lst = do
x <- print "test"
rList lst
rList :: [Int] -> IO ()
rList [] = return ()
rList (x:xs) = do
rList xs
main = do
test [1..]
Why does changing print "test"
to x <- print "test"
cause ghci to blow up?
p.s. I came across this when trying to understand Memory exploding upon writing a lazy bytestring to file in ghci, and the problem there (I think) essentially distils to the above. Thanks
Disclaimer: I'm not a GHCi expert, and also not that good with GHC core. Now that I've lost my credibility, lets try to understand what happens:
GHCi and CAFs
GHCi retains all evaluated CAFs:
Normally, any evaluation of top-level expressions (otherwise known as CAFs or Constant Applicative Forms) in loaded modules is retained between evaluations.
Now you might wonder why there's such a big difference between both versions. Lets have a look at the core with -ddump-simpl
. Note that you might want to drop -dsuppress-all
when you dump the programs yourself.
Dumps of your programs
Non-exploding version:
❯ ghc SO.hs -ddump-simpl -fforce-recomp -O0 -dsuppress-all
[1 of 1] Compiling Main ( SO.hs, SO.o )
==================== Tidy Core ====================
Result size of Tidy Core = {terms: 29, types: 28, coercions: 0}
$dShow_rq2
$dShow_rq2 = $fShow[] $fShowChar
Rec {
rList_reI
rList_reI =
\ ds_dpU ->
case ds_dpU of _ {
[] -> return $fMonadIO ();
: x_aho xs_ahp -> rList_reI xs_ahp
}
end Rec }
main
main =
>>
$fMonadIO
(print $dShow_rq2 (unpackCString# "test"))
(rList_reI (enumFrom $fEnumInt (I# 1)))
main
main = runMainIO main
The important part is the location of [1..]
, almost at the end:
enumFrom $fEnumInt (I# 1))
As you can see, the list isn't a CAF. But what happens if we instead use the exploding version?
Exploding version
❯ ghc SO.hs -ddump-simpl -fforce-recomp -O0 -dsuppress-all
[1 of 1] Compiling Main ( SO.hs, SO.o )
==================== Tidy Core ====================
Result size of Tidy Core = {terms: 32, types: 31, coercions: 0}
$dShow_rq3
$dShow_rq3 = $fShow[] $fShowChar
Rec {
rList_reI
rList_reI =
\ ds_dpV ->
case ds_dpV of _ {
[] -> return $fMonadIO ();
: x_ahp xs_ahq -> rList_reI xs_ahq
}
end Rec }
lst_rq4
lst_rq4 = enumFrom $fEnumInt (I# 1)
main
main =
>>=
$fMonadIO
(print $dShow_rq3 (unpackCString# "test"))
(\ _ -> rList_reI lst_rq4)
main
main = runMainIO main
There's suddenly a new top-level expression, namely lst_rq4
, which generates the list. And as seen before, GHCi retains the evaluations of top-level expressions, so lst_rq4
will also be retained.
Now there is an option to discard the evaluations:
Turning on +r
causes all evaluation of top-level expressions to be discarded after each evaluation (they are still retained during a single evaluation).
But since "they are still retained during a single evaluation" even :set +r
won't help you in this case. Unfortunately I cannot answer why GHC introduces a new top-level expression.
Why does this even happen in optimized code?
The list is still a top-level expression:
main2
main2 = eftInt 1 2147483647
Funny enough, GHC actually doesn't create an infinite list, since Int
is bounded.
How can one get rid of the leak?
In this case you can get rid of it if you place the list in test:
test = do
x <- print "test"
rList [1..]
This will prevent GHC from creating a top-level expression.
However, I can't really give a general advise on this. Unfortunately, my Haskell-fu isn't yet good enough.