Why does performGC fail to release all memory?

Posted 2019-03-11 01:47

Question:

Given the program:

import Language.Haskell.Exts.Annotated -- from haskell-src-exts
import System.Mem
import System.IO
import Control.Exception

main :: IO ()
main = do
  evaluate $ length $ show $ fromParseResult $ parseFileContents $ "data C = C {a :: F {- " ++ replicate 400000 'd' ++ " -}     }"
  performGC
  performGC
  performGC

Using GHC 7.0.3, when I run:

$ ghc --make Temp.hs -rtsopts && Temp.exe +RTS -G1 -S
    Alloc    Copied     Live    GC    GC     TOT     TOT  Page Flts
    bytes     bytes     bytes  user  elap    user    elap
 ...
 29463264        64   8380480  0.00  0.00    0.64    0.85    0    0  (Gen:  0)
       20        56   8380472  0.00  0.00    0.64    0.86    0    0  (Gen:  0)
        0        56   8380472  0.00  0.00    0.64    0.87    0    0  (Gen:  0)
    42256       780     33452  0.00  0.00    0.64    0.88    0    0  (Gen:  0)
        0                      0.00  0.00

The performGC calls seem to leave about 8 MB of memory live, even though all of it should be dead by that point. How come?

(Without -G1 I see 10 MB live at the end, which I also can't explain.)

Answer 1:

Here's what I see (after inserting a print before the last performGC, to help tag when things happen):

   524288    524296  32381000  0.00  0.00    1.15    1.95    0    0  (Gen:  0)
   524288    524296  31856824  0.00  0.00    1.16    1.96    0    0  (Gen:  0)
   368248       808   1032992  0.00  0.02    1.16    1.99    0    0  (Gen:  1)
        0       808   1032992  0.00  0.00    1.16    1.99    0    0  (Gen:  1)
"performed!"
    39464      2200   1058952  0.00  0.00    1.16    1.99    0    0  (Gen:  1)
    22264      1560   1075992  0.00  0.00    1.16    2.00    0    0  (Gen:  0)
        0                      0.00  0.00

So after GCs there is still 1M on the heap (without -G1). With -G1 I see:

 34340656  20520040  20524800  0.10  0.12    0.76    0.85    0    0  (Gen:  0)
 41697072  24917800  24922560  0.12  0.14    0.91    1.01    0    0  (Gen:  0)
 70790776       800   2081568  0.00  0.02    1.04    1.20    0    0  (Gen:  0)
        0       800   2081568  0.00  0.00    1.04    1.20    0    0  (Gen:  0)
"performed!"
    39464      2184   1058952  0.00  0.00    1.05    1.21    0    0  (Gen:  0)
    22264      2856     43784  0.00  0.00    1.05    1.21    0    0  (Gen:  0)
        0                      0.00  0.00

So about 2M. This is on x86_64/Linux.

Let's think about the STG machine storage model to see if there's something else on the heap.

Things that could be in that 1M of space:

  • CAFs for things like [], string constants, and the small Int and Char pool, plus things in libraries, the stdin MVar?
  • Thread State Objects (TSOs) for the main thread.
  • Any allocated signal handlers.
  • The IO manager Haskell code.
  • Sparks in the spark pool.

From experience, this figure of slightly less than 1M seems to be the default "footprint" of a GHC binary. That's about what I've seen in other programs as well (e.g. the smallest footprints among the shootout programs are never below 900K).
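One way to check that baseline footprint without reading -S output is to ask the RTS directly. The following is a minimal sketch, assuming a much newer GHC than the 7.0.3 used in the question (GHC.Stats with getRTSStats needs base 4.10 / GHC 8.2 or later) and a binary run with +RTS -T. The helper names liveBytesAfterGC and formatLive are made up for illustration.

```haskell
import Data.Word (Word64)
import GHC.Stats (gc, gcdetails_live_bytes, getRTSStats, getRTSStatsEnabled)
import System.Mem (performGC)

-- Force a major GC, then read the live-byte count that the RTS
-- recorded for the most recent collection.
-- (Hypothetical helper; getRTSStats throws unless +RTS -T is on.)
liveBytesAfterGC :: IO Word64
liveBytesAfterGC = do
  performGC
  stats <- getRTSStats
  return (gcdetails_live_bytes (gc stats))

-- Hypothetical helper: render the figure for printing.
formatLive :: Word64 -> String
formatLive n = "Live bytes after a major GC: " ++ show n

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "Run with +RTS -T to enable RTS statistics."
    else liveBytesAfterGC >>= putStrLn . formatLive
```

Compile with -rtsopts and run with +RTS -T. In an otherwise empty program the number printed is the baseline; it will probably differ between GHC versions and platforms, so don't expect it to match the 1M figure above.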

Perhaps the profiler can say something. Here's the -hT profile (no profiling libs needed), after I insert a minimal busy loop at the end to string out the tail:

 $ ./A +RTS -K10M -S -hT -i0.001    

Results in this graph:

[Figure: -hT heap profile of the program]

Victory! Look at that ~1M thread stack object sitting there!

I don't know of a way to make TSOs smaller.


The code that produced the above graph:

import Language.Haskell.Exts.Annotated -- from haskell-src-exts
import System.Mem
import System.IO
import Data.Int
import Control.Exception

main :: IO ()
main = do
  evaluate $ length $ show $ fromParseResult 
           $ parseFileContents 
           $ "data C = C {a :: F {- " ++ replicate 400000 'd' ++ " -}     }"
  performGC
  performGC
  print "performed!"
  performGC

  -- busy loop so we can sample what's left on the heap.
  let go :: Int32 -> IO ()
      go  0 = return ()
      go  n = go $! n-1
  go (maxBound :: Int32)


Answer 2:

Compiling the code with -O -ddump-simpl, I see the following global definition in the simplifier output:

lvl2_r12F :: [GHC.Types.Char]
[GblId]
lvl2_r12F =
  GHC.Base.unpackAppendCString# "data C = C {a :: F {- " lvl1_r12D

The input to the parser function has become a global string constant. Globals are never garbage collected in GHC, so that's probably what's occupying the 8 MB of memory after garbage collection.
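If the floated constant really is what stays live, one way to test the theory is to make the input depend on a value known only at run time, so the whole string cannot become a top-level constant. This is only a sketch under that assumption: mkInput and the command-line argument are made up for illustration, and length stands in for the haskell-src-exts parser call so the example needs no extra packages.

```haskell
import Control.Exception (evaluate)
import System.Environment (getArgs)
import System.Exit (die)
import System.Mem (performGC)

-- Build the parser input from a length supplied at run time.
-- The result depends on n, so the complete string cannot be a
-- top-level constant (the literal prefix and suffix still can be).
mkInput :: Int -> String
mkInput n = "data C = C {a :: F {- " ++ replicate n 'd' ++ " -}     }"

main :: IO ()
main = do
  args <- getArgs
  n <- case args of
         [a] -> return (read a)
         _   -> die "usage: Temp <comment-length>"
  -- Stand-in for the original pipeline:
  --   length $ show $ fromParseResult $ parseFileContents $ ...
  _ <- evaluate (length (mkInput n))
  performGC
```

Run it as Temp 400000 +RTS -S and compare the live bytes after the final GC with the original program. If the figure drops, that supports the global-constant explanation. If it doesn't, something else is holding the memory.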