Haskell Space Leak

2019-04-01 00:37发布

问题:

all.

While trying to solve some programming quiz: https://www.hackerrank.com/challenges/missing-numbers , I came across with space leak.

Main function is difference, which implements multi-set difference. I've found out that List ':' and Triples (,,) kept on heaps with -hT option profiling. However, only big lists are difference's two arguments, and it shrinks as difference keeps on tail recursion. But the memory consumed by lists keeps increasing as program runs.

Triples is ephemeral array structure, used for bookkeeping the count of multiset's each element. But the memory consumed by triples also keeps increasing, and I cannot find out why.

Though I've browsed similar 'space leak' questions in stackoverflow, I couldn't grasp the idea. Surely I have much to study.

I appreciate any comments. Thank you.

p.s) executable is compiled with -O2 switch.

$ ./difference -hT < input04.txt
Stack space overflow: current size 8388608 bytes.
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.6.3

.

import Data.List
import Data.Array

-- array (non-zero-count, start-offset, array_data)
array_size=101

myindex :: Int -> Int -> Int
myindex key offset
    | key >= offset = key - offset
    | otherwise     = key - offset + array_size

mylookup x (_,offset,arr) = arr ! idx
    where idx = myindex x offset

addOrReplace :: Int -> Int -> (Int, Int, Array Int (Int,Int)) -> (Int, Int, Array Int (Int,Int))
addOrReplace key value (count,offset,arr) = (count', offset, arr // [(idx,(key,value))])
    where idx = myindex key offset
          (_,prev_value) = arr ! idx
          count' = case (prev_value, value) of
                      (0,0) -> count
                      (0,_) -> count + 1
                      (_,0) -> count - 1
                      otherwise -> count

difference :: (Int,Int,Array Int (Int,Int)) -> [Int] -> [Int] -> [Int]
difference (count,offset,arr) [] []
    | count == 0 = []
    | otherwise  = [ k | x <- [0..array_size-1], let (k,v) = (arr ! x), v /= 0]
difference m (x:xs) y = difference new_m xs y
    where (_,v) = mylookup x m
          new_m = addOrReplace x (v + 1) m
difference m [] (y:ys) = difference new_m [] ys
    where (_,v) = mylookup y m
          new_m = if v == 0
            then m
            else addOrReplace y (v - 1) m

main = do
    n <- readLn :: IO Int
    pp <- getLine
    m <- readLn :: IO Int
    qq <- getLine
    let p = map (read :: String->Int) . words $ pp
        q = map (read :: String->Int) . words $ qq
        startArray = (0,head q, array (0,100) [(i,(0,0)) | i <- [0..100]] )
    putStrLn . unwords . map show . sort $ difference startArray q p

[EDIT] I seq'ed value and Array thanks to Carl's advice. I attach heap diagram.

[original heap profiling] []1

[after seq'ing value v]

difference m (x:xs) y = difference new_m xs y
    where (_,v) = mylookup x m
          new_m = v `seq` addOrReplace x (v + 1) m

[after seq'ing value v and Array]

difference m (x:xs) y = new_m `seq` difference new_m xs y
    where (_,v) = mylookup x m
          new_m = v `seq` addOrReplace x (v + 1) m

回答1:

I see three main problems with this code.

First (and not the cause of the memory use, but definitely the cause of generally poor performance) Array is horrible for this use case. O(1) lookups are useless when updates are O(n).

Speaking of, the values being stored in the Array aren't forced while difference is looping over its first input. They are thunks containing pointers to an unevaluated lookup in the previous version of the array. You can ensure that the value is evaluated at the same time the array is updated, in a variety of ways. When difference loops over its second input, it does this accidentally, in fact, by comparing the value against 0.

Third, difference doesn't even force the evaluation of the new arrays being created while traversing its first argument. Nothing requires the old array to be evaluated during that portion of the loop.

Both of those latter issues need to be resolved to fix the space leak. The first issue doesn't cause a space leak, just much higher overheads than needed.