The following short Haskell program is intended to count a list of items from a file. The version using foldl'
works fine, but the version using the ST Monad
gives a stack space overflow message. Clearly there's a space leak of some sort here, but I haven't been able to solve it. The really interesting part is that the ST monad
is supposed to be doing in-place updates and shouldn't be letting resources grow like this, although that may only pertain to main memory and not stack space. Could someone explain what's happening here?
import Control.Monad
import Data.List
import Control.Monad.ST
import Data.STRef
--count items using foldl'
countFold :: Num a => [b] -> a
countFold = foldl' (\a _ -> a+1) 0
-- count items using the ST monad
-- derived fromt the sumST example on http://www.haskell.org/haskellwiki/Monad/ST
-- only using +1 instead of adding the values
countST :: Num a => [b] -> a
countST xs = runST $ do
n <- newSTRef 0
forM_ xs ( \_ -> modifySTRef n (+1) )
readSTRef n
main = do
mydata <- readFile "data_files/values_1000000.num"
let trainingdata = lines mydata
-- this works just fine
--(putStrLn (show (countFold trainingdata)))
-- This fails with the message:
-- Stack space overflow: current size 8388608 bytes.
-- Use `+RTS -Ksize -RTS' to increase it.
(putStrLn (show (countST trainingdata)))
UPDATE: Thanks for the answers and comments. I think I see what's happened here. modifySTRef' is new in version 4.6, which solves the problem nicely and includes the explanation someone mentioned. I'm using version 4.5 of Data.STRef, which appears to be standard in Ubuntu and includes neither the explanation or modifySTRef'.
Looking into the 4.6 package version and the function, the difference is that it uses seq to insure the function f is applied strictly (and stored in x'):
modifySTRef :: STRef s a -> (a -> a) -> ST s ()
modifySTRef ref f = writeSTRef ref . f =<< readSTRef ref
modifySTRef' :: STRef s a -> (a -> a) -> ST s ()
modifySTRef' ref f = do
x <- readSTRef ref
let x' = f x
x' `seq` writeSTRef ref x'
So another way to solve it would have been to copy the function's code to a new name in my own program's space and apply seq to the leaking area, which is a nice general-purpose trick I'll probably use in the future. Thanks to everyone for helping me sort this out.