I found the Haskell wiki page on space leaks, which claims to list examples of real-world leaks but doesn't. It also doesn't really say what a space leak is; it just links to the page on memory leaks.

What's a space leak?
As noted in @Rasko's answer, a space leak refers to a situation where a program or specific computation uses more (usually much more) memory than is necessary for the computation and/or expected by the programmer.
Haskell programs tend to be particularly susceptible to space leaks, mostly because of the lazy evaluation model (sometimes complicated by the way IO interacts with that model) and the highly abstract nature of the language, which can make it difficult for a programmer to determine exactly how a particular computation will be performed.
It helps to consider a specific example. This Haskell program:
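(a minimal sketch of such a program; I'm assuming a 64-bit `Int` here, so the sum doesn't overflow)

```haskell
main :: IO ()
main = print $ sum [1 .. 1000000000 :: Int]
```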
is an idiomatic way to sum the first billion integers. Compiled with `-O2`, it runs in a few seconds in constant memory (a few megabytes, basically the runtime overhead).

Now, any programmer would expect a program that sums the first billion integers to run without chewing up memory, but it's actually a little surprising that this Haskell version is well behaved. After all, read literally, it constructs a list of a billion integers before summing them up, so it ought to require at least a few gigabytes (just for storage of the billion integers, not to mention the overhead of a Haskell linked list).
However, lazy evaluation ensures that the list is only generated as it's needed and -- equally importantly -- optimizations performed by the compiler ensure that as list elements are added to the accumulating sum, the program recognizes they are no longer needed and allows them to be garbage collected instead of keeping them around until the end of the computation. So, at any point during the computation, only a sliding "window" into the middle of the list needs to be kept in memory -- earlier elements have been discarded, and later elements are yet to be lazily computed. (In fact, the optimizations go further than this: no list is even constructed, but this is far from obvious to the programmer.)
Soooo... Haskell programmers get used to the idea that tossing around giant (or even infinite) data structures will "just work" with computations automatically using only the memory they need.
But, a minor change to the program, like also printing the length of the list as proof of all the hard work we are doing:
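like this sketch, where the list is bound to a name, `vals`, so it can be used twice:

```haskell
main :: IO ()
main = do
  let vals = [1 .. 1000000000 :: Int]  -- the list now has a name used twice
  print $ sum vals
  print $ length vals
```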
suddenly causes space usage to explode to dozens of gigabytes (on my laptop, to about 13 GB before it starts swapping hopelessly and I kill it).
This is a space leak. Calculating the sum and length of this list are obviously things that can be done in constant space using a "sliding window" view into the list, but the above program uses much more memory than needed. The reason, it turns out, is that once the list has been given a name `vals` that's used in two places, the compiler no longer allows the "used" elements to be immediately discarded. If `sum vals` is evaluated first, the list is lazily generated and summed, but the entire, giant list is then kept around until `length vals` can be evaluated.

As a more practical example, you might write a simple program to count words and characters in a file:
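(a sketch; I'm assuming the file is fed in on standard input, read lazily with `getContents`)

```haskell
main :: IO ()
main = do
  txt <- getContents             -- lazily read all of standard input
  print $ length txt             -- character count
  print $ length (words txt)     -- word count
```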
This works fine on small test files up to a couple of megabytes, but it's noticeably sluggish on a 10 MB file, and if you try to run it on a 100 MB file, it'll slowly but surely start gobbling up all available memory. Again, the problem is that -- even though the file contents are read lazily into `txt` -- because `txt` is used twice, the entire contents are read into memory as a Haskell `String` (a memory-inefficient representation of large blocks of text) when, say, `length txt` is evaluated, and none of that memory can be freed until `length (words txt)` has also been computed.

Note that:
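a version that computes only the character count (`txt` used just once):

```haskell
main :: IO ()
main = do
  txt <- getContents
  print $ length txt
```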
and:
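a version that computes only the word count:

```haskell
main :: IO ()
main = do
  txt <- getContents
  print $ length (words txt)
```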
both run quickly in constant space even on big files.
As a side note, fixing the above space leak normally involves rewriting the computation so the characters and words are counted with one pass through the contents, so the compiler can determine that the contents of the file that have already been processed do not need to be kept around in memory until the end of the computation. One possible solution is:
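Something along these lines, using a strict left fold (the helper name `countCharsWords` is my own invention, not a library function):

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.Char (isSpace)
import Data.List (foldl')

-- Count characters and words in a single pass, keeping the running
-- totals strict so the already-consumed prefix of the input can be
-- garbage collected as we go.
countCharsWords :: String -> (Int, Int)
countCharsWords txt =
  let (cs, ws, _) = foldl' step (0, 0, False) txt in (cs, ws)
  where
    step (cs, ws, inWord) c =
      let !cs'    = cs + 1
          !ws'    = if inWord' && not inWord then ws + 1 else ws
          inWord' = not (isSpace c)   -- no bang needed here
      in (cs', ws', inWord')

main :: IO ()
main = do
  txt <- getContents
  let (cs, ws) = countCharsWords txt
  print cs
  print ws
```

The bangs force the running counts at every step instead of letting thunks pile up, which is what allows the consumed portion of `txt` to be discarded.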
The complexity of this solution (use of bang (`!`) patterns and an explicit fold instead of `length` and `words`) illustrates how tough space leaks can be, especially for new Haskell programmers. And it's not at all obvious that using `foldl'` instead of `foldl` makes no difference (but using `foldr` or `foldr'` would be a disaster!), that the bangs before `cs'` and `ws'` are critical to avoid a space leak, but that the bang before `inWord'` isn't (though it slightly improves performance), etc.