For example
main = do
let ls = [0..10000000]
print ls
This will create the array 'inplace', using O(1) memory.
The following edit causes the program to run out of memory while executing.
main = do
let ls = [0..10000000]
print ls
print ls
ls
in this case must be kept in memory to be printed again. It would actually be heaps more memory efficient to recalculate the array again 'inplace' than to try to keep this in place. That's an aside though. My real question is "how and when does GHC communicate to the runtime system that ls
can be destroyed while it's created in O(1) time?" I understand that liveness analysis can find this information, I'm just wondering where the information is used. Is it the garbage collector that is passed this info? Is it somehow compiled away? (If I look at the compiled core from GHC then both examples use eftInt
, so if it's a compiler artifact then it must happen at a deeper level).
edit: My question was more about finding where this optimization took place. I thought maybe it was in the GC, which was fed some information from some liveness check in the compilation step. Due to the answers so far I'm probably wrong. This is most likely then happening at some lower level before core, so cmm perhaps?
edit2: Most of the answers here assume that the GC knows that ls
is no longer referenced in the first example, and that it is referenced again in the second example. I know the basics of GC and I know that arrays are linked lists, etc. My question is exactly HOW the GC knows this. The answer could probably be only (a) it is getting extra information from the compiler, or (b) it doesn't need to know this, that this information is handled 100% by the compiler
This explanation is probably a lie, especially because I'm making it up as I go, but that shouldn't be a problem.
The essential mistake you're making is assuming that a value is live if a variable bound to it is in scope in a live expression. This is simply wrong. A value bound to a variable is only live as a result if it is actually mentioned in a live expression.
The job of the runtime is very simple
main
.We can think of this execution as involving a couple different steps that repeat over and over:
So we start with some
main
expression. From the start, the "root set" for GC consists of those names that are used in thatmain
expression, not the things that are in scope in that expression. If I writethen since
foo
is not mentioned inmain
, it is not in the root set at the beginning, and since it is not even mentioned indirectly by anything mentioned bymain
, it is dead right from the start.Now suppose we take a more interesting example:
Now
foo
is mentioned inmain
, so it starts out live. We evaluatemain
to weak head normal form to find out what to do, and we get, approximately,Note that
foo
is no longer mentioned, so it is dead!Now we execute that primitive operation, printing "Hi!", and our "to do list" is reduced to
Now we evaluate that to WHNF, and get, roughly,
Now
bar
is dead. We print "Bye!" and exit.Consider, now, the first program you described:
This desugars to
This is where we start. The "root set" at the beginning is everything mentioned in the
in
clause of the expression. So we conceptually havels
andprint
to start out. Now we can imagine thatprint
, specialized to[Integer]
, looks something vaguely like the following (this is greatly simplified, and will print out the list differently, but that really doesn't matter).So when we start executing
main
(What do we do now? What will we do afterwards?), we are trying to calculateprint ls
. To do this, we pattern match on the first constructor ofls
, which forcesls
to be evaluated to WHNF. We find the second pattern,y:ys
, matches, so we replaceprint ls
withprint Integer y >> print ys
, wherey
points to the first element ofls
andys
points to the thunk representing the second list constructor ofls
. But note thatls
itself is now dead! Nothing is pointing to it! So asprint
forces bits of the list, the bits it has already passed become dead.In contrast, when you have
and you start executing, you start by calculating the thing to do first (
print ls
). You getEverything is the same, except that the second part of the expression now points to
ls
. So even though the first part will be dropping pieces of the list as it goes, the second part will keep holding on to the beginning, keeping it all live.Edit
Let me try explaining with something a little simpler than
IO
. Pretend that your program is an expression of type[Int]
, and the job of the runtime system is to print each element on its own line. So we can writeThe runtime system holds a value representing everything that it should print. Let's call the "current value"
whatPrint
. The RTS needs to follow a process:whatPrint
tomain
.whatPrint
empty? If so, I'm done, and can exit the program. If not, it is a cons,printNow : whatPrint'
.printNow
and print it.whatPrint
towhatPrint'
In this model, the "root set" for garbage collection is just
whatPrint
.In a real program, we don't produce a list; we produce an
IO
action. But such an action is also a lazy data structure (conceptually). You can think of>>=
,return
, and each primitiveIO
operation as a constructor forIO
. Think of it asThe initial value of
whatShouldIDo
ismain
, but its value evolves over time. Only what it points to directly is in the root set. There is no magical analysis necessary.ls
here is a lazy list, not an array. In practice, it's closer to a stream or generator in another language.The reason the first code works fine is that it never actually has the whole list in memory.
ls
is defined lazily and then consume element-by-element byprint
. Asprint
is going along, there are no other references to the beginning ofls
so list items can be garbage collected immediately.In theory, GHC could realize that it's more efficient to not store the list in memory between the two prints but instead recompute it. However, this is not always desirable—a lot of code is actually faster if things are only evaluated once—and, more importantly, would make the execution model even more confusing for programmers.