I want to learn the theory behind garbage collection. How do I go about it? The obvious answer is a compiler textbook. The question is: is it necessary to learn lexical analysis, parsing and the other material that usually precedes garbage collection in such a text?
In short, what are the prerequisites to learning about garbage collection theory?
P.S. I do know what the purpose of parsing, lexical analysis, etc. is, just not how they are implemented.
Read these papers in order. They are in progressive subject matter/difficulty order (not chronological).
List taken directly from Prof. Kathryn McKinley's Memory Management course page here, where you'll find links to all the articles.
I took the course last semester, so I read all these and I have to say I learned what I set out to learn!
There is a whole book on garbage collection, and a quite good one, if I may add:
Richard Jones also maintains a nice site collecting garbage collection resources.
Most early garbage collection papers are eminently readable. You could start with Paul Wilson's survey of "Uniprocessor Garbage Collection Techniques" (1992, LNCS vol. 637) and then dive into the original literature on topics that sound interesting.
You might also want to take a look at Squeak: Open Personal Computing, which covers the Squeak Smalltalk garbage collector, among other ST design issues. You should also take a look at Squeak itself - it is almost completely written in Smalltalk, and all the source, including the GC, is freely available and easy to study using the Smalltalk browsers.
I am also a dabbler interested in garbage collection (to the point that I wrote my own garbage collected VM called HLVM). I learned by reading as many research papers on garbage collection as I could get my hands on and by playing with the ideas myself, both raw in my virtual machine and also by writing memory-safe high-level simulations.
Lexical analysis, parsing and the rest are not relevant to garbage collection. You might get an outdated, cursory overview of garbage collection from a compiler book, but you need to read research papers to get an up-to-date view, e.g. with regard to multicore.
You need to know about basic graph theory, pointers, stacks, threads and (if you're interested in multi-threading) low-level concurrency primitives such as locks.
Garbage collection is all about determining reachability. When a program can no longer obtain a reference to a value because that value has become unreachable, the GC can recycle the memory that value is occupying. Reachability is determined by traversing the heap starting from a set of "global roots" (global variables and pointers on the threads' stacks and in the cores' registers).
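To make the reachability idea concrete, here is a minimal toy sketch in Python. It assumes the heap is modelled as a dictionary from a made-up object name to the names it points to, and that `roots` is simply a list of names; neither the representation nor the function name `reachable` comes from any real collector.

```python
def reachable(roots, heap):
    """Return the set of objects reachable from the roots.

    heap is a toy model: it maps each object name to the list of
    object names that object holds references to (an assumption made
    for illustration, not how a real heap is laid out).
    """
    seen = set()
    stack = list(roots)          # explicit work list instead of recursion
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue             # already visited, so cycles terminate
        seen.add(obj)
        stack.extend(heap[obj])  # follow every outgoing reference
    return seen


# Toy heap: a -> b -> c is rooted; d points into live data but nothing
# points to d; e only points to itself.
heap = {"a": ["b"], "b": ["c"], "c": [], "d": ["a"], "e": ["e"]}
roots = ["a"]

live = reachable(roots, heap)
garbage = set(heap) - live       # everything not reached can be recycled
```

Here `live` ends up as `{"a", "b", "c"}` and `garbage` as `{"d", "e"}`. Note that `e` is garbage even though it references itself, which is the case that plain reference counting misses.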
GC design has many facets but you might begin with the four main garbage collection algorithms:
Perhaps the most notable evolution of these basic ideas is generational garbage collection, which was the de facto standard design for many years.
My personal feeling is that some of the obscure work on garbage collection conveys just as much useful information so I'd also highly recommend:
You might also like to study the three kinds of write barrier (Dijkstra's, Steele's and Yuasa's) and look at the card marking and remembered set techniques.
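As a rough illustration of how the three write barriers differ, here is a simplified sketch in Python using tri-colour marking (white = not yet seen, grey = seen but not scanned, black = scanned). The `Obj` class, the `shade` helper and the `write_*` function names are all made up for this example, and the barriers are reduced to their core idea; the published versions carry extra conditions and have to cope with real concurrency.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


class Obj:
    """Toy heap object: a colour plus named pointer fields."""

    def __init__(self, name):
        self.name = name
        self.colour = WHITE
        self.fields = {}


def shade(obj):
    """Turn a white object grey so the collector will scan it later."""
    if obj is not None and obj.colour == WHITE:
        obj.colour = GREY


def write_dijkstra(src, field, new):
    # Insertion barrier (simplified): shade the newly stored target.
    src.fields[field] = new
    shade(new)


def write_steele(src, field, new):
    # Insertion barrier (simplified): if a black object now points at a
    # white one, turn the source back to grey so it gets rescanned.
    src.fields[field] = new
    if src.colour == BLACK and new is not None and new.colour == WHITE:
        src.colour = GREY


def write_yuasa(src, field, new):
    # Deletion barrier (simplified): shade the old target that is about
    # to be overwritten, preserving the snapshot taken at the start.
    shade(src.fields.get(field))
    src.fields[field] = new


a, c = Obj("a"), Obj("c")
a.colour = BLACK            # pretend the collector has already scanned a
write_dijkstra(a, "f", c)
print(c.colour)             # grey
```

All three prevent the mutator from hiding a white object behind a black one, but they pay for it in different places: Dijkstra's and Yuasa's shade a target, whereas Steele's puts the source back on the collector's work list.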
Then you might also like to examine the actual design decisions some implementors chose for language implementations like Java and .NET as well as the SML/NJ compiler for Standard ML, the OCaml compiler, the Glasgow Haskell compiler and others. The differences between the techniques adopted are as big as the similarities between them!
There are also some great tangentially-related papers like Henderson's Accurate Garbage Collection in an Uncooperative Environment. I used that technique to avoid having to write a stack walker for HLVM.
The memorymanagement.org website is an invaluable resource, particularly the glossary of GC-related terms.
Chapter 9, "Memory management", of Object Oriented Software Construction, 2nd Edition by Bertrand Meyer is rather informative.
I don't know of any textbook on compilers that also explains garbage collection, since, as you've said yourself, the two are quite unrelated.
Actually, I really like the Wikipedia article as an introductory explanation with many good pointers. Definitely one of the better CS articles on Wikipedia.