I have a short attention span so I couldn't make my way through the Wikipedia article.
I understand there are several techniques for garbage collection but a common one is the "reachability" test, where an object's eligibility for collection is based on whether or not it can be "reached" by a rooted object (which, to my understanding is an object that is known to not require collection). When you want to know if a certain object is reachable, how would you go about it? How do you know where to look?
The collector obviously must be aware of all allocated objects and rooted objects. How does it determine the reachability of each of these objects?
By walking the pointers/references, I'd say. In principle you just look whether an object still has references pointing to it (from other objects, local variables of currently executed code, ...). If it hasn't then no reference to this object can be obtained again (in languages like Java, at least where you can't do pointer trickery), so it's usually safe to throw that particular object away.
Other schemes used (or still in use) are for example reference counting where each object has a counter of references to it which must be incremented each time someone gets a reference to that object and decremented each time someone loses a reference to that object. COM in Windows works that way, if I remember correctly.
Java and .NET use (among others) generational garbage collection where each object is initially assumed to die very quickly (generational hypothesis). It then emplos some optimizations to keep garbage collection cycles fast and thus not disrupt the programs running too much. In ye olde times it wasn't uncommon for GC to lock up a program while it ran, sometimes for several seconds.
Aside from that, GC usually only runs when memory is low, i. e. too many dead objects have accumulated and need to be reclaimed. That's why most managed applications seem to waste much more memory compared to unmanaged ones, even though in many cases much of that memory can be reclaimed by running the GC once.
When we create the object and assign that object equal to null that object is ready to be deleted by the garbage collector...
The collector must be aware of the location of object references within the global variables area, as well as in the activation frame of each function/method/procedure call, for each thread. The processor registers can also contain object references. Some information must be supplied either by a compiler, or by a virtual machine.
A garbage collector will always know about every allocated object, since otherwise it cannot possibly detect unreachable objects for deletion. This is all taken care of during memory allocation. This makes sense, since if the garbage collector doesn't know about every object, then it would be possible to form orphaned sets of objects which cannot be reached, not even by the garbage collector. Basically, that would be a memory leak. So at the very least the garbage collector must be aware of every allocated object.
This leads to the question of how the garbage collector determines which objects are eligible for deletion. There are various techniques for this, but two that you will hear about a lot are "mark and sweep" and "reference counting" (they often come up in text books) and they are worth knowing about to understand the basic ideas of garbage collection.
Mark and sweep involves the garbage collector walking through the object references (starting at a known set of non-collectible objects) and marking each object that it can reach (setting a flag on the object for example). When it has exhausted all references, then the set of objects that are NOT marked, can be deleted during the sweep phase.
Reference counting involves the garbage collector keeping a count for each object in memory of how many other objects refer to it. This counter will be incremented each time a reference to the object is established from another object, and decremented when a reference is deleted. When the counter hits 0, the object is no longer being referenced by anything, and the garbage collector knows that it can be safely deleted (it has essentially become unreachable at this point - no object "knows about" this object anymore).
Start with the root objects (i.e. those on the stack in your main() - equivalent) then just follow every pointer/reference and mark each object as 'reachable'. Continue until you've marked every object. The remaining objects (that haven't been marked) are garbage.
The other way is to have reference counting, which is garbage collection too, except that the marking is done at each pointer write and cleanup is istantaneous (as soon as an object is unreachable, it's ref count is zero, so it is destroyed). Note that ref-counting needs a mark-and-sweep phase (like above) to solve cycles. (I.e. objects that cling to each other but no-one else cares about)
You can think of a running program (maybe I should be specific to Java) as being a number of
Threads
. The virtual machine can use the root frame of each thread as aroot
. It can then walk down the reachability-tree of everything that is referenced from that root (plus all stack frames under the root).Anything else is not reachable