What is an easy way to explain how Garbage Collect

2019-04-28 05:46发布

问题:

I have a short attention span so I couldn't make my way through the Wikipedia article.

I understand there are several techniques for garbage collection but a common one is the "reachability" test, where an object's eligibility for collection is based on whether or not it can be "reached" by a rooted object (which, to my understanding is an object that is known to not require collection). When you want to know if a certain object is reachable, how would you go about it? How do you know where to look?

The collector obviously must be aware of all allocated objects and rooted objects. How does it determine the reachability of each of these objects?

回答1:

By walking the pointers/references, I'd say. In principle you just look whether an object still has references pointing to it (from other objects, local variables of currently executed code, ...). If it hasn't then no reference to this object can be obtained again (in languages like Java, at least where you can't do pointer trickery), so it's usually safe to throw that particular object away.

Other schemes used (or still in use) are for example reference counting where each object has a counter of references to it which must be incremented each time someone gets a reference to that object and decremented each time someone loses a reference to that object. COM in Windows works that way, if I remember correctly.

Java and .NET use (among others) generational garbage collection where each object is initially assumed to die very quickly (generational hypothesis). It then emplos some optimizations to keep garbage collection cycles fast and thus not disrupt the programs running too much. In ye olde times it wasn't uncommon for GC to lock up a program while it ran, sometimes for several seconds.

Aside from that, GC usually only runs when memory is low, i. e. too many dead objects have accumulated and need to be reclaimed. That's why most managed applications seem to waste much more memory compared to unmanaged ones, even though in many cases much of that memory can be reclaimed by running the GC once.



回答2:

A garbage collector will always know about every allocated object, since otherwise it cannot possibly detect unreachable objects for deletion. This is all taken care of during memory allocation. This makes sense, since if the garbage collector doesn't know about every object, then it would be possible to form orphaned sets of objects which cannot be reached, not even by the garbage collector. Basically, that would be a memory leak. So at the very least the garbage collector must be aware of every allocated object.

This leads to the question of how the garbage collector determines which objects are eligible for deletion. There are various techniques for this, but two that you will hear about a lot are "mark and sweep" and "reference counting" (they often come up in text books) and they are worth knowing about to understand the basic ideas of garbage collection.

Mark and sweep involves the garbage collector walking through the object references (starting at a known set of non-collectible objects) and marking each object that it can reach (setting a flag on the object for example). When it has exhausted all references, then the set of objects that are NOT marked, can be deleted during the sweep phase.

Reference counting involves the garbage collector keeping a count for each object in memory of how many other objects refer to it. This counter will be incremented each time a reference to the object is established from another object, and decremented when a reference is deleted. When the counter hits 0, the object is no longer being referenced by anything, and the garbage collector knows that it can be safely deleted (it has essentially become unreachable at this point - no object "knows about" this object anymore).



回答3:

When we create the object and assign that object equal to null that object is ready to be deleted by the garbage collector...



回答4:

A simple (but common) approach to garbage collection would be analogous to finding everything of value in a building and moving it to a new building, then blowing up the old building and replacing it with an empty one. To minimize the amount of object moving required, the system may use a few buildings of different sizes. All new objects go in a small building. When that's full, everything useful is copied to a medium building and the small building is blown up and replaced. If there's less than a small building's worth of stuff left in the medium building, everything useful will be moved to a big building, and then the medium building will then get blown up and replaced. If it seems like the large building has accumulated too much junk, everything useful will be copied to another large building and then the big building will be blown up and replaced. Although the small building will get blown up and replaced a lot, much of the stuff that's in it will be junk and won't need to be copied. Likewise, by the time the medium building needs to be replaced, much of the stuff in it will have been abandoned, so only a small percentage will need to be moved to the big building. Copying all the stuff that's in the big building will be expensive, but that operation won't have to happen so often.



回答5:

The collector must be aware of the location of object references within the global variables area, as well as in the activation frame of each function/method/procedure call, for each thread. The processor registers can also contain object references. Some information must be supplied either by a compiler, or by a virtual machine.



回答6:

Start with the root objects (i.e. those on the stack in your main() - equivalent) then just follow every pointer/reference and mark each object as 'reachable'. Continue until you've marked every object. The remaining objects (that haven't been marked) are garbage.

The other way is to have reference counting, which is garbage collection too, except that the marking is done at each pointer write and cleanup is istantaneous (as soon as an object is unreachable, it's ref count is zero, so it is destroyed). Note that ref-counting needs a mark-and-sweep phase (like above) to solve cycles. (I.e. objects that cling to each other but no-one else cares about)



回答7:

You can think of a running program (maybe I should be specific to Java) as being a number of Threads. The virtual machine can use the root frame of each thread as a root. It can then walk down the reachability-tree of everything that is referenced from that root (plus all stack frames under the root).

Anything else is not reachable