I've often read that in the Sun JVM short-lived objects ("relatively new objects") can be garbage collected more efficiently than long-lived objects ("relatively old objects")
- Why is that so?
- Is that specific to the Sun JVM or does this result from a general garbage collection principle?
This is generational garbage collection. It's used pretty widely these days. See more here: (wiki).
Essentially, the GC assumes that new objects are more likely to become unreachable than older ones.
All GCs behave that way. The basic idea is that you try to reduce the amount of objects that you need to check every time you run the GC because this is a pretty expensive operation. So if you have millions of objects but just need to check a few, that's way better than to have to check all of them. Also, a feature of GC plays into your hands: Temporary objects (which can't be reached by anyone anymore), have no cost during the GC run (well, let's ignore the
finalize()
method for now). Only objects which survive cost CPU time. Next, there is the observation that many objects are short lived.Therefore, objects are created in a small space (called "Eden" or "young gen"). After a while, all objects that can be reached are copied (= expensive) out of this space and the space is then declared empty (so Java effectively forgets about all unreachable objects, so they don't have a cost since they don't have to be copied). Over time, long lived objects are moved to "older" spaces and the older spaces are swept less often to reduce the GC overhead (for example, every N runs, the GC will run an old space instead of the eden space).
Just to compare: If you allocate an object in C/C++, you need to call
free()
plus the destructor for each of them. This is one reason why GC is faster than traditional, manual memory management.Of course, this is a rather simplified look. Today, working on GC is at the level of compiler design (i.e. done by very few people). GCs pull all kinds of tricks to make the whole process efficient and unnoticeable. See the Wikipedia article for some pointers.
The JVM (usually) uses a generational garbage collector. This kind of collector separates the heap memory into several pools, according to the age of the objects in there. The reasoning here is based on the observation that most objects are short-lived, so that if you do a garbage collection on an area of memory with "young" objects, you can reclaim relatively more memory than if you do garbage collection across "older" objects.
In the Hotspot JVM, new objects get allocated in the so-called Eden area. When this area fills up, the JVM will sweep the Eden area (which does not take too much time, because it is not so big). Objects that are still alive are moved to the Survivor area, and the rest is discarded, freeing up Eden for the next generation. When the Eden collection is not sufficient does the the garbage collector move on to the older generations (which takes more work).
Young objects are managed more efficiently (not only collected; accesses to young objects are also faster) because they are allocated in a special area (the "young generation"). That special area is more efficient because it is collected "in one go" (with all threads stopped) and neither the collector nor the applicative code has to deal with concurrent access from the other.
The trade-off, here, is that the "world" is stopped when the "efficient area" is collected. This may induce a noticeable pause. The JVM keeps pause times low by keeping the efficient area small enough. In other words, if there is an efficiently-managed area, then that area must be small.
A very common heuristic, applicable to many programs and programming languages, is that many objects are very short-lived, and most of the write accesses occur in young objects (those which were created recently). It is possible to write application code which does not work that way, but these heuristic will be "mostly true" on "most applications". Thus, it makes sense to store young objects in the efficiently-managed area. Which is what the JVM GC does, and which is why that efficient area is called the "young generation".
Note that there are systems where the whole memory is handled "efficiently". When the GC must run, the application becomes "frozen" for a few seconds. This is harmless for long-run computations, but detrimental to interactivity, which is why most modern GC-enabled programming environments use generational GC with a limited-size young generation.
There this phenomena that "most objects die young". Many objects are created inside a method and never stored in a field. Therefore, as soon as the method exits these objects "die" and thus are candidate for collection at the next collection cycle.
Here is an example:
The
sb
object will become garbage as soon as the method returns.By splitting the object space into two (or more) age-based areas the GC can be more efficient: instead of frequently scanning the entire heap, the GC frequently scans only the nursery (the young objects area) - which, obviously, takes much less time that a full heap scan. The older objects area is scanned less frequently.
This is based on the observation that the life-expectancy of an object goes up as it ages. So it makes sense to move objects to a less-frequently collected pool once they reach a certain age.
This isn't a fundamental property of the way programs use memory. You could write a pathological program that kept all objects around for a long time (and the same length of time for all objects), but this tends not to happen by accident.