可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I've often read that in the Sun JVM short-lived objects ("relatively new objects") can be garbage collected more efficiently than long-lived objects ("relatively old objects")
- Why is that so?
- Is that specific to the Sun JVM or does this result from a general garbage collection principle?
回答1:
Most Java apps create Java objects and then discard them rather quickly eg. you create some objects in a method then once you exit the method all the object dies. Most apps behave this way and most people tend to code their apps this way. The Java heap is roughly broken up into 3 parts, permanent, old (long lived) generation, and young (short lived) generation. Young gen is further broken up into S1, S2 and eden. These are just heaps.
Most objects are created in the young gen. The idea here is that, since the mortality rate of objects is high, we quickly create them, use them and then discard them. Speed is of essence. As you create objects, the young gen fills up, until a minor GC occurs. In a minor GC, all objects that are alive are copied over from eden and say S2 to S1. Then, the 'pointer' is rested on eden and S2.
Every copy ages the object. By default, if an object survives 32 copies viz. 32 minor GC, then the GC figures that it is going to be around for a lot longer. So, what it does is to tenure it, by moving it to the old generation. Old gen is just one big space. When the old gen fills up, a full GC, or major GC, happens in the old gen. Because there is no other space to copy to, the GC has to compact. This is a lot slower than minor GC, that's why we avoid doing that more frequently.
You can tune the tenuring parameter with
java -XX:MaxTenuringThreshold=16
if you know that you have lots of long lived objects. You can print the various age bucket of your app with
java -XX:-PrintTenuringDistribution
回答2:
(see above explanations for more general GC.. this answers WHY new is cheaper to GC than old).
The reason eden can be cleared faster is simple: the algorithm is proportional to the number of objects that will survive GC in the eden space, not proportional to the number of live objects in the whole heap. ie: if you have an average object death rate of 99% in eden (ie: 99% of objects do not survive GC, which is not abnormal), you only need to look at and copy that 1%. For "old" GC, all live objects in the full heap need to be marked/swept. That is significantly more expensive.
回答3:
This is generational garbage collection. It's used pretty widely these days. See more here: (wiki).
Essentially, the GC assumes that new objects are more likely to become unreachable than older ones.
回答4:
There this phenomena that "most objects die young". Many objects are created inside a method and never stored in a field. Therefore, as soon as the method exits these objects "die" and thus are candidate for collection at the next collection cycle.
Here is an example:
public String concatenate(int[] arr) {
StringBuilder sb = new StringBuilder();
for(int i = 0; i < arr.length; ++i)
sb.append(i > 0 ? "," : "").append(arr[i]);
return sb.toString();
}
The sb
object will become garbage as soon as the method returns.
By splitting the object space into two (or more) age-based areas the GC can be more efficient: instead of frequently scanning the entire heap, the GC frequently scans only the nursery (the young objects area) - which, obviously, takes much less time that a full heap scan. The older objects area is scanned less frequently.
回答5:
Young objects are managed more efficiently (not only collected; accesses to young objects are also faster) because they are allocated in a special area (the "young generation"). That special area is more efficient because it is collected "in one go" (with all threads stopped) and neither the collector nor the applicative code has to deal with concurrent access from the other.
The trade-off, here, is that the "world" is stopped when the "efficient area" is collected. This may induce a noticeable pause. The JVM keeps pause times low by keeping the efficient area small enough. In other words, if there is an efficiently-managed area, then that area must be small.
A very common heuristic, applicable to many programs and programming languages, is that many objects are very short-lived, and most of the write accesses occur in young objects (those which were created recently). It is possible to write application code which does not work that way, but these heuristic will be "mostly true" on "most applications". Thus, it makes sense to store young objects in the efficiently-managed area. Which is what the JVM GC does, and which is why that efficient area is called the "young generation".
Note that there are systems where the whole memory is handled "efficiently". When the GC must run, the application becomes "frozen" for a few seconds. This is harmless for long-run computations, but detrimental to interactivity, which is why most modern GC-enabled programming environments use generational GC with a limited-size young generation.
回答6:
This is based on the observation that the life-expectancy of an object goes up as it ages. So it makes sense to move objects to a less-frequently collected pool once they reach a certain age.
This isn't a fundamental property of the way programs use memory. You could write a pathological program that kept all objects around for a long time (and the same length of time for all objects), but this tends not to happen by accident.
回答7:
The JVM (usually) uses a generational garbage collector. This kind of collector separates the heap memory into several pools, according to the age of the objects in there. The reasoning here is based on the observation that most objects are short-lived, so that if you do a garbage collection on an area of memory with "young" objects, you can reclaim relatively more memory than if you do garbage collection across "older" objects.
In the Hotspot JVM, new objects get allocated in the so-called Eden area. When this area fills up, the JVM will sweep the Eden area (which does not take too much time, because it is not so big). Objects that are still alive are moved to the Survivor area, and the rest is discarded, freeing up Eden for the next generation. When the Eden collection is not sufficient does the the garbage collector move on to the older generations (which takes more work).
回答8:
All GCs behave that way. The basic idea is that you try to reduce the amount of objects that you need to check every time you run the GC because this is a pretty expensive operation. So if you have millions of objects but just need to check a few, that's way better than to have to check all of them. Also, a feature of GC plays into your hands: Temporary objects (which can't be reached by anyone anymore), have no cost during the GC run (well, let's ignore the finalize()
method for now). Only objects which survive cost CPU time. Next, there is the observation that many objects are short lived.
Therefore, objects are created in a small space (called "Eden" or "young gen"). After a while, all objects that can be reached are copied (= expensive) out of this space and the space is then declared empty (so Java effectively forgets about all unreachable objects, so they don't have a cost since they don't have to be copied). Over time, long lived objects are moved to "older" spaces and the older spaces are swept less often to reduce the GC overhead (for example, every N runs, the GC will run an old space instead of the eden space).
Just to compare: If you allocate an object in C/C++, you need to call free()
plus the destructor for each of them. This is one reason why GC is faster than traditional, manual memory management.
Of course, this is a rather simplified look. Today, working on GC is at the level of compiler design (i.e. done by very few people). GCs pull all kinds of tricks to make the whole process efficient and unnoticeable. See the Wikipedia article for some pointers.