Tuning garbage collections for low latency

Posted 2019-01-21 17:48

I'm looking for arguments as to how best to size the young generation (with respect to the old generation) in an environment where low latency is critical.

My own testing tends to show that latency is lowest when the young generation is fairly large (e.g. -XX:NewRatio below 3). However, I cannot reconcile this with the intuition that the larger the young generation, the longer it should take to garbage collect.

The application runs on 64-bit Linux with JDK 6.

Memory usage is about 50 MB of long-lived objects loaded at startup (a data cache); from then on, only (many) very short-lived objects are created, with an average lifespan under 1 millisecond.

Some garbage collection cycles take more than 10 milliseconds to run, which looks really disproportionate compared with the application's latency, which is again a few milliseconds at most.

3 answers
家丑人穷心不美
Reply #2 · 2019-01-21 17:58

For an application that generates lots of short-lived garbage and nothing long-lived, one approach that can work is a big heap in which nearly all of the space is young generation (YG) and nearly all of that is eden, with anything that survives more than one YG collection being tenured.

For example (let's say you had a 32-bit JVM):

  • 3072M heap (Xms and Xmx)
  • 128M tenured (i.e. Xmn 2944m)
  • MaxTenuringThreshold=1
  • SurvivorRatio=190 (i.e. each survivor space is 1/192 of the YG)
  • TargetSurvivorRatio=90 (i.e. fill those survivors as much as possible)
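
As a rough check on the arithmetic behind those settings, here is a small Java sketch. It assumes the usual HotSpot layout in which the young generation is eden plus two equal survivor spaces and SurvivorRatio is eden divided by one survivor space. The class and method names are made up for illustration, and the real JVM aligns these sizes, so actual figures will differ slightly.

```java
// Illustrative arithmetic only; class and method names are hypothetical.
// The settings in the list above correspond to a command line like
// (YourApp is a placeholder):
//   java -Xms3072m -Xmx3072m -Xmn2944m -XX:MaxTenuringThreshold=1 \
//        -XX:SurvivorRatio=190 -XX:TargetSurvivorRatio=90 YourApp
public class YoungGenLayout {

    /** One survivor space: young gen = eden + 2 survivors, eden = ratio * survivor. */
    public static double survivorMb(double youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    /** Eden is whatever is left of the young gen after the two survivor spaces. */
    public static double edenMb(double youngMb, int survivorRatio) {
        return youngMb * survivorRatio / (survivorRatio + 2);
    }

    /** Tenured is the part of the heap that is not young generation. */
    public static double tenuredMb(double heapMb, double youngMb) {
        return heapMb - youngMb;
    }

    public static void main(String[] args) {
        double heapMb = 3072;          // -Xms / -Xmx
        double youngMb = 2944;         // -Xmn
        int survivorRatio = 190;       // -XX:SurvivorRatio
        int targetSurvivorPct = 90;    // -XX:TargetSurvivorRatio

        double survivor = survivorMb(youngMb, survivorRatio);
        System.out.printf("tenured:  %.1f MB%n", tenuredMb(heapMb, youngMb));
        System.out.printf("eden:     %.1f MB%n", edenMb(youngMb, survivorRatio));
        System.out.printf("survivor: %.1f MB each%n", survivor);
        System.out.printf("survivor fill target: %.1f MB%n",
                survivor * targetSurvivorPct / 100.0);
    }
}
```

With these numbers each survivor space comes out at roughly 15 MB, so whatever is alive at each YG collection has to fit in about that much or it spills straight into tenured.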

The exact parameters you would use for this setup depend on the steady-state size of your working set (i.e. how much is alive at the time of each collection). This obviously goes against the normal heap sizing rules, but your app doesn't behave the way those rules assume. The app is mostly very short-lived garbage plus a bit of static data, so set the JVM up so that the static data gets into tenured quickly, and make the YG big enough that it isn't collected very often, thus minimising the frequency of the pauses. You'd need to twiddle the knobs repeatedly to work out what a good size is for you and how that balances against the size of the pause you get per collection. You might find that shorter but more frequent YG pauses are achievable, for example.

You don't say how long your app runs for, but the target here is to have no tenured collections at all for the life of the app. This may be impossible, of course, but it's worth aiming for.
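
One way to check whether any tenured collections have happened is to read the per-collector counters that the JVM exposes through the standard java.lang.management API. This is only a sketch: the class name is made up, and the collector names it prints depend on which collector the JVM is running, so you have to work out for yourself which entry is the old-generation one.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: dumps how often each collector has run so far.
public class GcCounts {

    /** One line per collector: name, number of collections, accumulated time in ms. */
    public static String summary() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", totalMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Call this periodically (or at shutdown) from the real application;
        // the old-generation collector's count staying at 0 is the goal.
        System.out.print(summary());
    }
}
```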

However, it's not just the collection algorithm that is important in your case; it also matters where the memory is allocated. The NUMA-aware allocator (only compatible with the throughput collector and activated with the UseNUMA switch) makes use of the observation that an object is often used purely by the thread that created it, and allocates memory accordingly. I'm not sure what it is based on in Linux, but it uses MPO (memory placement optimisation) on Solaris; there are some details on one of the GC guys' blogs.

Since you're using a 64-bit JVM, make sure you're using CompressedOops (-XX:+UseCompressedOops) as well.

Given that rate of object allocation (possibly some sort of science lib?) and those lifetimes, you should give some consideration to object reuse. One example of a library doing this is Javolution's StackContext.
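
To make the object reuse idea concrete, here is a minimal hand-rolled sketch. It is not the Javolution API; it just keeps one mutable scratch object per thread and overwrites it instead of allocating a new temporary on every call. Vec3, ReuseSketch and lengthOfSum are invented names, and whether this actually beats plain allocation is something you would have to measure.

```java
// Hand-rolled illustration of object reuse; all names are hypothetical.
public class ReuseSketch {

    /** A small mutable value type that would otherwise be allocated per call. */
    public static final class Vec3 {
        double x, y, z;

        Vec3 set(double x, double y, double z) {
            this.x = x;
            this.y = y;
            this.z = z;
            return this;
        }

        double length() {
            return Math.sqrt(x * x + y * y + z * z);
        }
    }

    // One scratch instance per thread, so no locking is needed.
    private static final ThreadLocal<Vec3> SCRATCH = new ThreadLocal<Vec3>() {
        @Override
        protected Vec3 initialValue() {
            return new Vec3();
        }
    };

    /** Returns the calling thread's scratch object (always the same instance). */
    public static Vec3 scratch() {
        return SCRATCH.get();
    }

    /** Computes |a + b| without allocating a temporary vector for the sum. */
    public static double lengthOfSum(double ax, double ay, double az,
                                     double bx, double by, double bz) {
        return scratch().set(ax + bx, ay + by, az + bz).length();
    }

    public static void main(String[] args) {
        System.out.println(lengthOfSum(1, 0, 0, 2, 4, 0)); // prints 5.0
    }
}
```

The catch is that the scratch object must never escape the method that uses it, otherwise a later call will silently overwrite it.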

Finally, it's worth noting that GC pauses are not the only stop-the-world (STW) pauses. You could run with the 6u21 early access build, which has some fixes to the PrintGCApplicationStoppedTime and PrintGCApplicationConcurrentTime switches (these effectively print the time spent at a global safepoint and the time between those safepoints). You can use the tracesafepointstatistics flag to get some idea of what is causing the JVM to need a safepoint (i.e. a point at which no bytecode is being executed by any thread).

手持菜刀,她持情操
Reply #3 · 2019-01-21 18:00

When attempting realtime applications with Java, garbage collection tuning is essential but there are also other aspects you need to think about (e.g. the JIT compiler, timers, threading, asynchronous event handling).

Since there seems to be a demand for realtime Java, Sun provides a Java Real-Time System specification and has a commercial implementation available. You can find more information here.

chillily
Reply #4 · 2019-01-21 18:01

Have you already enabled more relevant GC settings, like selecting a concurrent low-pause collector algorithm?

Broadly, the young, tenured and permanent generations need to be sized to match your application's profile. If you have many short-lived objects but the young generation is too small, lots of objects will be tenured prematurely, forcing more frequent major collections of the entire tenured generation. Likewise, if the young generation is too large, then tenured is necessarily smaller, which might also force frequent major collections of tenured.

Practically speaking, I think you will find that the time spent in minor vs. major collections trades off as you increase the size of the young generation, and is optimal at some point.

Maybe it's helpful to note that in "big" performance-sensitive server applications, I've found it necessary to shrink the young generation, in general. This is because such applications ought to have been profiled already for memory allocation hotspots and optimized, so they're producing few short-lived objects. This in turn means the young generation is hogging too much of the heap.

So I suppose I'd do that optimization first, then look at turning up NewRatio beyond 8, and watching the output given by -verbose:gc to see how GC and Full GC time trades off and where it's optimal.
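
For reference, NewRatio is the ratio of tenured to young, so the young generation gets heap / (NewRatio + 1). The sketch below just tabulates that for a few ratios. The 1024 MB heap is an arbitrary figure for illustration, the class name is made up, and the JVM rounds the real sizes.

```java
// Illustrative arithmetic only; the heap size is an assumed example value.
public class NewRatioSizing {

    /** Young generation size implied by -XX:NewRatio (tenured / young). */
    public static double youngMb(double heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    public static void main(String[] args) {
        double heapMb = 1024; // assumed heap size, not from the question
        int[] ratios = {1, 2, 3, 8, 12};
        for (int i = 0; i < ratios.length; i++) {
            double young = youngMb(heapMb, ratios[i]);
            System.out.printf("NewRatio=%d -> young %.1f MB, tenured %.1f MB%n",
                    ratios[i], young, heapMb - young);
        }
    }
}
```

So NewRatio=3 gives the young generation a quarter of the heap, while NewRatio=8 cuts it to a ninth.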
