I'm having a hard time dealing with a Java garbage collection problem, and interpreting the logs.
My application requires that no GC takes longer than 2 seconds, and ideally less than 100ms.
Based on some previous advice I'm trying the following command line options:
java -XX:MaxGCPauseMillis=100 -XX:NewRatio=9 -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -server -Xmx9g -Xms9g
The application has a large amount of long-term stored objects, which are kept in a ConcurrentLinkedHashMap. I'm seeing occasional long pauses, in the worst case up to 10 seconds (this is the second last like of the GC logs below)!
Here is some of the output I'm getting:
16938.968: [GC 16938.968: [ParNew: 153343K->17022K(153344K), 7.8608580 secs] 6184328K->6122510K(9420160K) icms_dc=7 , 7.8614100 secs] [Times: user=0.63 sys=0.01, real=7.86 secs]
16947.087: [GC 16947.087: [ParNew: 153342K->17022K(153344K), 7.2604030 secs] 6258830K->6198642K(9420160K) icms_dc=7 , 7.2609780 secs] [Times: user=0.44 sys=0.00, real=7.27 secs]
16954.614: [GC 16954.614: [ParNew: 153342K->17024K(153344K), 8.4307620 secs] 6334962K->6274625K(9420160K) icms_dc=7 , 8.4313150 secs] [Times: user=0.62 sys=0.01, real=8.43 secs]
16963.310: [GC 16963.310: [ParNew: 153344K->17023K(153344K), 6.2588760 secs] 6410945K->6350748K(9420160K) icms_dc=7 , 6.2594290 secs] [Times: user=0.48 sys=0.01, real=6.25 secs]
16969.834: [GC 16969.834: [ParNew: 153343K->17022K(153344K), 6.0274280 secs] 6487068K->6425868K(9420160K) icms_dc=7 , 6.0279830 secs] [Times: user=0.50 sys=0.01, real=6.03 secs]
16976.122: [GC 16976.123: [ParNew: 153342K->17022K(153344K), 11.7774620 secs] 6562188K->6503030K(9420160K) icms_dc=7 , 11.7780180 secs] [Times: user=0.43 sys=0.04, real=11.78 secs]
16988.164: [GC 16988.164: [ParNew: 153342K->17024K(153344K), 10.9477920 secs] 6639350K->6579928K(9420160K) icms_dc=7 , 10.9483440 secs] [Times: user=0.37 sys=0.02, real=10.95 secs]
16999.371: [GC 16999.372: [ParNew: 153344K->17023K(153344K), 9.8828360 secs] 6716248K->6655886K(9420160K) icms_dc=7 , 9.8833940 secs] [Times: user=0.42 sys=0.01, real=9.88 secs]
17009.509: [GC 17009.509: [ParNew: 153343K->17023K(153344K), 5.0699960 secs] 6792206K->6727987K(9420160K) icms_dc=7 , 5.0705660 secs] [Times: user=0.55 sys=0.01, real=5.07 secs]
17014.838: [GC 17014.838: [ParNew: 153343K->17023K(153344K), 6.6411750 secs] 6864307K->6790974K(9420160K) icms_dc=7 , 6.6417400 secs] [Times: user=0.37 sys=0.01, real=6.63 secs]
17021.735: [GC 17021.735: [ParNew: 153343K->17024K(153344K), 8.0545970 secs] 6927294K->6856409K(9420160K) icms_dc=7 , 8.0551790 secs] [Times: user=0.34 sys=0.03, real=8.05 secs]
17030.052: [GC 17030.053: [ParNew: 153344K->17023K(153344K), 7.9756730 secs] 6992729K->6922569K(9420160K) icms_dc=7 , 7.9762530 secs] [Times: user=0.34 sys=0.01, real=7.98 secs]
17038.398: [GC 17038.398: [ParNew: 153343K->17022K(153344K), 12.9613300 secs] 7058889K->6990725K(9420160K) icms_dc=7 , 12.9618850 secs] [Times: user=0.39 sys=0.01, real=12.96 secs]
17051.630: [GC 17051.630: [ParNew: 153342K->17022K(153344K), 6.8942910 secs] 7127045K->7059607K(9420160K) icms_dc=7 , 6.8948380 secs] [Times: user=0.56 sys=0.02, real=6.89 secs]
17058.798: [GC 17058.798: [ParNew: 153342K->17024K(153344K), 10.0262190 secs] 7195927K->7126351K(9420160K) icms_dc=7 , 10.0267860 secs] [Times: user=0.37 sys=0.01, real=10.02 secs]
17069.096: [GC 17069.096: [ParNew: 153344K->17023K(153344K), 10.0419500 secs] 7262671K->7195002K(9420160K) icms_dc=7 , 10.0425020 secs] [Times: user=0.40 sys=0.02, real=10.04 secs]
17079.410: [GC 17079.410: [ParNew: 153343K->17022K(153344K), 13.5389040 secs] 7331322K->7264275K(9420160K) icms_dc=7 , 13.5394610 secs] [Times: user=0.30 sys=0.01, real=13.54 secs]
17093.223: [GC 17093.224: [ParNew: 153342K->17023K(153344K), 10.5909450 secs] 7400595K->7330446K(9420160K) icms_dc=7 , 10.5915060 secs] [Times: user=0.33 sys=0.00, real=10.58 secs]
17104.083: [GC 17104.084: [ParNew: 153343K->17024K(153344K), 5.8420210 secs] 7466766K->7392173K(9420160K) icms_dc=7 , 5.8425920 secs] [Times: user=0.57 sys=0.00, real=5.84 secs]
I've spent hours pouring over the various webpages that describe Java GC tuning, but none have really given me the ability to interpret the logs above and come up with a course of action. Any specific advice based on the logs I've provided would be greatly appreciated.
Update: Per a question below:
The machine has 16G of RAM, here is the info from top: Mem: 15483904k total, 15280084k used, 203820k free, 155684k buffers Swap: 2031608k total, 1347240k used, 684368k free, 3304044k cached
Its a different run, but here is the current top output for the process:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1016 sanity 17 0 10.2g 6.5g 9464 S 1 44.2 10:24.32 java
Update 2: Some more detailed logging, this looked like it caused a 400ms delay:
{Heap before GC invocations=1331 (full 1):
par new generation total 153344K, used 153343K [0x00002aaaae200000, 0x00002aaab8860000, 0x00002aaab8860000)
eden space 136320K, 100% used [0x00002aaaae200000, 0x00002aaab6720000, 0x00002aaab6720000)
from space 17024K, 99% used [0x00002aaab77c0000, 0x00002aaab885fff0, 0x00002aaab8860000)
to space 17024K, 0% used [0x00002aaab6720000, 0x00002aaab6720000, 0x00002aaab77c0000)
concurrent mark-sweep generation total 7169664K, used 4258496K [0x00002aaab8860000, 0x00002aac6e200000, 0x00002aac6e200000)
concurrent-mark-sweep perm gen total 21248K, used 13269K [0x00002aac6e200000, 0x00002aac6f6c0000, 0x00002aac73600000)
484.738: [GC 484.738: [ParNew: 153343K->17022K(153344K), 0.3950480 secs] 4411840K->4341689K(7323008K), 0.3954820 secs] [Times: user=0.49 sys=0.07, real=0.40 secs]
Heap after GC invocations=1332 (full 1):
par new generation total 153344K, used 17022K [0x00002aaaae200000, 0x00002aaab8860000, 0x00002aaab8860000)
eden space 136320K, 0% used [0x00002aaaae200000, 0x00002aaaae200000, 0x00002aaab6720000)
from space 17024K, 99% used [0x00002aaab6720000, 0x00002aaab77bfb68, 0x00002aaab77c0000)
to space 17024K, 0% used [0x00002aaab77c0000, 0x00002aaab77c0000, 0x00002aaab8860000)
concurrent mark-sweep generation total 7169664K, used 4324666K [0x00002aaab8860000, 0x00002aac6e200000, 0x00002aac6e200000)
concurrent-mark-sweep perm gen total 21248K, used 13269K [0x00002aac6e200000, 0x00002aac6f6c0000, 0x00002aac73600000)
}