Successive CMS collections using ConcMarkSweepGC

2019-05-14 05:33发布

问题:

Starting Tomcat with the following JVM options on Linux (2cpu, 4g RAM, 800M ehcache element):

-Xmn512m
-Xmx2g
-Xms2g
-XX:+UseConcMarkSweepGC
-Xss128k
-XX:PermSize=64m
-XX:MaxPermSize=128m
-XX:+DisableExplicitGC
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails

Running with low traffic, no db problems and no batch process in the early morning (06:00) for 20 minutes before heap memory reached -Xmx value and CMS ran repeatedly before eventually returning to normal GC behavior at which time the message Unloading class sun.reflect.GeneratedMethodAccessorXXXXX was printed to the log.

Please could you:

  1. Suggest some GC tuning points? or
  2. Suggest a possible cause?

GC log:

    2011-06-30T06:15:03.494+0900: 145634.229: [GC 145634.230: [ParNew: 463698K->37791K(471872K), 0.0342030 secs] 1848117K->1434326K(2044736K), 0.0343190 secs] [Times: user=0.13 sys=0.01, real=0.03 secs] 
2011-06-30T06:15:12.423+0900: 145643.159: [GC 145643.159: [ParNew: 457247K->36210K(471872K), 0.0302130 secs] 1853782K->1443540K(2044736K), 0.0303470 secs] [Times: user=0.12 sys=0.00, real=0.03 secs] 
2011-06-30T06:15:29.984+0900: 145660.720: [GC 145660.720: [ParNew: 455534K->50720K(471872K), 0.0314960 secs] 1862864K->1458051K(2044736K), 0.0316210 secs] [Times: user=0.12 sys=0.00, real=0.04 secs] 
2011-06-30T06:15:46.756+0900: 145677.492: [GC 145677.492: [ParNew: 470176K->52416K(471872K), 0.0446300 secs] 1877507K->1477425K(2044736K), 0.0447580 secs] [Times: user=0.17 sys=0.01, real=0.05 secs] 
2011-06-30T06:16:06.400+0900: 145697.135: [GC 145697.136: [ParNew: 471872K->35652K(471872K), 0.0451560 secs] 1896881K->1480786K(2044736K), 0.0452730 secs] [Times: user=0.16 sys=0.00, real=0.04 secs] 
2011-06-30T06:16:23.468+0900: 145714.204: [GC 145714.204: [ParNew: 455108K->38442K(471872K), 0.0356610 secs] 1900242K->1498930K(2044736K), 0.0357770 secs] [Times: user=0.13 sys=0.00, real=0.04 secs] 
2011-06-30T06:16:23.505+0900: 145714.240: [GC [1 CMS-initial-mark: 1460487K(1572864K)] 1499562K(2044736K), 0.0099200 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
2011-06-30T06:16:23.515+0900: 145714.251: [CMS-concurrent-mark-start]
2011-06-30T06:16:24.768+0900: 145715.504: [CMS-concurrent-mark: 1.253/1.253 secs] [Times: user=1.31 sys=0.00, real=1.25 secs] 
2011-06-30T06:16:24.768+0900: 145715.504: [CMS-concurrent-preclean-start]
2011-06-30T06:16:24.775+0900: 145715.511: [CMS-concurrent-preclean: 0.007/0.007 secs] [Times: user=0.00 sys=0.01, real=0.01 secs] 
2011-06-30T06:16:24.775+0900: 145715.511: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 2011-06-30T06:16:29.811+0900: 145720.547: [CMS-concurrent-abortable-preclean: 0.760/5.036 secs] [Times: user=1.02 sys=0.02, real=5.03 secs] 
2011-06-30T06:16:29.812+0900: 145720.547: [GC[YG occupancy: 197582 K (471872 K)]145720.547: [Rescan (parallel) , 0.0291550 secs]145720.577: [weak refs processing, 0.0005100 secs]145720.577: [class unloading, 0.0349910 secs]145720.612: [scrub symbol & string tables, 0.0054610 secs] [1 CMS-remark: 1460487K(1572864K)] 1658070K(2044736K), 0.0776190 secs] [Times: user=0.16 sys=0.00, real=0.08 secs] 
2011-06-30T06:16:29.889+0900: 145720.625: [CMS-concurrent-sweep-start]
2011-06-30T06:16:30.381+0900: 145721.117: [CMS-concurrent-sweep: 0.491/0.492 secs] [Times: user=0.55 sys=0.00, real=0.49 secs] 
2011-06-30T06:16:30.381+0900: 145721.117: [CMS-concurrent-reset-start]
2011-06-30T06:16:30.394+0900: 145721.130: [CMS-concurrent-reset: 0.012/0.012 secs] [Times: user=0.01 sys=0.00, real=0.02 secs] 
2011-06-30T06:16:40.537+0900: 145731.273: [GC 145731.273: [ParNew: 457898K->45230K(471872K), 0.0274660 secs] 1896984K->1484315K(2044736K), 0.0275880 secs] [Times: user=0.11 sys=0.00, real=0.03 secs] 
2011-06-30T06:16:57.459+0900: 145748.194: [GC 145748.194: [ParNew: 464686K->52416K(471872K), 0.0387420 secs] 1903771K->1505124K(2044736K), 0.0388710 secs] [Times: user=0.14 sys=0.01, real=0.04 secs] 
2011-06-30T06:16:57.500+0900: 145748.235: [GC [1 CMS-initial-mark: 1452708K(1572864K)] 1506926K(2044736K), 0.0130330 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
2011-06-30T06:16:57.513+0900: 145748.249: [CMS-concurrent-mark-start]
2011-06-30T06:16:58.777+0900: 145749.513: [CMS-concurrent-mark: 1.265/1.265 secs] [Times: user=1.37 sys=0.01, real=1.27 secs] 
2011-06-30T06:16:58.778+0900: 145749.513: [CMS-concurrent-preclean-start]
2011-06-30T06:16:58.787+0900: 145749.523: [CMS-concurrent-preclean: 0.008/0.009 secs] [Times: user=0.02 sys=0.00, real=0.00 secs] 
2011-06-30T06:16:58.787+0900: 145749.523: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 2011-06-30T06:17:03.905+0900: 145754.641: [CMS-concurrent-abortable-preclean: 1.033/5.118 secs] [Times: user=1.40 sys=0.05, real=5.12 secs] 
2011-06-30T06:17:03.906+0900: 145754.642: [GC[YG occupancy: 301453 K (471872 K)]145754.642: [Rescan (parallel) , 0.1761440 secs]145754.818: [weak refs processing, 0.0000280 secs]145754.818: [class unloading, 0.0270100 secs]145754.845: [scrub symbol & string tables, 0.0043090 secs] [1 CMS-remark: 1452708K(1572864K)] 1754161K(2044736K), 0.2132540 secs] [Times: user=0.23 sys=0.00, real=0.22 secs] 
2011-06-30T06:17:04.120+0900: 145754.855: [CMS-concurrent-sweep-start]
2011-06-30T06:17:04.595+0900: 145755.331: [CMS-concurrent-sweep: 0.475/0.475 secs] [Times: user=0.50 sys=0.01, real=0.47 secs] 
2011-06-30T06:17:04.595+0900: 145755.331: [CMS-concurrent-reset-start]
2011-06-30T06:17:04.607+0900: 145755.343: [CMS-concurrent-reset: 0.012/0.012 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
2011-06-30T06:17:05.951+0900: 145756.687: [GC [1 CMS-initial-mark: 1449550K(1572864K)] 1792244K(2044736K), 0.1979100 secs] [Times: user=0.20 sys=0.00, real=0.20 secs] 
2011-06-30T06:17:06.149+0900: 145756.885: [CMS-concurrent-mark-start]
2011-06-30T06:17:07.408+0900: 145758.144: [CMS-concurrent-mark: 1.259/1.259 secs] [Times: user=1.30 sys=0.01, real=1.26 secs] 
2011-06-30T06:17:07.408+0900: 145758.144: [CMS-concurrent-preclean-start]
2011-06-30T06:17:07.417+0900: 145758.153: [CMS-concurrent-preclean: 0.008/0.009 secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 
2011-06-30T06:17:07.418+0900: 145758.153: [CMS-concurrent-abortable-preclean-start]
2011-06-30T06:17:10.528+0900: 145761.264: [GC 145761.264: [ParNew: 471872K->47677K(471872K), 0.0412690 secs] 1921422K->1507670K(2044736K), 0.0413820 secs] [Times: user=0.13 sys=0.00, real=0.05 secs] 
 CMS: abort preclean due to time 2011-06-30T06:17:12.459+0900: 145763.195: [CMS-concurrent-abortable-preclean: 0.974/5.041 secs] [Times: user=1.55 sys=0.05, real=5.04 secs] 
2011-06-30T06:17:12.460+0900: 145763.195: [GC[YG occupancy: 110735 K (471872 K)]145763.195: [Rescan (parallel) , 0.0157680 secs]145763.211: [weak refs processing, 0.0000140 secs]145763.211: [class unloading, 0.0140640 secs]145763.225: [scrub symbol & string tables, 0.0042960 secs] [1 CMS-remark: 1459992K(1572864K)] 1570728K(2044736K), 0.0399100 secs] [Times: user=0.08 sys=0.00, real=0.04 secs] 
2011-06-30T06:17:12.503+0900: 145763.238: [CMS-concurrent-sweep-start]
2011-06-30T06:17:12.967+0900: 145763.703: [CMS-concurrent-sweep: 0.463/0.464 secs] [Times: user=0.58 sys=0.01, real=0.46 secs] 
2011-06-30T06:17:12.967+0900: 145763.703: [CMS-concurrent-reset-start]
2011-06-30T06:17:12.980+0900: 145763.716: [CMS-concurrent-reset: 0.013/0.013 secs] [Times: user=0.01 sys=0.00, real=0.02 secs] 
2011-06-30T06:17:13.222+0900: 145763.957: [GC [1 CMS-initial-mark: 1459706K(1572864K)] 1645309K(2044736K), 0.0994390 secs] [Times: user=0.10 sys=0.00, real=0.10 secs] 
2011-06-30T06:17:13.321+0900: 145764.057: [CMS-concurrent-mark-start]
2011-06-30T06:17:14.589+0900: 145765.324: [CMS-concurrent-mark: 1.267/1.267 secs] [Times: user=1.37 sys=0.00, real=1.26 secs] 
2011-06-30T06:17:14.589+0900: 145765.324: [CMS-concurrent-preclean-start]
2011-06-30T06:17:14.600+0900: 145765.336: [CMS-concurrent-preclean: 0.010/0.011 secs] [Times: user=0.01 sys=0.01, real=0.02 secs] 
2011-06-30T06:17:14.600+0900: 145765.336: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 2011-06-30T06:17:19.697+0900: 145770.433: [CMS-concurrent-abortable-preclean: 1.015/5.097 secs] [Times: user=1.38 sys=0.04, real=5.09 secs] 
2011-06-30T06:17:19.697+0900: 145770.433: [GC[YG occupancy: 422112 K (471872 K)]145770.433: [Rescan (parallel) , 0.2029540 secs]145770.636: [weak refs processing, 0.0000600 secs]145770.636: [class unloading, 0.0140870 secs]145770.650: [scrub symbol & string tables, 0.0042890 secs] [1 CMS-remark: 1459706K(1572864K)] 1881819K(2044736K), 0.2271390 secs] [Times: user=0.24 sys=0.00, real=0.23 secs] 
2011-06-30T06:17:19.925+0900: 145770.661: [CMS-concurrent-sweep-start]
2011-06-30T06:17:20.284+0900: 145771.019: [GC 145771.019: [ParNew: 467133K->38523K(471872K), 0.0340350 secs] 1925973K->1507043K(2044736K), 0.0341500 secs] [Times: user=0.11 sys=0.00, real=0.03 secs] 
2011-06-30T06:17:20.443+0900: 145771.178: [CMS-concurrent-sweep: 0.479/0.518 secs] [Times: user=0.73 sys=0.01, real=0.52 secs] 
2011-06-30T06:17:20.443+0900: 145771.178: [CMS-concurrent-reset-start]
2011-06-30T06:17:20.455+0900: 145771.191: [CMS-concurrent-reset: 0.013/0.013 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
2011-06-30T06:17:22.458+0900: 145773.194: [GC [1 CMS-initial-mark: 1468355K(1572864K)] 1663164K(2044736K), 0.1190060 secs] [Times: user=0.12 sys=0.00, real=0.12 secs] 
2011-06-30T06:17:22.577+0900: 145773.313: [CMS-concurrent-mark-start]
2011-06-30T06:17:23.856+0900: 145774.592: [CMS-concurrent-mark: 1.278/1.279 secs] [Times: user=1.40 sys=0.01, real=1.28 secs] 
2011-06-30T06:17:23.856+0900: 145774.592: [CMS-concurrent-preclean-start]
2011-06-30T06:17:23.870+0900: 145774.606: [CMS-concurrent-preclean: 0.012/0.014 secs] [Times: user=0.01 sys=0.00, real=0.02 secs] 
2011-06-30T06:17:23.870+0900: 145774.606: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 2011-06-30T06:17:28.933+0900: 145779.669: [CMS-concurrent-abortable-preclean: 0.780/5.063 secs] [Times: user=0.89 sys=0.02, real=5.06 secs] 
2011-06-30T06:17:28.934+0900: 145779.669: [GC[YG occupancy: 327483 K (471872 K)]145779.669: [Rescan (parallel) , 0.1466580 secs]145779.816: [weak refs processing, 0.0000240 secs]145779.816: [class unloading, 0.0144090 secs]145779.831: [scrub symbol & string tables, 0.0043100 secs] [1 CMS-remark: 1468355K(1572864K)] 1795838K(2044736K), 0.1711490 secs] [Times: user=0.18 sys=0.00, real=0.17 secs] 
2011-06-30T06:17:29.105+0900: 145779.841: [CMS-concurrent-sweep-start]
2011-06-30T06:17:29.592+0900: 145780.328: [CMS-concurrent-sweep: 0.487/0.487 secs] [Times: user=0.57 sys=0.01, real=0.49 secs] 
2011-06-30T06:17:29.592+0900: 145780.328: [CMS-concurrent-reset-start]
2011-06-30T06:17:29.605+0900: 145780.340: [CMS-concurrent-reset: 0.012/0.012 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
2011-06-30T06:17:31.607+0900: 145782.343: [GC [1 CMS-initial-mark: 1467716K(1572864K)] 1875108K(2044736K), 0.2378610 secs] [Times: user=0.24 sys=0.00, real=0.24 secs] 
2011-06-30T06:17:31.845+0900: 145782.581: [CMS-concurrent-mark-start]
2011-06-30T06:17:33.124+0900: 145783.860: [CMS-concurrent-mark: 1.279/1.279 secs] [Times: user=1.38 sys=0.01, real=1.28 secs] 
2011-06-30T06:17:33.124+0900: 145783.860: [CMS-concurrent-preclean-start]
2011-06-30T06:17:33.136+0900: 145783.872: [CMS-concurrent-preclean: 0.011/0.012 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
2011-06-30T06:17:33.136+0900: 145783.872: [CMS-concurrent-abortable-preclean-start]
2011-06-30T06:17:33.278+0900: 145784.014: [GC 145784.014: [ParNew: 457979K->34427K(471872K), 0.0331350 secs] 1925695K->1513373K(2044736K), 0.0332490 secs] [Times: user=0.12 sys=0.00, real=0.03 secs] 
 CMS: abort preclean due to time 2011-06-30T06:17:38.148+0900: 145788.884: [CMS-concurrent-abortable-preclean: 1.020/5.012 secs] [Times: user=1.60 sys=0.07, real=5.01 secs] 
2011-06-30T06:17:38.149+0900: 145788.885: [GC[YG occupancy: 225806 K (471872 K)]145788.885: [Rescan (parallel) , 0.0359240 secs]145788.921: [weak refs processing, 0.0000080 secs]145788.921: [class unloading, 0.0141010 secs]145788.935: [scrub symbol & string tables, 0.0042870 secs] [1 CMS-remark: 1478946K(1572864K)] 1704752K(2044736K), 0.0600530 secs] [Times: user=0.15 sys=0.01, real=0.06 secs] 
2011-06-30T06:17:38.209+0900: 145788.945: [CMS-concurrent-sweep-start]
2011-06-30T06:17:38.677+0900: 145789.413: [CMS-concurrent-sweep: 0.468/0.468 secs] [Times: user=0.50 sys=0.00, real=0.47 secs] 
2011-06-30T06:17:38.677+0900: 145789.413: [CMS-concurrent-reset-start]
2011-06-30T06:17:38.689+0900: 145789.425: [CMS-concurrent-reset: 0.012/0.012 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
2011-06-30T06:17:40.692+0900: 145791.428: [GC [1 CMS-initial-mark: 1478750K(1572864K)] 1772006K(2044736K), 0.1637570 secs] [Times: user=0.17 sys=0.00, real=0.17 secs] 
2011-06-30T06:17:40.856+0900: 145791.591: [CMS-concurrent-mark-start]
2011-06-30T06:17:42.142+0900: 145792.878: [CMS-concurrent-mark: 1.285/1.286 secs] [Times: user=1.46 sys=0.02, real=1.28 secs] 
2011-06-30T06:17:42.142+0900: 145792.878: [CMS-concurrent-preclean-start]
2011-06-30T06:17:42.154+0900: 145792.890: [CMS-concurrent-preclean: 0.010/0.012 secs] [Times: user=0.01 sys=0.00, real=0.02 secs] 
2011-06-30T06:17:42.154+0900: 145792.890: [CMS-concurrent-abortable-preclean-start]
2011-06-30T06:17:44.003+0900: 145794.739: [GC 145794.739: [ParNew: 453806K->49194K(471872K), 0.0277970 secs] 1932556K->1527945K(2044736K), 0.0279130 secs] [Times: user=0.10 sys=0.00, real=0.02 secs]
CMS Log .....
CMS Log .....
CMS Log .....
...
normal gc log....

回答1:

You are experiencing GC thrashing during that problematic time interval. The reason is simple: you are using all that memory during that time. If you look at a minor collection right before the CMS begins, you see that after the collection finishes, there is close to 1.5G of occupied space, which is roughly the size of your old gen. So it's full. If you check that same value after each of your CMS collections finish, you see similar values. It means that your CMS collections are not recovering any memory. Since the CMS collector is configured to begin when occupancy reaches 92%, you begin a new CMS immediately after completing the last one because you are never getting the occupancy below 92%. That the definition of thrashing.

2011-06-30T06:16:23.468+0900: 145714.204: [GC 145714.204: [ParNew: 455108K->38442K(471872K), 0.0356610 secs] 1900242K->1498930K(2044736K), 0.0357770 secs] [Times: user=0.13 sys=0.00, real=0.04 secs]
// CMS happens, but no memory is recovered, old gen still at ~1.5G
2011-06-30T06:16:40.537+0900: 145731.273: [GC 145731.273: [ParNew: 457898K->45230K(471872K), 0.0274660 secs] 1896984K->1484315K(2044736K), 0.0275880 secs] [Times: user=0.11 sys=0.00, real=0.03 secs]

The question becomes: why are you using all that memory in the middle of the night when your traffic is at it's lowest level? First, traffic itself generates short-lived, request-scoped objects and will rarely cause a problem like this (unless there is a memory leak). It's possible that you have some sort of background process that happens at that time, perhaps some sort of bulk cache refresh or something.

What can you do? You don't have the headroom to handle whatever is happening at that time. If you find a background process that is building huge data structures in memory, considering building those data structures incrementally to relieve strain on memory usage. To resolve the mystery, just take a heap dump during the thrashing period. It should be obvious what type of object is consuming all that memory. It will probably be a HashMap or something.

You can even set this option on the JVM to get automatic heap dumps whenever the JVM throws an Out-of-Memory error: -XX:+HeapDumpOnOutOfMemoryError. Often, GC thrashing is accompanied by OOM errors because the JVM will not thrash forever if it thinks it is spending too much time on GC and not getting much in return. Your case looks like you could be getting an OOM error, which would kill the running thread, which might eliminate the offender and free up all that memory. That could be the reason you return to normal behavior after a period of thrashing. The next time it happens, you will have a heap dump waiting for you in the morning.