My app is constantly running Full GC!


Question:

I am a newbie at performance tuning applications and at figuring out how GC works, so I am probably asking the same question for the millionth time!

The problem is that 2-3 weeks ago my web application started crashing every now and then. Looking at the logs, I deduced that it was crashing due to OutOfMemoryError (OOM) errors:

Caused by: java.sql.SQLException: java.lang.OutOfMemoryError: Java heap space
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1055)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:926)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:430)
    ...

and

14:29:58,469 ERROR [[dispatcher]] Servlet.service() for servlet dispatcher threw exception
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.hibernate.event.def.AbstractFlushingEventListener.flushEntities(AbstractFlushingEventListener.java:193)
    at org.hibernate.event.def.AbstractFlushingEventListener.flushEverythingToExecutions(AbstractFlushingEventListener.java:76)
    at org.hibernate.event.def.DefaultAutoFlushEventListener.onAutoFlush(DefaultAutoFlushEventListener.java:35)
    at org.hibernate.impl.SessionImpl.autoFlushIfRequired(SessionImpl.java:969)
    at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1114)
    at org.hibernate.impl.QueryImpl.list(QueryImpl.java:79)
    at com.tennisearth.dao.hibernate.HibernateCommentaryDao.getCommentary(HibernateCommentaryDao.java:52)
    at com.tennisearth.service.impl.CommentaryServiceImpl.getCommentary(CommentaryServiceImpl.java:454)
    at com.tennisearth.service.impl.CommentaryServiceImpl.getCommentaryMap(CommentaryServiceImpl.java:165)
    at com.tennisearth.web.controllers.WidgetsController.commentaryList(WidgetsController.java:704)
    at com.tennisearth.web.controllers.WidgetsController.widgets(WidgetsController.java:290)
    at sun.reflect.GeneratedMethodAccessor259.invoke(Unknown Source)
            ...

Subsequently, I enabled GC logging and found that initially, when the server starts, GC runs fine:

3.808: [GC 56505K->5808K(251264K), 0.0120840 secs]
3.820: [Full GC 5808K->5614K(251264K), 0.0540460 secs]
7.169: [GC 71214K->9589K(251264K), 0.0068780 secs]
8.173: [GC 75189K->13543K(251264K), 0.0174120 secs]
8.624: [GC 79143K->13693K(251264K), 0.0088080 secs]
9.907: [GC 79293K->16013K(251264K), 0.0132330 secs]
11.832: [GC 81613K->22100K(311360K), 0.0227030 secs]
13.338: [GC 136508K->38851K(316864K), 0.0346600 secs]
13.373: [Full GC 38851K->38559K(316864K), 0.2093700 secs]
15.113: [GC 164255K->45826K(331520K), 0.0151220 secs]
18.946: [GC 177794K->58815K(322688K), 0.0254140 secs]
22.699: [GC 186431K->70079K(322880K), 0.0500300 secs]
40.246: [GC 191871K->78818K(311296K), 0.0429900 secs]
41.280: [GC 196706K->81375K(324608K), 0.0340230 secs]
42.798: [GC 199135K->82432K(324736K), 0.0074390 secs]
43.487: [GC 200192K->85729K(394112K), 0.0098220 secs]
45.564: [GC 274145K->97421K(394688K), 0.0212620 secs]
46.829: [GC 285837K->100769K(491968K), 0.0287150 secs]
48.011: [GC 388705K->106326K(491648K), 0.0275700 secs]
51.035: [GC 394262K->114643K(493376K), 0.0199210 secs]
58.073: [GC 407187K->118997K(493760K), 0.0190090 secs]
61.094: [GC 411541K->122449K(496320K), 0.0181850 secs]
83.288: [GC 419985K->128776K(467968K), 0.0206970 secs]
91.216: [GC 414152K->136966K(459136K), 0.0237830 secs]
108.336: [GC 410758K->137782K(445632K), 0.0158180 secs]
116.492: [GC 400566K->139454K(434304K), 0.0126040 secs]
139.645: [GC 391742K->140705K(420608K), 0.0113540 secs]
150.825: [GC 383009K->158942K(428096K), 0.0375920 secs]
151.909: [GC 391774K->178964K(437824K), 0.0677160 secs]
153.518: [GC 417702K->280503K(496000K), 0.1514220 secs]
153.669: [Full GC 280503K->274184K(618880K), 0.7686300 secs]
155.431: [GC 468706K->366398K(658880K), 0.1449730 secs]
155.579: [GC 366772K->364514K(603072K), 0.0524370 secs]
155.631: [Full GC 364514K->348726K(753728K), 0.9406650 secs]
157.072: [GC 508278K->395401K(733376K), 0.0369850 secs]
157.839: [GC 554533K->473779K(762816K), 0.1072000 secs]
159.105: [GC 614259K->509767K(771840K), 0.0883110 secs]
163.696: [GC 650247K->516783K(748416K), 0.0878210 secs]
163.784: [Full GC 516783K->512313K(920384K), 0.6153950 secs]

but when I checked the logs the next day, they showed Full GCs running all the time:

65515.860: [Full GC 815615K->763589K(932096K), 1.3774000 secs]
65517.398: [Full GC 815615K->761150K(932096K), 1.3959730 secs]
65518.920: [Full GC 815615K->770183K(932096K), 1.3627860 secs]
65520.388: [Full GC 815615K->772928K(932096K), 1.3690040 secs]
65521.849: [Full GC 815615K->777388K(932096K), 1.3932870 secs]
65523.321: [Full GC 815615K->773739K(932096K), 1.6262920 secs]
65525.032: [Full GC 815615K->782518K(932096K), 1.3857020 secs]
65526.493: [Full GC 815615K->784568K(932096K), 1.3901050 secs]
65528.031: [Full GC 815615K->743695K(932096K), 1.2809140 secs]
65530.065: [Full GC 815615K->721823K(932096K), 1.3245560 secs]
65538.982: [Full GC 815615K->729961K(932096K), 1.2241330 secs]
65540.508: [Full GC 815615K->729519K(932096K), 1.2257770 secs]
65542.135: [Full GC 815615K->731559K(932096K), 1.2206840 secs]
65547.769: [Full GC 815615K->722653K(932096K), 1.5854120 secs]
65558.803: [Full GC 815616K->727582K(932096K), 1.2067870 secs]
65566.769: [Full GC 815615K->728443K(932096K), 1.2114200 secs]
65570.652: [Full GC 815616K->730066K(932096K), 1.2135840 secs]
65572.352: [Full GC 815616K->723875K(932096K), 1.4702710 secs]
65577.304: [Full GC 815615K->727897K(932096K), 1.2087790 secs]
65583.316: [Full GC 815615K->727540K(932096K), 1.2091610 secs]
65590.292: [Full GC 815615K->728114K(932096K), 1.2074670 secs]
65599.993: [Full GC 815615K->722369K(932096K), 1.2465300 secs]
65616.109: [Full GC 815615K->728427K(932096K), 1.2092820 secs]
65620.070: [Full GC 815615K->728823K(932096K), 1.2068140 secs]
65626.774: [Full GC 815615K->728454K(932096K), 1.2046050 secs]
65637.302: [Full GC 815615K->722224K(932096K), 1.4699560 secs]
65639.319: [Full GC 815615K->728140K(932096K), 1.2258630 secs]
65656.674: [Full GC 815615K->726831K(932096K), 1.2203520 secs]
65667.239: [Full GC 815615K->727786K(932096K), 1.2212360 secs]
65678.905: [Full GC 815615K->721629K(932096K), 1.4281870 secs]
65686.655: [Full GC 815615K->728126K(932096K), 1.2147860 secs]
65689.521: [Full GC 815615K->723848K(932096K), 1.2070410 secs]
65697.409: [Full GC 815615K->727932K(932096K), 1.2111580 secs]
65712.853: [Full GC 815615K->721999K(932096K), 1.4991350 secs]
65719.399: [Full GC 815615K->727715K(932096K), 1.2149930 secs]
65727.209: [Full GC 815615K->727355K(932096K), 1.2048690 secs]
65728.726: [Full GC 815615K->730012K(932096K), 1.2185660 secs]
65730.225: [Full GC 815615K->725299K(932096K), 1.4965130 secs]
65732.111: [Full GC 815615K->728544K(932096K), 1.2107770 secs]
65738.952: [Full GC 815615K->726932K(932096K), 1.2066580 secs]

Is my app leaking memory, or is it my configuration that needs fine-tuning? I am using the following setup:

CentOS release 5.2 (Final) x86_64  
Java JDK 1.6.06 64-bit  
JBoss 4.2.2.GA
RAM: 4GB
Swap Space: 2GB
Processor: Intel(R) Xeon(R) CPU X3323 @ 2.50GHz

The following command is used to run JBoss:

java -Dprogram.name=run.sh -server -Xms256m -Xmx1024m \
    -XX:PermSize=64m -XX:MaxPermSize=256m \
    -verbose:gc -Xloggc:/data1/logs/jboss/GC.log -XX:+HeapDumpOnOutOfMemoryError \
    -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 \
    -Dorg.apache.catalina.STRICT_SERVLET_COMPLIANCE=false -Djava.net.preferIPv4Stack=true \
    -Djava.library.path=/usr/local/java/jboss-4.2.2.GA/bin/native \
    -Djava.endorsed.dirs=/usr/local/java/jboss-4.2.2.GA/lib/endorsed \
    -classpath /usr/local/java/jboss-4.2.2.GA/bin/run.jar:/usr/local/java/jdk1.6.0_06/lib/tools.jar \
    org.jboss.Main -c default -b <IP_ADDRESS> -Djboss.messaging.ServerPeerID=1

Any help will be very much appreciated. Also, shedding some light on how to analyse GC logs and heap dumps would be a great help for me in the future.

Best Regards,
Sumit Taneja

Answer 1:

Running with 4 GB of RAM while capping the heap with -Xmx1024m (a maximum Java heap of 1024 MB) might not be intentional.
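For example (the numbers below are only an illustration and depend on what else runs on the box, not a tuning recommendation), the same startup command could give the JVM more headroom:

-Xms1024m -Xmx2048m    (instead of -Xms256m -Xmx1024m)

Note, though, that if the application is leaking, a bigger heap only postpones the Full GC storm.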

On the other hand, it looks to me like your query is returning too many results, either because of the joins or because no limit is applied, which turns into an oversized list.
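As a minimal sketch only (the real HibernateCommentaryDao is not shown in the question, so the entity name, HQL and parameters below are assumptions), bounding the query keeps a single call from materialising an unbounded list:

// Hypothetical illustration; "Commentary" and its properties are placeholders,
// not the real mapping from the application in the question.
import java.util.List;

import org.hibernate.Query;
import org.hibernate.Session;

public class BoundedCommentaryQuery {

    /** Returns at most maxRows commentary rows for a match instead of the whole table. */
    public List<?> getCommentary(Session session, long matchId, int maxRows) {
        Query query = session.createQuery(
                "from Commentary c where c.match.id = :matchId order by c.id desc");
        query.setLong("matchId", matchId);
        query.setMaxResults(maxRows); // hard upper bound on the size of the result list
        return query.list();
    }
}

Even if the real fix turns out to be elsewhere, setMaxResults is a cheap guard against one widget request dragging an entire table into the heap.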

You could start by taking a heap snapshot in the middle of GCs with

# jmap -dump:format=b,file=dump.hprof <pid>

Then analyse it with the Eclipse Memory Analyzer mentioned in another answer.
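If you want to watch the heap filling up while you wait for the right moment to take that snapshot, jstat (bundled with the JDK) prints per-generation occupancy; a minimal example, assuming <pid> is the JBoss process id:

# jstat -gcutil <pid> 1000

The O column is the old generation occupancy in percent; if it stays near 100% even right after a Full GC, which is what your GC log suggests, the heap dump taken at that moment will show exactly which objects are keeping it full.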



Answer 2:

Eclipse features a great tool called Memory Analyzer (MAT). It can analyze exactly those heap dumps you generate on OOM. It draws pretty good charts to help you narrow down where the bulk of your memory consumption is - your app, JBoss, or other parts of your setup - to a very fine level.

If you paste its output here, we can continue investigating your problem.
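For dumps that are too large to open comfortably in the GUI, MAT also has a headless batch mode; a rough sketch, assuming dump.hprof is the file produced by jmap above and that the command is run from the MAT installation directory (verify the exact script name and report id against your MAT version):

# ./ParseHeapDump.sh dump.hprof org.eclipse.mat.api:suspects

This pre-parses the dump and writes a Leak Suspects report as a zipped HTML file next to it.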



Answer 3:

Somewhere your code is creating a flood of temporary objects in an environment that is already short on memory.
Please take a look at your code (HibernateCommentaryDao.java).

It looks like an N+1 fetch problem.
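If that is the case, fetching the association in the same query avoids issuing one extra select per row. This is a sketch only, since the real entities and mappings are not in the question (Commentary and its match association are assumptions):

// Hypothetical illustration of avoiding an N+1 fetch with "join fetch";
// the entity and association names are placeholders, not the real model.
import java.util.List;

import org.hibernate.Query;
import org.hibernate.Session;

public class CommentaryWithMatchQuery {

    /** Loads commentary rows together with their match in a single select (no N+1). */
    public List<?> getCommentaryForMatch(Session session, long matchId) {
        Query query = session.createQuery(
                "from Commentary c join fetch c.match where c.match.id = :matchId");
        query.setLong("matchId", matchId);
        return query.list();
    }
}

Alternatively, a batch-size setting on the mapping can cut the number of selects without touching the queries.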

Excessive GC Time and OutOfMemoryError (quoted from the HotSpot GC tuning guide):

The parallel collector will throw an OutOfMemoryError if too much time is being spent in garbage collection: if more than 98% of the total time is spent in garbage collection and less than 2% of the heap is recovered, an OutOfMemoryError will be thrown. This feature is designed to prevent applications from running for an extended period of time while making little or no progress because the heap is too small. If necessary, this feature can be disabled by adding the option -XX:-UseGCOverheadLimit to the command line.