Confusing Tomcat Persistent Session Memory Profile

2020-06-19 07:04发布

As with any memory management issue, this is a long story, so strap-in.

We have an application that has been suffering from some memory management issues, so I've been trying to profile the app to get an idea of where the problem lies. I saw this thread earlier today:

Tomcat Session Eviction to Avoid OutOfMemoryError

... which seems to follow suit with what I was seeing in the profiler. Basically, if I hit the app with a bunch of users with Jmeter it would hold onto the heap memory for a long time, eventually until the sessions began to expire. Unlike the poster in that thread however, I have the source, and the option to try to implement persistent state sessions with Tomcat, which is what I've been trying to do today, with limited success. I think it is some configuration setting I am missing. Here is what I have in context.xml:

  <Manager className='org.apache.catalina.session.PersistentManager'
                saveOnRestart='false'
                maxActiveSessions='5'
                minIdelSwap='0'
                maxIdleSwap='1'
                maxInactiveInterval='600'
                maxIdleBackup='0'>
   <Store className='org.apache.catalina.session.FileStore'/>
   </Manager>

And in web.xml I have this:

<session-config>
    <session-timeout>
        10
    </session-timeout>
</session-config>

Ideally I would like the sessions to timeout in an hour, but for testing purposes this is fine. Now after having played around with some things and finally coming to these settings, I am watching the app in the profiler and this is what I am seeing:

enter image description here

As you can see in the image, I have put some annotations. It's the part circled around the question mark which I really do not understand more than anything else. You can see that I am using the Jconsole 'Perform GC' button at a few different points, and you can see the part in the chart where I am hitting the application with many client with Jmeter.

If I recall correctly (I'd have to go back and really document it to be sure), before I implemented persistent state, the GC button would not do nearly as much with regards to clearing the heap. The strange bit here is that it seems like I have to manually run GC for this persistent state to actually help with anything.

Alternatively, is this just a plain memory leak scenario? Earlier today, I took a heap dump and loaded it into Eclipse Memory Analyzer tool, and used the 'Detect Leak' feature, and all it did was reinforce the theory that this is a session size issue; the only leak it detected was in java.util.concurrent.ConcurrentHashMap$Segment which led me to this thread Memory Fully utilized by Java ConcurrentHashMap (under Tomcat)

which makes me think it's not the app that's actually leaking.

Other relevant details: - Running/testing this on my local machine for the time being. That's where all these results come from. - Using Tomcat 6 - This is a JSF2.0 app - I have added the system property -Dorg.apache.catalina.STRICT_SERVLET_COMPLIANCE=true as per the Tomcat documentation

So, I guess there is several questions here: 1. Do I have this configured properly? 2. Is there a memory leak? 3. What is going on in this memory profile? 4. Is it (relatively) normal?

Thanks in advance.

UPDATE

So, I tried Sean's tips, and found some new interesting things.

The session listener works great, and has been instrumental in analyzing this scenario. Something else, that I forgot to mention is that the app's load is only really confounded by a single page, which reaches a functional complexity on an almost comical proportion. So, in testing, I sometimes aim to hit that page, and other times I avoid it. So in my next round of tests, this time using the session listener I found the following:

1) Hit the app with several dozen clients, just visiting a simple page. I noted that the sessions were all released as expected after the specified time limit, and the memory was released. Same thing works perfectly with a trivial number of clients, for the complex case, hitting the 'big' page.

2) Next, I tried hitting the app with the complex use case, with several dozen clients. This time, several dozen more sessions were initiated than expected. It seemed like every client initiated between one and three sessions. After the session expiration time, a little memory was released, but according to the session listener, only about a third of the sessions were destroyed. Contradicting that though, the folder which actually contains the sessions data is empty. Most of the memory that was used is being held, too. But, after exactly one hour of running the stress test, the garbage collector runs and everything is back to normal.

So, follow up questions include:

1) Why might the sessions be handled properly in the simple case, but when things get memory intensive, they stop being managed correctly? Is the Session handler misreporting things, or does using JMeter impact this in some way?

2) Why does the garbage collector wait one hour to run? Is there a system setting that mandates that the garbage collector MUST run within a given time frame, or if there some configuration setting I'm missing?

Thank you again for continued support.

UPDATE 2

Just a quick note: playing around with this some more, I found out the reason I was getting different reports about the number of live sessions is due to the fact that I am using persistent session; if I turn it off, everything works as expected. It indeed says on the Tomcat page for persistent session, that it is an experimental feature. It must be calling sessions events that the listener is picking up on, when one would atypically expect it to do so.

2条回答
一夜七次
2楼-- · 2020-06-19 07:25

I don't see where anyone pointed out your typo:

minIdelSwap='0'

Not sure that's any use to anyone, but it would certainly cause the PersistenceManager to behave differently than you might expect it to.

查看更多
Root(大扎)
3楼-- · 2020-06-19 07:42

Let me try and tackle the questions the best I can

  1. For your configuration, it looks good to me on paper. If you want to be sure your sessions are getting created and destroyed properly you will want to setup a Session listener to either log or display a list of open sessions. Here is a link to an example

  2. I don't think there is a memory leak. The incremental GC's are being run. there does seems to be a lot of memory being promoted to the old gen. But once your application is able to run a full GC the memory is going back to normal. If you need your users session's to have a lot of data stored in them, memory will be your trade-off. As long as you destroy the data once they logout/timeout the memory should be ok in the long run. Your question (???) area would be the grey area where your sessions are still active so the data stored in them would be held, possibly promoted to the old gen at this point. Once memory is promoted to the Old Gen it needs a full GC to clean it up. Clicking the button did that. The application would have eventually scheduled the full GC.

  3. I think #2 helps answer this

  4. Sure it looks normal for a stress test. Try running it again without hitting the Perform GC button. See if the memory will go back down by itself. Note that the garbage collector will do the full GC when it feels that it should do it. To clear the old gen requires pausing so the GC won't perform that action unless it feels it doesn't have enough resources to manage the application properly.

A few other things you may want to do for future tests. Take note of the heap usage between the young generation and the old generation of the heap. From the screenshot we can see there is 872mb of heap committed so to have a break down of where the memory is sitting during your load tests will help to identify properly where the heap is being kept.

In similar fashion consider turning on verbose GC logging so you can get a report of the heap breakdown. there are many tools out there to help you graph the GC log so you don't have to read the log manually. This can also be logged to a file (-Xloggc:file) if you don't want it to be sent to stdout.

If the session size is becoming too large to manage, you may want to offload that memory elsewhere. A few examples may be to use BigMemory, DB Persistance, or some other way of reducing the session sizes.

查看更多
登录 后发表回答