Play Framework becoming unresponsive? (Possibly memory related)


Question:

We have a Play 1.2.5 application, and we've had some problems with the application becoming unresponsive.

After setting proper memory settings for the application the problem hasn't recurred (for a couple of days at the moment), but I'd like to get an idea of the actual reason and whether there is some way to see it in the logs.

In our setup, we have:

  • Play 1.2.5 application running on AWS (Ubuntu 12.04)
  • MySQL RDS database
  • Apache server working as a proxy (handling SSL, etc.)

This has happened for various calls, but I have an example from a monitoring healthcheck with a simple renderText implementation (just 200 & "OK"). We've had these "every now and then", and the application has become responsive again without a reboot.

Apache access log had:

  (IP addr) - - [01/Mar/2013:09:31:16 +0200] "GET /monitor/healthcheck HTTP/1.1" 502 4305 "-" "NING/1.0"

Apache error log had:

  [Fri Mar 01 09:36:16 2013] [error] [client (IP addr)] (70007)The timeout specified has expired: proxy: error reading status line from remote server localhost:8080
  [Fri Mar 01 09:36:16 2013] [error] [client (IP addr)] proxy: Error reading from remote server returned by /monitor/healthcheck

(Apache has a 300 s = 5 min proxy timeout.)

The Play logs haven't had anything in them (we have request URL logging at the controller, so at least the request hasn't found its way up there, OR the logging has had problems).

My first thought was running out of threads. This seems pretty unlikely to me, since:

  • We are under development -> pretty low traffic
    • This has occurred also in cases where the logs show no previous traffic for a couple of hours
  • We have 10 threads (play.pool=10)
  • We don't use async WS calls (those seem to be somewhat buggy with Play 1.2.X)
  • No calls block for a long time
  • With random testing after various usage, there don't seem to be any threads hanging (examined with jstack, everything looks ~OK)

(Maybe related, maybe not): one time, while the app wasn't responding to a call, we checked with jstack and got:

$ jstack 7842
7842: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding

However, before trying -F we tried again and got a proper response, so if the JVM was in some unresponsive state, it recovered pretty soon.

With some assistance, we set up proper memory settings, and since then (last Friday, 2013-03-01) we haven't had this problem:

jvm.memory=-Xms64m -Xmx512m -XX:PermSize=64m -XX:MaxPermSize=256m

However, we didn't have any memory issues printed in the log. I'm still a bit worried since I don't have a clue about the actual reason, so:

  • What might be the cause?
    • Some memory issue? But why wasn't it found in the logs?
    • Some (nondeterministic) thing that leaves threads blocked for a long time?
  • Is there some way to see the cause in the logs if this happens again?
    • Are some settings needed to get memory issues into the log? (See the sketch below.)
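
For reference, here is the kind of thing I have in mind: a minimal sketch that extends our jvm.memory line with standard HotSpot diagnostics flags, so that a real memory problem would at least leave a heap dump and a GC log behind. The flags themselves are standard, but the paths and the assumption that this is enough to surface the problem are mine; we haven't verified this in our deployment yet.

  # Sketch only: standard HotSpot diagnostics flags; the dump/log paths are placeholders.
  # -XX:+HeapDumpOnOutOfMemoryError writes a heap dump to HeapDumpPath when an OutOfMemoryError is thrown.
  # -verbose:gc, -XX:+PrintGCDetails and -Xloggc make garbage collection activity (and long pauses) visible.
  jvm.memory=-Xms64m -Xmx512m -XX:PermSize=64m -XX:MaxPermSize=256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -verbose:gc -XX:+PrintGCDetails -Xloggc:/var/log/play/gc.log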

UPDATE: This seems to be an issue with MySQL connection testing hanging. I created another, more focused question and will try to remember to update this one as well after the issue is solved.

Answer 1:

The reason was that TCP connections to the RDS MySQL instance went stale every now and then -> the c3p0 connection pool admin threads all got stuck doing connection testing -> Play request threads ended up waiting at JPAPlugin.beforeInvocation for a DB connection.

See the more focused question Connection hanging occasionally with Amazon RDS MySQL & Play Framework 1.2.5 (c3p0 default settings) for more details.
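
For illustration, one possible mitigation (a sketch only, not necessarily the exact fix from the linked question): MySQL Connector/J accepts connect and socket timeouts directly on the JDBC URL, so a connection test against a dead TCP connection fails fast instead of blocking a pool thread indefinitely. In Play's application.conf this could look like:

  # Sketch only: connectTimeout/socketTimeout are standard MySQL Connector/J URL parameters (milliseconds);
  # the endpoint, database name and timeout values are placeholders.
  db.driver=com.mysql.jdbc.Driver
  db.url=jdbc:mysql://your-rds-endpoint:3306/yourdb?connectTimeout=10000&socketTimeout=60000

Note that socketTimeout applies to every read on the connection, so it has to be longer than the slowest legitimate query, otherwise long-running queries would start failing as well.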