Connection hanging occasionally with Amazon RDS My

2019-05-04 02:04发布

We're having problems with Amazon-hosted Play application with RDS Mysql.

Sometimes the application ends up unresponsive. Today I checked the thread status with jstack -F and there were multiple threads stuck in native code for c3p0 connection :

Thread 14060: (state = IN_NATIVE)
 - java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[], int, int, int) @bci=0 (Compiled frame; information may be imprecise)
 - java.net.SocketInputStream.read(byte[], int, int, int) @bci=79, line=150 (Compiled frame)
 - java.net.SocketInputStream.read(byte[], int, int) @bci=11, line=121 (Compiled frame)
 - com.mysql.jdbc.util.ReadAheadInputStream.fill(int) @bci=262, line=114 (Compiled frame)
 - com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(byte[], int, int) @bci=176, line=161 (Compiled frame)
 - com.mysql.jdbc.util.ReadAheadInputStream.read(byte[], int, int) @bci=48, line=189 (Compiled frame)
 - com.mysql.jdbc.MysqlIO.readFully(java.io.InputStream, byte[], int, int) @bci=34, line=3036 (Compiled frame)
 - com.mysql.jdbc.MysqlIO.reuseAndReadPacket(com.mysql.jdbc.Buffer, int) @bci=23, line=3489 (Compiled frame)
 - com.mysql.jdbc.MysqlIO.reuseAndReadPacket(com.mysql.jdbc.Buffer) @bci=3, line=3478 (Compiled frame)
 - com.mysql.jdbc.MysqlIO.checkErrorPacket(int) @bci=12, line=4019 (Compiled frame)
 - com.mysql.jdbc.MysqlIO.sendCommand(int, java.lang.String, com.mysql.jdbc.Buffer, boolean, java.lang.String, int) @bci=428, line=2490 (Compiled frame)
 - com.mysql.jdbc.MysqlIO.sqlQueryDirect(com.mysql.jdbc.StatementImpl, java.lang.String, java.lang.String, com.mysql.jdbc.Buffer, int, int, int, boolean, java.lang.String, com.mysql.jdbc.Field[]) @bci=559, line=2651 (Compiled frame)
 - com.mysql.jdbc.ConnectionImpl.execSQL(com.mysql.jdbc.StatementImpl, java.lang.String, int, com.mysql.jdbc.Buffer, int, int, boolean, java.lang.String, com.mysql.jdbc.Field[], boolean) @bci=130, line=2677 (Compiled frame)
 - com.mysql.jdbc.ConnectionImpl.execSQL(com.mysql.jdbc.StatementImpl, java.lang.String, int, com.mysql.jdbc.Buffer, int, int, boolean, java.lang.String, com.mysql.jdbc.Field[]) @bci=17, line=2627 (Compiled frame)
 - com.mysql.jdbc.StatementImpl.executeQuery(java.lang.String) @bci=534, line=1556 (Compiled frame)
 - com.mysql.jdbc.DatabaseMetaData$9.forEach(java.lang.Object) @bci=174, line=5013 (Compiled frame)
 - com.mysql.jdbc.IterateBlock.doForAll() @bci=25, line=51 (Compiled frame)
 - com.mysql.jdbc.DatabaseMetaData.getTables(java.lang.String, java.lang.String, java.lang.String, java.lang.String[]) @bci=280, line=4991 (Compiled frame)
 - com.mchange.v2.c3p0.impl.DefaultConnectionTester.activeCheckConnectionNoQuery(java.sql.Connection, java.lang.Throwable[]) @bci=21, line=185 (Compiled frame)
 - com.mchange.v2.c3p0.impl.DefaultConnectionTester.activeCheckConnection(java.sql.Connection, java.lang.String, java.lang.Throwable[]) @bci=7, line=62 (Compiled frame)
 - com.mchange.v2.c3p0.AbstractConnectionTester.activeCheckConnection(java.sql.Connection) @bci=4, line=67 (Compiled frame)
 - com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.testPooledConnection(java.lang.Object) @bci=122, line=368 (Compiled frame)
 - com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.refurbishIdleResource(java.lang.Object) @bci=26, line=310 (Compiled frame)
 - com.mchange.v2.resourcepool.BasicResourcePool$AsyncTestIdleResourceTask.run() @bci=35, line=1999 (Interpreted frame)
 - com.mchange.v2.async.ThreadPerTaskAsynchronousRunner$TaskThread.run() @bci=41, line=255 (Interpreted frame)

DefaultConnectionTester.activeCheckConnectionNoQuery seems to check DatabaseMetaData.getTables if no other connection testing is set up.

C3p0 manual has instructions for configuring proper connection testing and this answer has instructions for c3p0 settings with Play! but I'm wondering:

  • Is there some way to check out what's the issue behing this?
    • The standard "Mysql connection being killed after 8 hours" I've encountered before results as an exception
  • There seems to be some default connection testing with c3p0/Play/Hibernate since this is done?
    • I have a hunch that by default c3p0 doesn't do connection testing
      • default values idleConnectionTestPeriod = 0, testConnectionOnCheckin = false, testConnectionOnCheckout = false
  • Is there a way to set timeout for connection testing query?
    • Even if I configure some sensible preferredTestQuery, for example, is there risk of that query also hangs as current getTables() call?
  • Is there something else that I should consider?

UPDATE with automaticTestTable (wasn't the solution)

We set c3p0.automaticTestTable to have more sensible connection testing but that didn't seem to fix the problem, just changing a bit the stack trace for the threads.

In similar case, we now have thread stack as following:

"Task-Thread-for-com.mchange.v2.async.ThreadPerTaskAsynchronousRunner@6064cd08" daemon prio=10 tid=0x00007f53d403a000 nid=0x63dd runnable [0x00007f53dce4a000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:150)
        at java.net.SocketInputStream.read(SocketInputStream.java:121)
        at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:114)
        at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:161)
        at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:189)
        - locked <0x00000000dbecd0d8> (a com.mysql.jdbc.util.ReadAheadInputStream)
        at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3036)
        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3489)
        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3478)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4019)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2490)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2651)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2677)
        - locked <0x00000000dbec1a70> (a com.mysql.jdbc.JDBC4Connection)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2627)
        at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1556)
        - locked <0x00000000dbec1a70> (a com.mysql.jdbc.JDBC4Connection)
        - locked <0x00000000d1330078> (a com.mysql.jdbc.StatementImpl)
        at com.mchange.v2.c3p0.impl.DefaultConnectionTester.activeCheckConnection(DefaultConnectionTester.java:73)
        at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.testPooledConnection(C3P0PooledConnectionPool.java:374)
        at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.refurbishIdleResource(C3P0PooledConnectionPool.java:310)
        at com.mchange.v2.resourcepool.BasicResourcePool$AsyncTestIdleResourceTask.run(BasicResourcePool.java:1999)
        at com.mchange.v2.async.ThreadPerTaskAsynchronousRunner$TaskThread.run(ThreadPerTaskAsynchronousRunner.java:255)

Play threads were waiting to get a DB connection:

"play-thread-3" prio=10 tid=0x00007f53c400b000 nid=0x5b3e in Object.wait() [0x00007f53ddd57000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000db9f38f8> (a com.mchange.v2.resourcepool.BasicResourcePool)
        at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:579)
        - locked <0x00000000db9f38f8> (a com.mchange.v2.resourcepool.BasicResourcePool)
        at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:584)
        ... (näitä vaihtelevat määrät)
        at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:584)
        - locked <0x00000000db9f38f8> (a com.mchange.v2.resourcepool.BasicResourcePool)
        at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:584)
        - locked <0x00000000db9f38f8> (a com.mchange.v2.resourcepool.BasicResourcePool)
        at com.mchange.v2.resourcepool.BasicResourcePool.checkoutResource(BasicResourcePool.java:477)
        at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutPooledConnection(C3P0PooledConnectionPool.java:525)
        at com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource.getConnection(AbstractPoolBackedDataSource.java:128)
        at org.hibernate.ejb.connection.InjectedDataSourceConnectionProvider.getConnection(InjectedDataSourceConnectionProvider.java:71)
        at org.hibernate.jdbc.ConnectionManager.openConnection(ConnectionManager.java:446)
        at org.hibernate.jdbc.ConnectionManager.getConnection(ConnectionManager.java:167)
        at org.hibernate.jdbc.JDBCContext.connection(JDBCContext.java:160)
        at org.hibernate.transaction.JDBCTransaction.begin(JDBCTransaction.java:81)
        at org.hibernate.impl.SessionImpl.beginTransaction(SessionImpl.java:1473)
        at org.hibernate.ejb.TransactionImpl.begin(TransactionImpl.java:60)
        at play.db.jpa.JPAPlugin.startTx(JPAPlugin.java:377)
        at play.db.jpa.JPAPlugin.beforeInvocation(JPAPlugin.java:345)
        at play.plugins.PluginCollection.beforeInvocation(PluginCollection.java:473)
        at play.Invoker$Invocation.before(Invoker.java:217)
        at play.Invoker$Invocation.run(Invoker.java:277)
        at play.server.PlayHandler$NettyInvocation.run(PlayHandler.java:229)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)

Some notes:

  • It IS possible (at least in theory) that many connections just happen to be tested the same time as the stack dump is readed, even though I find this unlikely.
  • In another question it was mentioned that data deadlocks or hardware/network problems might cause this kind of behaviour.
    • At least there shouldn't be any ongoing schema modification.

br, Touko


Answering to myself:

  • No reason found for the connection hanging :(
    • RDS has error.log for MySQL, unfortunately nothing there
  • Play has some default settings for c3p0 ComboPooledDataSource (in DBPlugin.onApplicationStart() )
    • idleTimeExcessConnections : from application.conf db.pool.maxIdleTimeExcessConnections (default 0)
    • checkoutTimeout : from application.conf db.pool.timeout (default 5000)
    • idleConnectionTestPeriod : 10
    • setTestConnectionOnCheckin : true
    • In addition, conf/c3p0.properties can be used to set custom properties
  • Haven't found way to set timeout for connection testing (with brief DefaultConnectionTester source code lookup it might not be supported)

We will proceed with setting c3p0.automaticTestTable and check out whether there will be any problems later.

UPDATED : Unfortunately we didn't get rid of problems yet :( Also with automaticTestTable the connection testing seems to have get stuck java.net.SocketInputStream.socketRead0(Native Method)


Answering again to myself:

We ended up trying this "set timeout for connection testing query" approach with c3p0:s maxAdministrativeTaskTime. This is a bit of a hack so got to see whether this can be the final solution (if it works).

So far no problems yet, we'll see how this works. No c3p0 errors / stack traces in the log either, I'd expect this interruption to be shown in the logs? Either timeout hasn't occurred yet or we should tune logging configuration to get c3p0 logs shown.

UPDATE (2013-04-04) : After this the problem (probably) hasn't occurred so often but has occurred every now and then :(

1条回答
聊天终结者
2楼-- · 2019-05-04 02:35

Answering again to myself:

We ended first up using socketTimeout specified for MySQL driver so so that connection test doesn't get stuck.

It seems that the underlying TCP connections ended up so that that there is no answer from the server (TCP ACK or anything).

UPDATE: The final solution was got with giving tcpdump for AWS support, they found out some case within AWS infrastructure. They adviced us to open the ephemeral port range in our EC2 instance's security group. More discussion at AWS forum.

查看更多
登录 后发表回答