Tomcat active threads pile up and stalls the serve

2020-07-17 16:30发布

问题:

We are having this issue with our production servers (apache-tomcat-7.0.6), which is running a Spring-JPA-Hibernate application using MySQL as DB. During this issue server becomes slow and within a minute it becomes unresponsive with number of active tomcat threads shoots up to 200(We use javamelody to observe these things). Logs look something like:

2012-04-07 07:53:48,058 DEBUG AuthenticationHandler:("http-bio-7006"-exec-304):35 - Authentication handler : ~
2012-04-07 07:53:48,059 DEBUG AuthenticationHandler:("http-bio-7006"-exec-304):45 - Service Id : 1333765428059-/xxx-framework-service/platform/items/search/
2012-04-07 07:53:48,060 DEBUG DefaultListableBeanFactory:("http-bio-7006"-exec-304):242 - Returning cached instance of singleton bean 'transactionManager'
2012-04-07 07:53:48,060 DEBUG JpaTransactionManager:("http-bio-7006"-exec-304):365 - Creating new transaction with name [com.xxx.framework.manager.ItemManager.search]: PROPAGATION_REQUIRED,ISOLATION_DEFAULT,readOnly; ''
2012-04-07 07:53:48,060 DEBUG JpaTransactionManager:("http-bio-7006"-exec-304):323 - Opened new EntityManager [org.hibernate.ejb.EntityManagerImpl@6aee43a9] for JPA transaction
2012-04-07 07:53:53,583 DEBUG AuthenticationHandler:("http-bio-7006"-exec-217):35 - Authentication handler : System~System
2012-04-07 07:53:53,584 DEBUG AuthenticationHandler:("http-bio-7006"-exec-217):45 - Service Id : 1333765433584-/xxx-framework-service/platform/skus/get_sku_details
2012-04-07 07:53:53,585 DEBUG DefaultListableBeanFactory:("http-bio-7006"-exec-217):242 - Returning cached instance of singleton bean 'transactionManager'
2012-04-07 07:53:53,586 DEBUG JpaTransactionManager:("http-bio-7006"-exec-217):365 - Creating new transaction with name [com.xxx.framework.manager.SkuServiceManager.getSkuDetails]: PROPAGATION_REQUIRED,ISOLATION_DEFAULT; ''
2012-04-07 07:53:53,586 DEBUG JpaTransactionManager:("http-bio-7006"-exec-217):323 - Opened new EntityManager [org.hibernate.ejb.EntityManagerImpl@7cb3b78] for JPA transaction
2012-04-07 07:54:04,297 DEBUG AuthenticationHandler:("http-bio-7006"-exec-311):35 - Authentication handler : System~System
2012-04-07 07:54:04,298 DEBUG AuthenticationHandler:("http-bio-7006"-exec-311):45 - Service Id : 1333765444298-/xxx-framework-service/platform/skus/get_sku_details
2012-04-07 07:54:04,299 DEBUG DefaultListableBeanFactory:("http-bio-7006"-exec-311):242 - Returning cached instance of singleton bean 'transactionManager'
2012-04-07 07:54:04,299 DEBUG JpaTransactionManager:("http-bio-7006"-exec-311):365 - Creating new transaction with name [com.xxx.framework.manager.SkuServiceManager.getSkuDetails]: PROPAGATION_REQUIRED,ISOLATION_DEFAULT; ''
2012-04-07 07:54:04,299 DEBUG JpaTransactionManager:("http-bio-7006"-exec-311):323 - Opened new EntityManager [org.hibernate.ejb.EntityManagerImpl@d8c7ec2] for JPA transaction
2012-04-07 07:54:05,713 DEBUG AuthenticationHandler:("http-bio-7006"-exec-252):35 - Authentication handler : System~System
2012-04-07 07:54:05,714 DEBUG AuthenticationHandler:("http-bio-7006"-exec-252):45 - Service Id : 1333765445714-/xxx-framework-service/platform/skus/get_sku_details
2012-04-07 07:54:05,715 DEBUG DefaultListableBeanFactory:("http-bio-7006"-exec-252):242 - Returning cached instance of singleton bean 'transactionManager'
2012-04-07 07:54:05,715 DEBUG JpaTransactionManager:("http-bio-7006"-exec-252):365 - Creating new transaction with name [com.xxx.framework.manager.SkuServiceManager.getSkuDetails]: PROPAGATION_REQUIRED,ISOLATION_DEFAULT; ''
2012-04-07 07:54:05,715 DEBUG JpaTransactionManager:("http-bio-7006"-exec-252):323 - Opened new EntityManager [org.hibernate.ejb.EntityManagerImpl@685eb738] for JPA transaction
2012-04-07 07:54:16,293 DEBUG AuthenticationHandler:("http-bio-7006"-exec-282):35 - Authentication handler : System~System
2012-04-07 07:54:16,294 DEBUG AuthenticationHandler:("http-bio-7006"-exec-282):45 - Service Id : 1333765456294-/xxx-framework-service/platform/skus/get_sku_details
2012-04-07 07:54:16,295 DEBUG DefaultListableBeanFactory:("http-bio-7006"-exec-282):242 - Returning cached instance of singleton bean 'transactionManager'
2012-04-07 07:54:16,295 DEBUG JpaTransactionManager:("http-bio-7006"-exec-282):365 - Creating new transaction with name [com.xxx.framework.manager.SkuServiceManager.getSkuDetails]: PROPAGATION_REQUIRED,ISOLATION_DEFAULT; ''
2012-04-07 07:54:16,296 DEBUG JpaTransactionManager:("http-bio-7006"-exec-282):323 - Opened new EntityManager [org.hibernate.ejb.EntityManagerImpl@6d37b108] for JPA transaction
2012-04-07 07:54:16,541 DEBUG AuthenticationHandler:("http-bio-7006"-exec-302):35 - Authentication handler : Heartbeat User~dummy_heartbeat
2012-04-07 07:54:16,542 DEBUG AuthenticationHandler:("http-bio-7006"-exec-302):45 - Service Id : 1333765456542-/xxx-framework-service/platform/skus/5500
2012-04-07 07:54:16,543 DEBUG DefaultListableBeanFactory:("http-bio-7006"-exec-302):242 - Returning cached instance of singleton bean 'transactionManager'
2012-04-07 07:54:16,543 DEBUG JpaTransactionManager:("http-bio-7006"-exec-302):365 - Creating new transaction with name [com.xxx.framework.manager.SkuServiceManager.findById]: PROPAGATION_REQUIRED,ISOLATION_DEFAULT,readOnly; ''
2012-04-07 07:54:16,543 DEBUG JpaTransactionManager:("http-bio-7006"-exec-302):323 - Opened new EntityManager [org.hibernate.ejb.EntityManagerImpl@7a973a0f] for JPA transaction
2012-04-07 07:54:19,708 DEBUG AuthenticationHandler:("http-bio-7006"-exec-236):35 - Authentication handler : System~System
2012-04-07 07:54:19,709 DEBUG AuthenticationHandler:("http-bio-7006"-exec-236):45 - Service Id : 1333765459709-/xxx-framework-service/platform/skus/get_sku_details
2012-04-07 07:54:19,710 DEBUG DefaultListableBeanFactory:("http-bio-7006"-exec-236):242 - Returning cached instance of singleton bean 'transactionManager'
2012-04-07 07:54:19,710 DEBUG JpaTransactionManager:("http-bio-7006"-exec-236):365 - Creating new transaction with name [com.xxx.framework.manager.SkuServiceManager.getSkuDetails]: PROPAGATION_REQUIRED,ISOLATION_DEFAULT; ''
2012-04-07 07:54:19,711 DEBUG JpaTransactionManager:("http-bio-7006"-exec-236):323 - Opened new EntityManager [org.hibernate.ejb.EntityManagerImpl@342f22a9] for JPA transaction
2012-04-07 07:54:22,648 DEBUG AuthenticationHandler:("http-bio-7006"-exec-292):35 - Authentication handler : System~System
2012-04-07 07:54:22,649 DEBUG AuthenticationHandler:("http-bio-7006"-exec-292):45 - Service Id : 1333765462649-/xxx-framework-service/platform/skus/get_avaliable_count
2012-04-07 07:54:22,650 DEBUG DefaultListableBeanFactory:("http-bio-7006"-exec-292):242 - Returning cached instance of singleton bean 'transactionManager'
2012-04-07 07:54:22,650 DEBUG JpaTransactionManager:("http-bio-7006"-exec-292):365 - Creating new transaction with name [com.xxx.framework.manager.SkuServiceManager.getAvaliableCountOpt]: PROPAGATION_REQUIRED,ISOLATION_DEFAULT; ''
2012-04-07 07:54:22,650 DEBUG JpaTransactionManager:("http-bio-7006"-exec-292):323 - Opened new EntityManager [org.hibernate.ejb.EntityManagerImpl@5e5d2643] for JPA transaction
2012-04-07 07:54:25,751 DEBUG AuthenticationHandler:("http-bio-7006"-exec-291):35 - Authentication handler : System~System
2012-04-07 07:54:25,752 DEBUG AuthenticationHandler:("http-bio-7006"-exec-291):45 - Service Id : 1333765465752-/xxx-framework-service/platform/skus/get_sku_details
2012-04-07 07:54:25,753 DEBUG DefaultListableBeanFactory:("http-bio-7006"-exec-291):242 - Returning cached instance of singleton bean 'transactionManager'
2012-04-07 07:54:25,753 DEBUG JpaTransactionManager:("http-bio-7006"-exec-291):365 - Creating new transaction with name [com.xxx.framework.manager.SkuServiceManager.getSkuDetails]: PROPAGATION_REQUIRED,ISOLATION_DEFAULT; ''
2012-04-07 07:54:25,754 DEBUG JpaTransactionManager:("http-bio-7006"-exec-291):323 - Opened new EntityManager [org.hibernate.ejb.EntityManagerImpl@28c7c70b] for JPA transaction
2012-04-07 07:54:30,811 DEBUG AuthenticationHandler:("http-bio-7006"-exec-234):35 - Authentication handler : System~System
2012-04-07 07:54:30,812 DEBUG AuthenticationHandler:("http-bio-7006"-exec-234):45 - Service Id : 1333765470812-/xxx-framework-service/platform/skus/get_avaliable_count
2012-04-07 07:54:30,813 DEBUG DefaultListableBeanFactory:("http-bio-7006"-exec-234):242 - Returning cached instance of singleton bean 'transactionManager'
2012-04-07 07:54:30,814 DEBUG JpaTransactionManager:("http-bio-7006"-exec-234):365 - Creating new transaction with name [com.xxx.framework.manager.SkuServiceManager.getAvaliableCountOpt]: PROPAGATION_REQUIRED,ISOLATION_DEFAULT; ''
2012-04-07 07:54:30,814 DEBUG JpaTransactionManager:("http-bio-7006"-exec-234):323 - Opened new EntityManager [org.hibernate.ejb.EntityManagerImpl@140768d6] for JPA transaction
2012-04-07 07:54:33,790 DEBUG AuthenticationHandler:("http-bio-7006"-exec-301):35 - Authentication handler : System~System
2012-04-07 07:54:33,790 DEBUG AuthenticationHandler:("http-bio-7006"-exec-301):45 - Service Id : 1333765473790-/xxx-framework-service/platform/skus/get_sku_details
2012-04-07 07:54:33,792 DEBUG DefaultListableBeanFactory:("http-bio-7006"-exec-301):242 - Returning cached instance of singleton bean 'transactionManager'

and this creation of transaction keeps on happening until server reaches 200 threads(takes 1-2 minutes) and become unresponsive. I checked MySQL connections during this time and count was much less than max connection limit.

We took thread dump also but I am not sure what exactly in that could point to such pile up or threads.

Any ideas to further debug this issue are most welcome.

回答1:

Ok, so after analyzing thread dump we realized that somehow many threads were waiting on awaitAvailable() method of c3p0 library which manages DB connections in our application. We found a discussion where somebody suggested to upgrade c3p0 library(from 0.9.1.2 to 0.9.2) because old library had this issue when lot of concurrent connections are checked out of pool, the connection counter value remains bloated after check-ins and hence doesn't allow to check out new connections and results in processes to wait for connections perpetually (we had checkout timeout = 0).

But even after upgrading the library we faced this issue so we used debugUnreturnedConnectionStackTraces property to see what's happening to connections and why are they unavailable. After a day of logs we realized that our code which was leaking DB connections managed by c3p0. There were few flows in our code where we were not calling rollback() specifically in catch block before closing entity Manager in finally block, so in case of exceptions connection was not coming back to pool and if frequency of exception is high(more than size of pool within timeout interval) then all other process threads would pile up to get connection.

Since that code is fixed, we haven't seen any such thread pileup problems.



回答2:

Thread dump analysis is the way forward here. The answers to this question should help you find the root cause: Thread Dump Analysis Tool / Method