I'm running a very simple web server using Django on Gunicorn with Gevent workers, which communicates with MySQL for simple CRUD-type operations. All of this is behind nginx and hosted on AWS. I'm running my app server using the following config:
gunicorn --logger-class=simple --timeout 30 -b :3000 -w 5 -k gevent my_app.wsgi:application
However, the workers sometimes just get stuck (sometimes when the number of requests increases, sometimes even without that), and the TPS drops, with nginx returning HTTP 499 responses. Sometimes the workers also get killed (WORKER TIMEOUT) and the requests are dropped.
I'm unable to find a way to debug where the workers are getting stuck. I've checked the slow logs of MySQL, and that is not the problem here.
In Java, I can take a jstack dump to see the state of the threads, or use other mechanisms such as takipi, which provides the state of the threads at the moment an exception occurs.
To all the people out there who can help, I call upon you to help me find a way to see the internal state of a hosted python web server i.e.
- workers state at a given point
- threads state at a given point
- which requests a particular gevent worker has started processing and, when it gets stuck or killed, where exactly it is stuck
- which requests were terminated because a worker was killed
- etc
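To make the ask concrete, here is a minimal sketch of the kind of jstack-like dump I'm after. It uses only the standard library for OS threads and, if the greenlet package is importable, also walks suspended greenlets found via the garbage collector. The function name dump_stacks is hypothetical, and I'm assuming (not claiming) that this is a reasonable approach under gevent workers.

```python
import gc
import sys
import threading
import traceback


def dump_stacks():
    """Return a text dump of the stacks of all OS threads and suspended greenlets."""
    out = []

    # OS threads: sys._current_frames() maps thread id -> current frame.
    names = {t.ident: t.name for t in threading.enumerate()}
    for ident, frame in sys._current_frames().items():
        out.append("Thread %s (%s)\n" % (ident, names.get(ident, "unknown")))
        out.extend(traceback.format_stack(frame))

    # Greenlets: only attempted if the greenlet package is installed.
    try:
        from greenlet import greenlet
    except ImportError:
        return "".join(out)

    for obj in gc.get_objects():
        # gr_frame is None for the running greenlet and for finished ones.
        if isinstance(obj, greenlet) and obj.gr_frame is not None:
            out.append("Greenlet %r\n" % (obj,))
            out.extend(traceback.format_stack(obj.gr_frame))
    return "".join(out)


if __name__ == "__main__":
    print(dump_stacks())
```

Something like this could presumably be written to the log from a signal handler, or from Gunicorn's worker_abort server hook, which is called when a worker receives SIGABRT (generally on timeout). I have not verified how well that works when a gevent worker is blocked.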
I've been searching for this and have found many people facing similar issues, but their solutions seem to be trial and error, and nowhere are the steps for digging deeper into this laid out.