I have to run a legacy Zope2 website and have some grievances with it. The biggest issue is that, occasionally, it just locks up, running at 100% CPU load and no longer responding to requests. While the problem isn't reliably reproducible, one page containing 3 dynamic graphs triggers it sometimes, so I suspect some kind of race condition that leads to an endless loop or a stuck busy-wait.
The problem is, I have not yet found a way to debug this thing. There's nothing in the Zope logs and nothing in the system logs. I tried the suggestions from this question to get a stacktrace, but the only signal that has any effect is SIGKILL.
Is there another possibility to find out where exactly the process is when it gets stuck?
See my answer to this SO question, use Products.signalstack. It registers the same handler as the answer you already found, at Product registration time. Perhaps it works better for you.
If not, you probably have an OS-level I/O problem on your hands, and your only hope is attaching gdb to the process. Search Stack Overflow for gdb answers; there is a wealth of information here!
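For readers unfamiliar with the handler approach mentioned above, here is a minimal sketch of a signal handler that dumps the stack of every thread. It is not the actual Products.signalstack code; the function names and the choice of SIGUSR1 as the default signal are assumptions made for illustration, and only standard-library calls are used.

```python
import signal
import sys
import threading
import traceback


def format_all_stacks():
    """Return the current stack of every thread as one string."""
    # Map thread ids to names so the output is easier to read.
    names = dict((t.ident, t.name) for t in threading.enumerate())
    parts = []
    for thread_id, frame in sys._current_frames().items():
        parts.append("Thread %s (%s):\n" % (thread_id, names.get(thread_id, "unknown")))
        parts.append("".join(traceback.format_stack(frame)))
    return "\n".join(parts)


def dump_stacks(signum, frame):
    """Signal handler: write all thread stacks to stderr."""
    sys.stderr.write(format_all_stacks() + "\n")
    sys.stderr.flush()


def install_stack_dump_handler(signum=None):
    """Register dump_stacks; must be called from the main thread.

    SIGUSR1 is an assumed default and exists only on POSIX systems.
    """
    if signum is None:
        signum = signal.SIGUSR1
    signal.signal(signum, dump_stacks)
```

After calling install_stack_dump_handler() once at startup, sending the chosen signal to the process (for example with kill -USR1 and the process ID) should print the stacks to stderr. Note that CPython only runs Python-level signal handlers in the main thread, between bytecode instructions, so a process whose main thread is stuck inside C code may never run the handler. That would be consistent with only SIGKILL having an effect.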
After running around the internet in circles for a while, I finally ended up here: http://podoliaka.org/2016/04/10/debugging-cpython-gdb/. It describes in detail how all the pieces fit together. The money quote for me was 'gdb /usr/bin/python -p $PID': the name of the executable is required in order for gdb to find the correct debug info files.
If the process is stuck in a way that no other signal gets through, you might want to consider running it from a debugger, instead of trying to attach to it at runtime.
Also, it might be useful to try other debugging tactics, such as disabling parts of the code until you find the minimal case in which the problem is still reproducible, which should make the cause easier to see.
You can print out a nice stack trace using pyrasite.
First, you'll need to have gdb installed.
Then, install pyrasite.
Use ps or some other method to find the process ID of the stuck Python process and run pyrasite-shell with it. You should now see a Python REPL. Run the following in the REPL to see stack traces for all threads.
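A minimal sketch of such a snippet, assuming only the standard library's sys._current_frames() and the traceback module. Treat it as an illustration of the idea rather than the exact code this answer originally contained; it is written so that it runs on both Python 2 and Python 3.

```python
import sys
import traceback

# sys._current_frames() maps each thread id to that thread's topmost frame.
for thread_id, frame in sys._current_frames().items():
    print("Stack for thread %s" % thread_id)
    # Prints the call stack leading up to the given frame (to stderr).
    traceback.print_stack(frame)
    print("")
```

The frames closest to the bottom of each trace show where the thread is currently executing, which is usually the place to start looking for the loop.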
You could try to attach a debugger to the running process. See also this question.