I have an MPI program which compiles and runs, but I would like to step through it to make sure nothing bizarre is happening. Ideally, I would like a simple way to attach GDB to any particular process, but I'm not really sure whether that's possible or how to do it. An alternative would be having each process write debug output to a separate log file, but this doesn't really give the same freedom as a debugger.
Are there better approaches? How do you debug MPI programs?
http://valgrind.org/ 'nuff said.
More specific link: Debugging MPI Parallel Programs with Valgrind
Using screen together with gdb to debug MPI applications works nicely, especially if xterm is unavailable or you're dealing with more than a few processors. There were many pitfalls along the way with accompanying stackoverflow searches, so I'll reproduce my solution in full.
First, add code after MPI_Init to print out the PID and halt the program to wait for you to attach. The standard solution seems to be an infinite loop; I eventually settled on raise(SIGSTOP), which requires an extra continue command within gdb to escape.
After compiling, run the executable in the background and capture its stderr in a file. You can then grep the stderr file for some keyword (here the literal string PID) to get the PID and rank of each process.
A gdb session can be attached to each process with gdb $MDRUN_EXE $PID. Doing so within a screen session allows easy access to any gdb session. -d -m starts the screen in detached mode, -S "P$RANK" allows you to name the screen for easy access later, and the -l option to bash starts it as a login shell and keeps gdb from exiting immediately.
Once gdb has started in the screens, you may script input to the screens (so that you don't have to enter every screen and type the same thing) using screen's -X stuff command. A newline is required at the end of the command. Here the screens are accessed by -S "P$i" using the names previously given. The -p 0 option is critical, otherwise the command intermittently fails (based on whether or not you have previously attached to the screen).
At this point you can attach to any screen using screen -rS "P$i" and detach using Ctrl+A followed by D. Commands may be sent to all gdb sessions in analogy with the previous section of code.
The "standard" way to debug MPI programs is by using a debugger which supports that execution model.
On UNIX, TotalView is said to have good support for MPI.
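For the screen-plus-gdb workflow described earlier, the shell side could be sketched roughly as below. This is only an illustration under stated assumptions: the program is assumed to print one line of the form "PID <pid> rank <rank>" per process to stderr, and the function names (pid_rank_pairs, start_gdb_screens, send_to_all) are hypothetical helpers, not part of screen, gdb or any MPI distribution.

```shell
# Assumed launch step (placeholder names), with stderr captured in a file:
#   mpiexec -n 4 ./my_mpi_prog 2> error.log &

# Print "<pid> <rank>" pairs parsed from the stderr log given as $1.
# Assumes each process wrote a line like: PID 4242 rank 0
pid_rank_pairs() {
    grep PID "$1" | awk '{ print $2, $4 }'
}

# Start one detached, named screen per rank, each running gdb attached
# to that rank's process. $1 = executable, $2 = stderr log.
start_gdb_screens() {
    exe=$1
    errlog=$2
    pid_rank_pairs "$errlog" | while read -r pid rank; do
        # -d -m: start detached; -S: session name; bash -l keeps gdb open
        screen -d -m -S "P$rank" bash -l -c "gdb $exe $pid" </dev/null
    done
}

# Send one gdb command to the screens of ranks 0..N-1.
# $1 = number of ranks, $2 = command text.
send_to_all() {
    nranks=$1
    cmd=$2
    i=0
    while [ "$i" -lt "$nranks" ]; do
        # The trailing newline inside the quotes acts as pressing Enter.
        screen -S "P$i" -p 0 -X stuff "$cmd
"
        i=$((i + 1))
    done
}
```

With these helpers, something like start_gdb_screens ./my_mpi_prog error.log followed by send_to_all 4 continue would resume every stopped rank, after which screen -rS "P0" attaches to rank 0.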
Here is quite a simple way to debug an MPI program.
In the main() function, add sleep(some_seconds).
Run the program as usual.
The program will start and go into the sleep.
That gives you some seconds to find your processes with ps, run gdb and attach to them.
If you use an editor like QtCreator, you can use
Debug->Start debugging->Attach to running application
and find your processes there.
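The "find your processes by ps and attach" step can be sketched in shell as follows. This is a minimal illustration, assuming a ps that accepts the POSIX -e and -o options and that the process name reported by ps matches the executable name exactly; attach_commands is a hypothetical helper name.

```shell
# Print one "gdb -p <pid>" line for every running process whose command
# name equals $1. Run each printed command in its own terminal while the
# program is still inside its sleep.
attach_commands() {
    ps -e -o pid= -o comm= |
        awk -v name="$1" '$2 == name { print "gdb -p " $1 }'
}
```

For example, attach_commands my_mpi_prog (a placeholder program name) lists one attach command per rank running on the local machine. On some systems ps truncates or path-qualifies the command name, so the match may need adjusting.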
As others have mentioned, if you're only working with a handful of MPI processes you can try to use multiple gdb sessions, the redoubtable valgrind or roll your own printf / logging solution.
If you're using more processes than that, you really start needing a proper debugger. The OpenMPI FAQ recommends both Allinea DDT and TotalView.
I work on Allinea DDT. It's a full-featured, graphical source-code debugger so yes, you can:
...and so on. If you've used Eclipse or Visual Studio then you'll be right at home.
We added some interesting features specifically for debugging parallel code (be it MPI, multi-threaded or CUDA):
Scalar variables are automatically compared across all processes.
You can also trace and filter the values of variables and expressions over processes and time.
It's widely used amongst Top500 HPC sites, such as ORNL, NCSA, LLNL, Jülich et al.
The interface is pretty snappy; we timed stepping and merging the stacks and variables of 220,000 processes at 0.1s as part of the acceptance testing on Oak Ridge's Jaguar cluster.
@tgamblin mentioned the excellent STAT, which integrates with Allinea DDT, as do several other popular open source projects.
I have found gdb quite useful. I use it as
This then launches xterm windows in which I can do
Usually works fine.
You can also package these commands together using:
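A minimal sketch of such a packaged command, assuming mpirun, xterm and gdb are all on the PATH and an X display is available. The wrapper name launch_in_xterms is hypothetical, and this is not necessarily the exact command the author used.

```shell
# Launch $1 ranks, each inside its own xterm running gdb, and start the
# program immediately. Remaining arguments are the program and its args.
launch_in_xterms() {
    np=$1
    shift
    # -ex run: issue "run" as soon as gdb starts
    # --args: everything after it is the program and its arguments
    mpirun -np "$np" xterm -e gdb -ex run --args "$@"
}
```

For example, launch_in_xterms 4 ./my_mpi_prog input.dat (placeholder names) would open four xterm windows, each with a gdb session already running one rank.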