I have an MPI program which compiles and runs, but I would like to step through it to make sure nothing bizarre is happening. Ideally, I would like a simple way to attach GDB to any particular process, but I'm not really sure whether that's possible or how to do it. An alternative would be having each process write debug output to a separate log file, but this doesn't really give the same freedom as a debugger.
Are there better approaches? How do you debug MPI programs?
I use this little homebrewed method to attach a debugger to MPI processes: call a helper function, DebugWait(), right after MPI_Init() in your code. While the processes are waiting for keyboard input, you have all the time you need to attach the debugger to them and add breakpoints. When you are done, provide a single character of input and you are ready to go.
Of course you would want to compile this function for debug builds only.
If you are a tmux user, you will feel very comfortable using Benedikt Morbach's script, tmpi.
Original source: https://github.com/moben/scripts/blob/master/tmpi
Fork: https://github.com/Azrael3000/tmpi
With it you get multiple panes (one per process), all synchronized: every command you type is sent to all panes at the same time, which saves a lot of time compared with the xterm -e approach. Moreover, you can check a variable's value in every process just by issuing a print, without having to move to another pane; each pane prints the value of the variable for its own process.
If you are not a tmux user, I strongly recommend trying it.
Another solution is to run your code within SMPI, the simulated MPI. That's an open source project in which I'm involved. Every MPI rank is converted into a thread of the same UNIX process. You can then easily use gdb to step through the MPI ranks.
SMPI offers other advantages for the study of MPI applications: clairvoyance (you can observe every part of the system), reproducibility (several runs lead to the exact same behavior unless you specify otherwise), absence of heisenbugs (as the simulated platform is kept separate from the host one), etc.
For more information, see this presentation, or that related answer.
mpirun -gdb
Thanks to http://www.ncsa.illinois.edu/UserInfo/Resources/Hardware/CommonDoc/mpich2_gdb.html (archive link)