Apparently, mpirun uses a SIGINT handler which "forwards" the SIGINT signal to each of the processes it spawned. This means you can write an interrupt handler for your MPI-enabled code, execute mpirun -np 3 my-mpi-enabled-executable, and SIGINT will then be raised in each of the three processes. Shortly after that, mpirun exits. This works fine when your custom handler is small and only prints an error message and exits. However, when the handler does non-trivial work (e.g. serious computations or persisting data), it does not run to completion. I assume this is because mpirun exits too soon.
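For comparison, a common POSIX pattern when a signal must trigger heavy work is to have the handler only record the signal in a volatile sig_atomic_t flag and let the normal code path do the computation and persisting, since most of that work (file I/O, malloc, printf) is not async-signal-safe. This is a minimal sketch in plain C without MPI calls; install_handlers, run_steps and got_signal are hypothetical names, and the loop body stands in for the real Viterbi/persisting code. Catching SIGTERM as well is an assumption, in case the launcher sends it. The pattern only makes the handler itself safe; it does not by itself change how long mpirun waits before terminating the job.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Written by the handler, read by normal code. Assigning to a
   volatile sig_atomic_t is async-signal-safe. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int signo)
{
    /* Only record which signal arrived; the expensive work is
       deferred to the main code path. */
    got_signal = signo;
}

/* Install on_signal for SIGINT and (by assumption) SIGTERM.
   Returns 0 on success, -1 if sigaction fails. */
static int install_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGINT, &sa, NULL) == -1)
        return -1;
    if (sigaction(SIGTERM, &sa, NULL) == -1)
        return -1;
    return 0;
}

/* Hypothetical work loop: performs up to max_steps units of work and
   stops early once a signal has been recorded. Returns the number of
   steps completed, so the caller can then persist its state. */
static int run_steps(int max_steps)
{
    int done = 0;
    while (done < max_steps && !got_signal) {
        done++; /* placeholder for one unit of real work */
    }
    return done;
}
```

In the real program, main would call install_handlers() once, run the work loop, and after it returns check got_signal and do the "running viterbi... persisting parameters..." steps outside the handler.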
Here's the stderr after pressing Ctrl-C (i.e. sending SIGINT) while running my-mpi-enabled-executable directly. This is the expected, desirable behavior:
interrupted by signal 2.
running viterbi... done.
persisting parameters... done.
the master process will now exit.
Here's the stderr after pressing Ctrl-C while running mpirun -np 1 my-mpi-enabled-executable. This is the problematic behavior:
interrupted by signal 2.
running viterbi... mpirun: killing job...
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 8970 on node pharaoh exited on signal 0 (Unknown signal 0).
--------------------------------------------------------------------------
mpirun: clean termination accomplished
Answering any of the following questions will solve my problem:
- How can I override mpirun's SIGINT handler (if that is possible at all)?
- How can I prevent the processes spawned by mpirun from being terminated right after mpirun itself terminates?
- Is there another signal which mpirun may send to the child processes before it terminates?
- Is there a way to catch the so-called "signal 0 (Unknown signal 0)" (see the second stderr output above)?
I'm running openmpi-1.6.3 on Linux.