Gracefully exit with MPI

2019-04-02 02:08发布

问题:

I'm trying to gracefully exit my program after if Rdinput returns an error.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MASTER 0
#define Abort(x) MPI_Abort(MPI_COMM_WORLD, x)
#define Bcast(send_data, count, type) MPI_Bcast(send_data, count, type, MASTER, GROUP) //root --> MASTER
#define Finalize() MPI_Finalize()

int main(int argc, char **argv){

  //Code

  if( rank == MASTER ) {
    time (&start);
    printf("Initialized at %s\n", ctime (&start) );      
    //Read file
    error = RdInput();
  }

  Bcast(&error, 1, INT); Wait();

  if( error = 1 ) MPI_Abort(1);

  //Code

  Finalize();
}

Program output:

mpirun -np 2 code.x 
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD 
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Initialized at Wed May 30 11:34:46 2012
Error [RdInput]: The file "input.mga" is not available!
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 7369 on
node einstein exiting improperly. There are two reasons this could occur:

//More error message.

What can I do to gracefully exit an MPI program without printing this huge error message?

回答1:

If you have this logic in your code:

Bcast(&error, 1, INT);
if( error = 1 ) MPI_Abort(1); 

then you're just about done (although you don't need any kind of wait after a broadcast). The trick, as you've discovered, is that MPI_Abort() does not do "graceful"; it basically is there to shut things down in whatever way possible when something's gone horribly wrong.

In this case, since now everyone agrees on the error code after the broadcast, just do a graceful end of your program:

   MPI_Bcast(&error, 1, MPI_INT, MASTER, MPI_COMM_WORLD);
   if (error != 0) {
       if (rank == 0) {
           fprintf(stderr, "Error: Program terminated with error code %d\n", error);
       }
       MPI_Finalize();
       exit(error);
   } 

It's an error to call MPI_Finalize() and keep on going with more MPI stuff, but that's not what you're doing here, so you're fine.



标签: c mpi abort