MPI run error “caused collective abort of all ranks”

Posted 2019-07-20 03:09

Question:

I'm trying to write a parallel program in C using MPI. However, when I run the program I get the message below and the program is terminated. I don't know what causes this error.

WARNING: Unable to read mpd.hosts or list of hosts isn't provided. MPI job will be run on the current machine only.

Solution is starting

rank 7 in job 1 server_name_60409 caused collective abort of all ranks exit status of rank 7: return code 0

rank 6 in job 1 server_name_60409 caused collective abort of all ranks exit status of rank 6: return code 0

rank 4 in job 1 server_name_60409 caused collective abort of all ranks exit status of rank 4: killed by signal 9

rank 3 in job 1 server_name_60409 caused collective abort of all ranks exit status of rank 3: killed by signal 9

rank 2 in job 1 server_name_60409 caused collective abort of all ranks exit status of rank 2: return code 0

rank 0 in job 1 server_name_60409 caused collective abort of all ranks exit status of rank 0: return code 0

Answer 1:

If you forget to call MPI_Finalize() before the program exits, that will also produce the following error:

rank 3 in job 98 n01_44763 caused collective abort of all ranks
exit status of rank 3: return code 0



Answer 2:

My program was aborting with a similar message:

rank 3 in job 58409  vnode-01_39157   caused collective abort of all ranks
  exit status of rank 3: killed by signal 9 
rank 1 in job 58409  vnode-01_39157   caused collective abort of all ranks
  exit status of rank 1: killed by signal 11 

The cause was that too much memory was being allocated on the stack.
Moving the allocation to the heap fixed it.
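
The change described in this answer might look roughly like the sketch below. It assumes the original code declared a large automatic (stack) array inside a function; the names `N_CELLS`, `alloc_grid`, and `sum_grid` and the array size are hypothetical. A large local array can exceed the per-process stack limit (commonly around 8 MiB by default on Linux) and crash the rank with a segmentation fault, which is signal 11. Allocating the same buffer with malloc() places it on the heap instead.

```c
#include <stdlib.h>

/* Hypothetical size: 16 Mi doubles = 128 MiB, far more than a
 * typical default stack limit. */
#define N_CELLS ((size_t)16 * 1024 * 1024)

/* Problematic pattern (shown as a comment only):
 *
 *     double grid[N_CELLS];    automatic array, lives on the stack
 */

/* Allocate the buffer on the heap and fill it with 1.0.
 * Returns NULL if the allocation fails; the caller must free() it. */
double *alloc_grid(size_t n)
{
    double *grid = malloc(n * sizeof *grid);
    if (grid == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        grid[i] = 1.0;
    }
    return grid;
}

/* Example consumer: sum all cells of the buffer. */
double sum_grid(const double *grid, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += grid[i];
    }
    return total;
}
```

Each rank would call `alloc_grid(N_CELLS)` once, check the result for NULL, use the buffer, and release it with `free()` before calling MPI_Finalize(). Unlike a stack overflow, a failed malloc() can be detected and reported cleanly.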



Tags: c mpi