I'm trying to write a "Hello, world!" application in (Open)MPI in which the processes print their greetings in rank order.
My idea was to have the first process send a message to the second when it's finished, then the second to the third, etc.:
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // See: http://mpitutorial.com/mpi-send-and-receive/
    if (rank == 0) {
        // This is the first process.
        // Print out immediately.
        printf("Hello, World! I am rank %d of %d.\n", rank, size);
        MPI_Send(&rank, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else {
        // Wait until the previous one finishes.
        int receivedData;
        MPI_Recv(&receivedData, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Hello, World! I am rank %d of %d (message: %d).\n", rank, size, receivedData);
        if (rank + 1 < size) {
            // We're not the last one. Send out a message.
            MPI_Send(&rank, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
        } else {
            printf("Hello world completed!\n");
        }
    }

    MPI_Finalize();
    return 0;
}
When I run this on an eight-core cluster, it runs perfectly every time. However, when I run it on a sixteen-core cluster, sometimes it works, and sometimes it outputs something like this:
Hello, World! I am rank 0 of 16.
Hello, World! I am rank 1 of 16 (message: 0).
Hello, World! I am rank 2 of 16 (message: 1).
Hello, World! I am rank 3 of 16 (message: 2).
Hello, World! I am rank 4 of 16 (message: 3).
Hello, World! I am rank 5 of 16 (message: 4).
Hello, World! I am rank 6 of 16 (message: 5).
Hello, World! I am rank 7 of 16 (message: 6).
Hello, World! I am rank 10 of 16 (message: 9).
Hello, World! I am rank 11 of 16 (message: 10).
Hello, World! I am rank 8 of 16 (message: 7).
Hello, World! I am rank 9 of 16 (message: 8).
Hello, World! I am rank 12 of 16 (message: 11).
Hello, World! I am rank 13 of 16 (message: 12).
Hello, World! I am rank 14 of 16 (message: 13).
Hello, World! I am rank 15 of 16 (message: 14).
Hello world completed!
That is, most of the output is in order, but some is out of place.
Why is this happening? How is it even possible? How can I fix it?