I'm building a distributed web server in C with MPI, and point-to-point communication seems to stop working completely after the first MPI_Barrier in my code. Ordinary C code after the barrier runs, so I know every process makes it through the barrier. Point-to-point communication also works fine before the barrier. However, if I take the exact code that worked on the line before the barrier and paste it on the line after the barrier, it stops working entirely: MPI_Send just waits forever. When I use MPI_Isend instead, the call returns, but the message is never received.

I've searched for this problem extensively. Everyone who has trouble with MPI_Barrier is told that the barrier works correctly and their code is wrong, but I cannot for the life of me figure out what is wrong with mine. What could be causing this behavior?
Here is a sample program that demonstrates this:
```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int procID;
    int val;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &procID);
    MPI_Barrier(MPI_COMM_WORLD);

    if (procID == 0)
    {
        val = 4;
        printf("Before send\n");
        MPI_Send(&val, 1, MPI_INT, 1, 4, MPI_COMM_WORLD);
        printf("after send\n");
    }
    if (procID == 1)
    {
        val = 1;
        printf("before: val = %d\n", val);
        MPI_Recv(&val, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        printf("after: val = %d\n", val);
    }

    MPI_Finalize();
    return 0;
}
```
Moving the two if statements before the barrier makes this program run correctly.
EDIT: It appears that the first communication, regardless of type, works, and every later communication fails. This is much more general than I first thought: it doesn't matter whether the first communication is a barrier or some other message, no later communication works properly.