Background: rank 0 sends a message to rank 1; after rank 1 completes its work, it sends a message back to rank 0.
On rank 0 I run one thread for sending messages and another for receiving them, like this:
int tag = 1; // 1: slave is idle, 0: work is in flight

void* thread_send(void* argc)
{
    ...;
    while(1)
    {
        if(tag == 1)
        {
            MPI_Send(...,1,TAG_SEND,...); // send something to slave
            tag = 0;
        }
    }
    ...
}

void* thread_receive(void* argc)
{
    while(1)
    {
        MPI_Recv(...,1,TAG_RECV,...); // ready for receiving from slave (rank 1)
        tag = 1;
    }
}
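One thing to note about the rank 0 code above: both threads read and write the plain int tag with no synchronization, and the sender spins on it. A minimal sketch of a synchronized flag, assuming POSIX threads. The handshake_flag type and its functions are hypothetical names, and this only covers the flag itself, not whether the MPI library permits calls from several threads:

```c
#include <pthread.h>

/* Hypothetical replacement for the bare `int tag` shared by both threads. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  ready;
    int             value;   /* 1: slave is idle, 0: work is in flight */
} handshake_flag;

static void flag_init(handshake_flag *f, int initial)
{
    pthread_mutex_init(&f->lock, NULL);
    pthread_cond_init(&f->ready, NULL);
    f->value = initial;
}

/* Receiver side: mark the slave as idle and wake the sender. */
static void flag_set(handshake_flag *f)
{
    pthread_mutex_lock(&f->lock);
    f->value = 1;
    pthread_cond_signal(&f->ready);
    pthread_mutex_unlock(&f->lock);
}

/* Sender side: sleep (instead of spinning) until the flag is 1,
 * then reset it to 0 before the next message goes out. */
static void flag_wait_and_clear(handshake_flag *f)
{
    pthread_mutex_lock(&f->lock);
    while (f->value != 1)
        pthread_cond_wait(&f->ready, &f->lock);
    f->value = 0;
    pthread_mutex_unlock(&f->lock);
}
```

In this sketch thread_receive would call flag_set() after MPI_Recv returns, and thread_send would call flag_wait_and_clear() before MPI_Send.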
On rank 1 I run a thread like this:
void* slave(void* argc)
{
    ...;
    while(1)
    {
        MPI_Probe(0,MPI_ANY_TAG,MPI_COMM_WORLD,&status);
        switch(status.MPI_TAG){
        case TAG_SEND:
            MPI_Recv(..,0,TAG_SEND,..);
            break;
        }
        MPI_Send(...,0,TAG_RECV,...); // notify rank 0 that the slave has done its work
    }
}
Then I got an error like this:
[comp01-mpi.gpu01.cis.k.hosei.ac.jp][[54135,1],0]
[btl_tcp_endpoint.c:486:mca_btl_tcp_endpoint_recv_connect_ack]
received unexpected process identifier [[16641,0],301989888]
Each machine has several network interfaces. I know this can be a problem, so I pass the parameters --mca btl_tcp_if_include eth0 --mca oob_tcp_if_include eth0 so that Open MPI's traffic only goes over eth0.
Have I done something wrong? I would appreciate any suggestions, thanks.
Thanks to @HristoIliev, I checked the thread support level of my Open MPI installation like this:
MPI_Init_thread(&argc,&argv,MPI_THREAD_MULTIPLE,&provide_level);
if(provide_level < MPI_THREAD_MULTIPLE){
    printf("Error: the MPI library doesn't provide the required thread level\n");
    MPI_Abort(MPI_COMM_WORLD,0);
}
and I got this error:
Error: the MPI library doesn't provide the required thread level
That means I cannot make MPI calls from multiple threads at the same time, so what else can I do?
Now I am using non-blocking sends (Isend) and receives (Irecv). The code looks like this. Send thread:
int tag = 1;
void* thread_send(void* argc)
{
    ...;
    while(1)
    {
        while(1)
        {
            MPI_Irecv(&tag,1,MPI_INT,1,MSG_TAG,MPI_COMM_WORLD,&request);
            if(tag == 1) break;
            printf("tag is %d\n",tag);
            MPI_Wait(&request,&status);
        }
        MPI_Send(...,1,MSG_SEND,...); // send something to slave
        tag = 0;
    }
    ...
}
Receive thread (the slave on rank 1):
void* slave(void* argc)
{
    ...;
    while(1)
    {
        MPI_Probe(0,MPI_ANY_TAG,MPI_COMM_WORLD,&status);
        switch(status.MPI_TAG){
        case MSG_SEND:
            MPI_Recv(..,0,MSG_SEND,..);
            break;
        }
        int tag = 1;
        MPI_Isend(&tag,1,MPI_INT,0,MSG_TAG,MPI_COMM_WORLD,&request); // notify rank 0 that the slave has done its work
        MPI_Wait(&request,&status);
        printf("slave is idle now \n");
    }
}
It printed the following:
tag is 0
slave is idle now
and then it hung.
I have solved the problem by changing the location of the Irecv() call, as follows:
send thread:
In conclusion, to send and receive messages at the same time, you can use multiple threads if your MPI library supports MPI_THREAD_MULTIPLE. You can check this when you initialize your MPI program by calling MPI_Init_thread() and comparing the provided level, as in the check shown earlier.
If your MPI library does not support that mode, you can use non-blocking communication instead.