What should I do to send and receive messages with MPI?

Posted 2019-09-02 04:08

Question:

Background: rank 0 sends a message to rank 1, and after rank 1 completes its work it sends a message back to rank 0.

In rank 0 I run one thread for sending messages and another for receiving, like this:

int tag = 1;
void* thread_send(void* argc)
{
   ...;
    while(1)
   {
     if(tag == 1) 
     {
        MPI_Send(...,1,TAG_SEND,...);//send something to slave
        tag = 0;
     }
   }
   ...
}

void* thread_receive(void* argc)
{
    while(1)
    {
      MPI_Recv(...,1,TAG_RECV,...); //wait for the notification from the slave (rank 1)
      tag = 1;
    }
}

In rank 1 I run a thread like this:

void* slave(void* argc)
{   
    ...;
    while(1)
    {
        MPI_Probe(0,MPI_ANY_TAG,MPI_COMM_WORLD,&status);
        switch(status.MPI_TAG){
        case TAG_SEND:
        MPI_Recv(..,0,TAG_SEND,..);
        break;
        }
        MPI_Send(...,0,TAG_RECV,...); //notify rank 0 that the slave has done its work
    }
}

Then I got an error like this:

    [comp01-mpi.gpu01.cis.k.hosei.ac.jp][[54135,1],0]
    [btl_tcp_endpoint.c:486:mca_btl_tcp_endpoint_recv_connect_ack] 
    received unexpected      process identifier [[16641,0],301989888]

Each machine has several network interfaces, which I know can cause problems, so I pass the parameters --mca btl_tcp_if_include eth0 --mca oob_tcp_if_include eth0 to restrict Open MPI's traffic to eth0.

Have I done something wrong? I would appreciate any suggestions. Thanks.

Thanks to @HristoIliev, I checked the thread support of my Open MPI installation like this:

    MPI_Init_thread(&argc,&argv,MPI_THREAD_MULTIPLE,&provide_level);
    if(provide_level < MPI_THREAD_MULTIPLE){
        printf("Error: the MPI library doesn't provide the required thread level\n");
        MPI_Abort(MPI_COMM_WORLD,1); /* non-zero error code signals failure */
    }

and I got the error:

Error: the MPI library doesn't provide the required thread level

That means I cannot use multiple threads, so what else can I do?

Now I am using non-blocking sends (MPI_Isend) and receives (MPI_Irecv). The code looks like this. Send thread:

int tag = 1;
void* thread_send(void* argc)
{
    ...;
    while(1)
    {
        while(1)
        {
            //post a receive for the slave's notification
            MPI_Irecv(&tag,1,MPI_INT,1,MSG_TAG,MPI_COMM_WORLD,&request);
            if(tag == 1) break;
            printf("tag is %d\n",tag);
            MPI_Wait(&request,&status);
        }

        MPI_Send(...,1,MSG_SEND,...);//send something to slave
        tag = 0;
    }
    ...
}

Receive thread:

void* slave(void* argc)
{
    ...;
    while(1)
    {
        MPI_Probe(0,MPI_ANY_TAG,MPI_COMM_WORLD,&status);
        switch(status.MPI_TAG){
        case MSG_SEND:
            MPI_Recv(..,0,MSG_SEND,..);
            break;
        }
        int tag = 1;
        MPI_Isend(&tag,1,MPI_INT,0,MSG_TAG,MPI_COMM_WORLD,&request); //notify rank 0 that the slave has done its work
        MPI_Wait(&request,&status);
        printf("slave is idle now \n");
    }
}

It printed this:

tag is 0
slave is idle now

and then it hangs.

Answer 1:

I solved the problem by moving the MPI_Irecv() call, as follows:

Send thread:

int tag = 1;
void* thread_send(void* argc)
{
    ...;
    while(1)
    {
        while(1)
        {
            if(tag == 1) break;
            printf("tag is %d\n",tag);
            //post the receive only when a notification is expected, then wait for it
            MPI_Irecv(&tag,1,MPI_INT,1,MSG_TAG,MPI_COMM_WORLD,&request);
            MPI_Wait(&request,&status);
        }

        MPI_Send(...,1,MSG_SEND,...);//send something to slave
        tag = 0;
    }
    ...
}

In conclusion, to send and receive messages at the same time you can use multiple threads, provided your MPI library supports MPI_THREAD_MULTIPLE. You can check this when you initialize your MPI program:

    MPI_Init_thread(&argc,&argv,MPI_THREAD_MULTIPLE,&provide_level);
    if(provide_level < MPI_THREAD_MULTIPLE){
        printf("Error: the MPI library doesn't provide the required thread level\n");
        MPI_Abort(MPI_COMM_WORLD,1); /* non-zero error code signals failure */
    }

If your MPI library does not support MPI_THREAD_MULTIPLE, you can use non-blocking communication from a single thread instead.



Tags: tcp ip mpi openmpi