I used MPI_Isend to transfer an array of chars to a slave node. When the array is small it works, but when I enlarge the array, the program hangs.
Code running on the master node (rank 0):
MPI_Send(&text_length,1,MPI_INT,dest,MSG_TEXT_LENGTH,MPI_COMM_WORLD);
MPI_Isend(text->chars, 360358,MPI_CHAR,dest,MSG_SEND_STRING,MPI_COMM_WORLD,&request);
MPI_Wait(&request,&status);
Code running on the slave node (rank 1):
MPI_Recv(&count,1,MPI_INT,0,MSG_TEXT_LENGTH,MPI_COMM_WORLD,&status);
MPI_Irecv(host_read_string,count,MPI_CHAR,0,MSG_SEND_STRING,MPI_COMM_WORLD,&request);
MPI_Wait(&request,&status);
As you can see, the count parameter in MPI_Isend is 360358. That seems to be too large for MPI. When I set the parameter to 1024, it works well.
This problem has confused me for a few days. I understand that there is a limit on the size of the data MPI can transfer. But as far as I know, MPI_Send is meant for short messages and MPI_Isend can send larger ones, so I use MPI_Isend.
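As a sanity check on the size question, the following is a minimal sketch in C, not part of the program above. The only per-call limit that can be read off the function signature is that the count argument of MPI_Send and MPI_Isend is an int, so a single call can describe at most INT_MAX elements. The helper name fits_in_mpi_count is hypothetical and is not an MPI function. It says nothing about limits that a particular MPI implementation or network setup might add.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper (NOT part of MPI): returns 1 if n elements can be
 * passed as the int `count` argument of MPI_Send / MPI_Isend in a single
 * call, and 0 otherwise. It only checks the range of int. */
int fits_in_mpi_count(size_t n)
{
    return n <= (size_t)INT_MAX;
}
```

Both 1024 and 360358 pass this check, so the int count argument alone does not explain why the larger transfer hangs.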
The network configuration on rank 0 is:
[12t2007@comp01-mpi.gpu01.cis.k.hosei.ac.jp ~]$ ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:1B:21:D9:79:A5
inet addr:192.168.0.101 Bcast:192.168.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:393267 errors:0 dropped:0 overruns:0 frame:0
TX packets:396421 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:35556328 (33.9 MiB) TX bytes:79580008 (75.8 MiB)
eth0.2002 Link encap:Ethernet HWaddr 00:1B:21:D9:79:A5
inet addr:10.111.2.36 Bcast:10.111.2.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:133577 errors:0 dropped:0 overruns:0 frame:0
TX packets:127677 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:14182652 (13.5 MiB) TX bytes:17504189 (16.6 MiB)
eth1 Link encap:Ethernet HWaddr 00:1B:21:D9:79:A4
inet addr:192.168.1.101 Bcast:192.168.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:206981 errors:0 dropped:0 overruns:0 frame:0
TX packets:303185 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:168952610 (161.1 MiB) TX bytes:271792020 (259.2 MiB)
eth2 Link encap:Ethernet HWaddr 00:25:90:91:6B:56
inet addr:10.111.1.36 Bcast:10.111.1.255 Mask:255.255.254.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:26459977 errors:0 dropped:0 overruns:0 frame:0
TX packets:15700862 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:12533940345 (11.6 GiB) TX bytes:2078001873 (1.9 GiB)
Memory:fb120000-fb140000
eth3 Link encap:Ethernet HWaddr 00:25:90:91:6B:57
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
Memory:fb100000-fb120000
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:1894012 errors:0 dropped:0 overruns:0 frame:0
TX packets:1894012 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:154962344 (147.7 MiB) TX bytes:154962344 (147.7 MiB)
The network configuration on rank 1 is:
[12t2007@comp02-mpi.gpu01.cis.k.hosei.ac.jp ~]$ ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:1B:21:D9:79:5F
inet addr:192.168.0.102 Bcast:192.168.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:328449 errors:0 dropped:0 overruns:0 frame:0
TX packets:278631 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:47679329 (45.4 MiB) TX bytes:39326294 (37.5 MiB)
eth0.2002 Link encap:Ethernet HWaddr 00:1B:21:D9:79:5F
inet addr:10.111.2.37 Bcast:10.111.2.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:94126 errors:0 dropped:0 overruns:0 frame:0
TX packets:53782 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:8313498 (7.9 MiB) TX bytes:6929260 (6.6 MiB)
eth1 Link encap:Ethernet HWaddr 00:1B:21:D9:79:5E
inet addr:192.168.1.102 Bcast:192.168.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:121527 errors:0 dropped:0 overruns:0 frame:0
TX packets:41865 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:158117588 (150.7 MiB) TX bytes:5084830 (4.8 MiB)
eth2 Link encap:Ethernet HWaddr 00:25:90:91:6B:50
inet addr:10.111.1.37 Bcast:10.111.1.255 Mask:255.255.254.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:26337628 errors:0 dropped:0 overruns:0 frame:0
TX packets:15500750 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:12526923258 (11.6 GiB) TX bytes:2032767897 (1.8 GiB)
Memory:fb120000-fb140000
eth3 Link encap:Ethernet HWaddr 00:25:90:91:6B:51
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
Memory:fb100000-fb120000
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:1895944 errors:0 dropped:0 overruns:0 frame:0
TX packets:1895944 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:154969511 (147.7 MiB) TX bytes:154969511 (147.7 MiB)