I'm using non-blocking communication in MPI to send various messages between processes. However, I appear to be getting a deadlock. I have used PADB (see here) to look at the message queues and have got the following output:
1:msg12: Operation 1 (pending_receive) status 0 (pending)
1:msg12: Rank local 4 global 4
1:msg12: Size desired 4
1:msg12: tag_wild 0
1:msg12: Tag desired 16
1:msg12: system_buffer 0
1:msg12: Buffer 0xcaad32c
1:msg12: 'Receive: 0xcac3c80'
1:msg12: 'Data: 4 * MPI_FLOAT'
--
1:msg32: Operation 0 (pending_send) status 2 (complete)
1:msg32: Rank local 4 global 4
1:msg32: Actual local 4 global 4
1:msg32: Size desired 4 actual 4
1:msg32: tag_wild 0
1:msg32: Tag desired 16 actual 16
1:msg32: system_buffer 0
1:msg32: Buffer 0xcaad32c
1:msg32: 'Send: 0xcab7c00'
1:msg32: 'Data transfer completed'
--
2:msg5: Operation 1 (pending_receive) status 0 (pending)
2:msg5: Rank local 1 global 1
2:msg5: Size desired 4
2:msg5: tag_wild 0
2:msg5: Tag desired 16
2:msg5: system_buffer 0
2:msg5: Buffer 0xabbc348
2:msg5: 'Receive: 0xabd1780'
2:msg5: 'Data: 4 * MPI_FLOAT'
--
2:msg25: Operation 0 (pending_send) status 2 (complete)
2:msg25: Rank local 1 global 1
2:msg25: Actual local 1 global 1
2:msg25: Size desired 4 actual 4
2:msg25: tag_wild 0
2:msg25: Tag desired 16 actual 16
2:msg25: system_buffer 0
2:msg25: Buffer 0xabbc348
2:msg25: 'Send: 0xabc5700'
2:msg25: 'Data transfer completed'
This seems to show that the sends have completed, but all of the receives are still pending (the above is just a small part of the log, for a tag value of 16). But how can this happen? Surely a send can't complete without the associated receive completing, since in MPI all sends and receives have to match. At least, that's what I thought...
Can anyone provide any insights?
I can provide the code I'm using to do this, but surely Isend and Irecv should work regardless of what order they are all called in, assuming that MPI_Waitall is called right at the end.
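For reference, the overall pattern I'm describing is roughly this (a simplified, self-contained sketch - the ring exchange and the buffer/request names here are just for illustration, not the actual code from the gist):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* One send buffer and one receive buffer (made-up names). */
    float send_buf[4] = {0.0f, 1.0f, 2.0f, 3.0f};
    float recv_buf[4] = {9999.99f, 9999.99f, 9999.99f, 9999.99f};

    MPI_Request requests[2];

    int dest   = (rank + 1) % size;          /* who I send to */
    int source = (rank - 1 + size) % size;   /* who I receive from */

    /* Post both non-blocking calls first, in whatever order... */
    MPI_Irecv(recv_buf, 4, MPI_FLOAT, source, 16, MPI_COMM_WORLD, &requests[0]);
    MPI_Isend(send_buf, 4, MPI_FLOAT, dest,   16, MPI_COMM_WORLD, &requests[1]);

    /* ...and only wait for everything right at the end. */
    MPI_Waitall(2, requests, MPI_STATUSES_IGNORE);

    printf("rank %d received %f %f %f %f\n", rank,
           recv_buf[0], recv_buf[1], recv_buf[2], recv_buf[3]);

    MPI_Finalize();
    return 0;
}
```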
Update: Code is available at this gist
Update: I've made various modifications to the code, but it still isn't working properly. The new code is at the same gist, and the output I'm getting is at this gist. I have a number of questions/issues with this code:
Why is the output from the final loop (printing all of the arrays) interspersed with the rest of the output, when I have an MPI_Barrier() before it to make sure all of the work has been done before printing anything out?
Is it possible/sensible to send from rank 0 to rank 0 - will that work OK (assuming a correct matching receive is posted, of course)? See the small self-send sketch after this list for what I mean.
I'm getting lots of very strange long numbers in the output, which I assume is some kind of memory-overwriting problem, or a mismatch in variable sizes. The interesting thing is that this must be coming from the MPI communication, because I initialise new_array to a value of 9999.99 and the communication obviously causes it to be changed to these strange values. Any ideas why?
Overall it seems that some of the transposition is occurring (bits of the matrix seem to be transposed...), but definitely not all of it - it's the strange numbers that keep coming up that worry me the most!
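For the rank-0-to-rank-0 question above, this is the sort of self-send I mean (a minimal standalone sketch, not taken from the gist):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        float send_buf[4] = {1.0f, 2.0f, 3.0f, 4.0f};
        float recv_buf[4] = {9999.99f, 9999.99f, 9999.99f, 9999.99f};
        MPI_Request requests[2];

        /* Rank 0 posts a receive from itself and a send to itself,
           then waits on both, exactly as it would for any other peer. */
        MPI_Irecv(recv_buf, 4, MPI_FLOAT, 0, 16, MPI_COMM_WORLD, &requests[0]);
        MPI_Isend(send_buf, 4, MPI_FLOAT, 0, 16, MPI_COMM_WORLD, &requests[1]);
        MPI_Waitall(2, requests, MPI_STATUSES_IGNORE);

        printf("self-send result: %f %f %f %f\n",
               recv_buf[0], recv_buf[1], recv_buf[2], recv_buf[3]);
    }

    MPI_Finalize();
    return 0;
}
```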