I have some Fortran code that I'm parallelizing with MPI which is doing truly bizarre things. First, there's a variable nstartg that I broadcast from the boss process to all the workers:
call mpi_bcast(nstartg,1,mpi_integer,0,mpi_comm_world,ierr)
The variable nstartg is never altered again in the program. Later on, I have the boss process send eproc elements of an array edge to the workers:
if (me==0) then
  do n=1,ntasks-1
    ! (determine the starting point estart and the number eproc of values to send)
    call mpi_send(edge(estart),eproc,mpi_integer,n,n,mpi_comm_world,ierr)
  enddo
endif
with a matching receive statement if me is non-zero. (I've left out some other code for readability; there's a good reason I'm not using scatterv.)
Here's where things get weird: the variable nstartg gets altered to n instead of keeping its actual value. For example, on process 1, after the mpi_recv, nstartg = 1, and on process 2 it's equal to 2, and so forth. Moreover, if I change the code above to
call mpi_send(edge(estart),eproc,mpi_integer,n,n+1234567,mpi_comm_world,ierr)
and change the tag accordingly in the matching call to mpi_recv, then on process 1, nstartg = 1234568; on process 2, nstartg = 1234569, etc.
What on earth is going on? All I've changed is the tag that mpi_send/recv are using to identify the message; provided the tags are unique so that the messages don't get mixed up, this shouldn't change anything, and yet it's altering a totally unrelated variable.
On the boss process, nstartg is unaltered, so I can fix this by broadcasting it again, but that's hardly a real solution. Finally, I should mention that compiling and running this code using Electric Fence hasn't picked up any buffer overflows, nor did -fbounds-check throw anything at me.
The most probable cause is that you pass an INTEGER scalar as the actual status argument to MPI_RECV, when it should really be declared as an array with an implementation-specific size, available as the MPI_STATUS_SIZE constant.
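A minimal sketch of such a declaration follows. The variable name status and the program name are assumptions chosen to match the discussion, and the sketch assumes the mpi module is available (include 'mpif.h' would serve the same purpose in older code):

```fortran
program status_decl
  use mpi                      ! provides MPI_STATUS_SIZE
  implicit none
  ! The receive status must be an integer array of MPI_STATUS_SIZE elements,
  ! not a scalar integer.
  integer, dimension(MPI_STATUS_SIZE) :: status
  ! An equivalent, more compact spelling of the same declaration would be:
  !   integer status(MPI_STATUS_SIZE)
  integer :: ierr

  call mpi_init(ierr)
  ! The array length is implementation-specific, so print it rather than assume it.
  print *, 'status holds', size(status), 'integers'
  call mpi_finalize(ierr)
end program status_decl
```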
The message tag is written to one of the status fields by the receive operation (its implementation-specific index is available as the MPI_TAG constant, and the field value can be accessed as status(MPI_TAG)). If your status is simply a scalar INTEGER, the receive writes past it and several other local variables get overwritten. In your case it just so happens that nstartg falls just above status in the stack.
If you do not care about the receive status, you can pass the special constant MPI_STATUS_IGNORE instead.
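Both options can be sketched in one small program. This is an illustration under stated assumptions, not the original code: buf, its length of 4, the value 42 and the program name are all hypothetical stand-ins, and it needs at least two processes to show anything.

```fortran
program recv_status_demo
  use mpi
  implicit none
  integer :: me, ntasks, n, ierr
  integer :: nstartg                      ! stands in for the variable in the question
  integer :: buf(4)                       ! hypothetical payload
  integer :: status(MPI_STATUS_SIZE)      ! correctly sized status array

  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, me, ierr)
  call mpi_comm_size(mpi_comm_world, ntasks, ierr)

  nstartg = 0
  if (me == 0) nstartg = 42               ! hypothetical value to broadcast
  call mpi_bcast(nstartg, 1, mpi_integer, 0, mpi_comm_world, ierr)

  if (me == 0) then
    do n = 1, ntasks-1
      buf = n
      ! Two messages per worker, both tagged with the worker's rank.
      call mpi_send(buf, 4, mpi_integer, n, n, mpi_comm_world, ierr)
      call mpi_send(buf, 4, mpi_integer, n, n, mpi_comm_world, ierr)
    end do
  else
    ! Option 1: pass the full status array; the tag is stored in status(MPI_TAG).
    call mpi_recv(buf, 4, mpi_integer, 0, me, mpi_comm_world, status, ierr)
    print *, 'rank', me, 'received tag', status(MPI_TAG)
    ! Option 2: the status is not needed, so ask MPI not to write one at all.
    call mpi_recv(buf, 4, mpi_integer, 0, me, mpi_comm_world, MPI_STATUS_IGNORE, ierr)
    print *, 'rank', me, 'nstartg is still', nstartg
  end if

  call mpi_finalize(ierr)
end program recv_status_demo
```

With either option the receive no longer writes outside the storage it was given, so nstartg should keep its broadcast value on every worker.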