MPI alters a variable it shouldn't [duplicate]

Posted 2020-01-29 21:31

I have some Fortran code that I'm parallelizing with MPI which is doing truly bizarre things. First, there's a variable nstartg that I broadcast from the boss process to all the workers:

call mpi_bcast(nstartg,1,mpi_integer,0,mpi_comm_world,ierr)

The variable nstartg is never altered again in the program. Later on, I have the boss process send eproc elements of an array edge to the workers:

if (me==0) then
    do n=1,ntasks-1
        ! determine the starting point estart and the number eproc
        ! of values to send
        call mpi_send(edge(estart),eproc,mpi_integer,n,n,mpi_comm_world,ierr)
    enddo
endif

with a matching receive statement if me is non-zero. (I've left out some other code for readability; there's a good reason I'm not using scatterv.)

Here's where things get weird: the variable nstartg gets altered to n instead of keeping its actual value. For example, on process 1, after the mpi_recv, nstartg = 1, and on process 2 it's equal to 2, and so forth. Moreover, if I change the code above to

call mpi_send(edge(estart),eproc,mpi_integer,n,n+1234567,mpi_comm_world,ierr)

and change the tag accordingly in the matching call to mpi_recv, then on process 1, nstartg = 1234568; on process 2, nstartg = 1234569, etc.

What on earth is going on? All I've changed is the tag that mpi_send/recv are using to identify the message; provided the tags are unique so that the messages don't get mixed up, this shouldn't change anything, and yet it's altering a totally unrelated variable.

On the boss process, nstartg is unaltered, so I can work around this by broadcasting it again, but that's hardly a real solution. Finally, I should mention that compiling and running this code with Electric Fence didn't pick up any buffer overflows, nor did -fbounds-check report anything.

1 answer
家丑人穷心不美
#2 · 2020-01-29 22:06

The most probable cause is that you pass a scalar INTEGER as the actual status argument to MPI_RECV, when it should really be declared as an array with an implementation-specific size, available as the MPI_STATUS_SIZE constant:

INTEGER, DIMENSION(MPI_STATUS_SIZE) :: status

or

INTEGER status(MPI_STATUS_SIZE)

The receive operation writes the message tag into one of the status fields (its implementation-specific index is available as the MPI_TAG constant, so the value can be read as status(MPI_TAG)). If your status is just a scalar INTEGER, the receive still writes MPI_STATUS_SIZE integers starting at its address and therefore overwrites several other local variables. In your case, nstartg simply happens to sit just above status on the stack, apparently in the slot where the tag is stored, which is why it ends up equal to the tag.
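As a rough illustration of the fix, here is a minimal sketch with status declared as an array and the tag read back through status(MPI_TAG). The program name, the variable val and the payload 10*n are made up for the example, and it assumes the mpi module is available (include 'mpif.h' would serve the same purpose); it is meant to be run on at least two processes.

```fortran
program status_demo
    use mpi
    implicit none
    integer :: ierr, me, ntasks, n, val
    ! status must be an array of MPI_STATUS_SIZE integers, not a scalar
    integer :: status(MPI_STATUS_SIZE)

    call mpi_init(ierr)
    call mpi_comm_rank(mpi_comm_world, me, ierr)
    call mpi_comm_size(mpi_comm_world, ntasks, ierr)

    if (me == 0) then
        ! boss: send one integer to each worker, tagged with the worker's rank
        do n = 1, ntasks-1
            val = 10*n                     ! hypothetical payload
            call mpi_send(val, 1, mpi_integer, n, n, mpi_comm_world, ierr)
        enddo
    else
        ! worker: the receive fills every field of status, including the tag
        call mpi_recv(val, 1, mpi_integer, 0, me, mpi_comm_world, status, ierr)
        print *, 'rank', me, 'received', val, 'with tag', status(MPI_TAG)
    endif

    call mpi_finalize(ierr)
end program status_demo
```

Because status now has room for all of its fields, the tag lands inside the array rather than in whatever variable happens to follow it in memory.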

If you do not care about the receive status, you can pass the special constant MPI_STATUS_IGNORE instead.
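A sketch of that variant, modelled loosely on the code in the question: the fixed chunk size eproc = 4, the way estart is computed, the contents of edge and the value 42 for nstartg are all assumptions made for illustration, not the original program's logic.

```fortran
program ignore_status_demo
    use mpi
    implicit none
    integer, parameter :: eproc = 4        ! hypothetical fixed chunk size
    integer :: ierr, me, ntasks, n, estart, nstartg
    integer, allocatable :: edge(:)

    call mpi_init(ierr)
    call mpi_comm_rank(mpi_comm_world, me, ierr)
    call mpi_comm_size(mpi_comm_world, ntasks, ierr)

    nstartg = 0
    if (me == 0) nstartg = 42              ! arbitrary illustrative value
    call mpi_bcast(nstartg, 1, mpi_integer, 0, mpi_comm_world, ierr)

    if (me == 0) then
        ! boss: one chunk of eproc elements per worker
        allocate(edge(eproc*max(ntasks-1, 1)))
        edge = 1
        do n = 1, ntasks-1
            estart = (n-1)*eproc + 1       ! assumed even split
            call mpi_send(edge(estart), eproc, mpi_integer, n, n, mpi_comm_world, ierr)
        enddo
    else
        ! worker: no status variable at all, so nothing can be overrun
        allocate(edge(eproc))
        call mpi_recv(edge, eproc, mpi_integer, 0, me, mpi_comm_world, MPI_STATUS_IGNORE, ierr)
        print *, 'rank', me, 'nstartg after recv =', nstartg
    endif

    call mpi_finalize(ierr)
end program ignore_status_demo
```

With MPI_STATUS_IGNORE the library does not write a status at all, so each worker should still print the broadcast value of nstartg after the receive.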
