可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I have a fortran MPI code in which a compute intensive function is invoked on every element of a 2D array. I'm trying to split the tasks among the ranks. For example if there are 30 columns and 10 ranks, then each rank gets 3 columns. The following code does this split and gathers the results using allgather. But the final array doesn't have the values from all ranks.

        program allgather
    include 'mpif.h'
    !create a 2 x 30 myarray
    integer :: x=2,y=30
    integer :: numprocs,myid
    integer :: i,j,k,myelements,mycolumns,jb,je
    integer*4,dimension(:),allocatable :: displacement,recvcnt
    real :: checksum
    real,dimension(:,:),allocatable :: myarr,combinedarr
    call MPI_INIT(IERR)
    call MPI_COMM_SIZE(MPI_COMM_WORLD,NUMPROCS,IERR)
    call MPI_COMM_RANK(MPI_COMM_WORLD,MYID,IERR)
    mycolumns = y/numprocs
    myelements = x * mycolumns
    allocate(displacement(numprocs),recvcnt(numprocs))
    jb = 1 + ( myid * mycolumns ) 
    je = ( myid + 1 ) * mycolumns
    allocate(myarr(x,mycolumns))
    allocate(combinedarr(x,y))
    myarr(:,:) =0
    do j=jb,je
      do i=1,x
       myarr(i,j) = 1
      enddo
    enddo
    !myarr(:,:)=1 
    if(mod(y,numprocs) > 0) then
     if(myid==numprocs-1) then
       jb=(myid + 1) * mycolumns + 1
       do j=jb,y 
        do i=1,x
          myarr(i,j) = 1
        enddo
      enddo 
     endif
    endif
    combinedarr(:,:) =0
    recvcnt(:)=myelements
    do k=1,numprocs
    displacement(k) = (k-1) *myelements
    enddo
    call MPI_ALLGATHERV(myarr,myelements,MPI_REAL,combinedarr,recvcnt,displacement,MPI_REAL,MPI_COMM_WORLD,IERR)
    if(mod(y,numprocs) > 0) then
     recvcnt(:) = 0
     recvcnt(numprocs) = (x*y) - myelements * (numprocs)
     displacement(numprocs) = displacement(numprocs) + myelements
     call MPI_ALLGATHERV(myarr,recvcnt(numprocs),MPI_REAL,combinedarr,recvcnt,displacement,MPI_REAL,MPI_COMM_WORLD,IERR)
    endif
    if (myid==0) then 
    checksum=0
     write(6,*) "mycolumns:",mycolumns,"myelements:",myelements 
    do j=1,y
      do i=1,x
       checksum = checksum + combinedarr(i,j)
      enddo
     enddo
       write(6,*) checksum 
    endif
    end

回答1:

First of all, you are using MPI_ALLGATHERV() just as MPI_ALLGATHER() and get no benefit from its ability to send different number of elements from/to each process. But that's not the error in your program. The error lies in the way it fills myarr. You allocate it as myarr(x,mycolumns) but when filling it from column jb to column je, you go past the end of the array in all processes but rank 0 since jb and je are greater than mycolumns there. Thus myarr contains ones only in rank 0 and zeroes in all other ranks. So, yes, the final array does not have the values that you expect but that's because you filled them wrong, not because of the way MPI subroutines are used.

Writing past the end of an allocatable array destroys the hidden structures that are used to manage heap allocation and usually crashes the program. In your case you are just lucky - I run your code with Open MPI and it crashed with core dumps each time.

And you are also missing a call to MPI_FINALIZE() at the end of your code.

Hint: use the Fortran 90 interface if available - replace include 'mpif.h' with use mpi

回答2:

here is the final version of the code. I have implemented the fixes suggested by "Hristo Iliev" and also fixed the part where the # or ranks does not equally divide the # of columns. Here the last rank does the computation on the leftover columns.

    program allgather
    include 'mpif.h'
    !create a 2 x 30 myarray
    integer :: x=4,y=6
    integer :: numprocs,myid
    integer :: i,j,k,myelements,mycolumns,jb,je,jbb
    integer*4,dimension(:),allocatable :: displacement,recvcnt
    real :: checksum
    real,dimension(:,:),allocatable :: myarr,combinedarr
    call MPI_INIT(IERR)
    call MPI_COMM_SIZE(MPI_COMM_WORLD,NUMPROCS,IERR)
    call MPI_COMM_RANK(MPI_COMM_WORLD,MYID,IERR)
    mycolumns = y/numprocs
    myelements = x * mycolumns
    allocate(displacement(numprocs),recvcnt(numprocs))
    jb = 1 + ( myid * mycolumns ) 
    je = ( myid + 1 ) * mycolumns
    allocate(myarr(x,y))
    allocate(combinedarr(x,y))
    myarr(:,:) =0
    do j=jb,je
      do i=1,x
       myarr(i,j) = (j-1) * x + i
      enddo
    enddo
    if(mod(y,numprocs) > 0) then
     if(myid==numprocs-1) then
       jbb=(myid + 1) * mycolumns + 1
       do j=jbb,y 
        do i=1,x
           myarr(i,j) = (j-1) * x + i
        enddo
      enddo 
     endif
    endif
    combinedarr(:,:) =0
    recvcnt(:)=myelements
    do k=1,numprocs
    displacement(k) = (k-1) *myelements
    enddo
    call MPI_ALLGATHERV(myarr(1,jb),myelements,MPI_REAL,combinedarr,recvcnt,displacement,MPI_REAL,MPI_COMM_WORLD,IERR)
    if(mod(y,numprocs) > 0) then
     recvcnt(:) = 0
     recvcnt(numprocs) = (x*y) - myelements * (numprocs)
     displacement(numprocs) = displacement(numprocs) + myelements
     call MPI_ALLGATHERV(myarr(1,jbb),recvcnt(numprocs),MPI_REAL,combinedarr,recvcnt,displacement,MPI_REAL,MPI_COMM_WORLD,IERR)
    endif
    if (myid==0) then 
    checksum=0
     write(6,*) "mycolumns:",mycolumns,"myelements:",myelements 
    do j=1,y
      do i=1,x
       checksum = checksum + combinedarr(i,j)
      enddo
     enddo
       write(6,*) checksum 
    endif
    end