Compiler: gfortran-4.8.5
MPI library: OpenMPI-1.7.2 (preinstalled OpenSuSE 13.2)
This program:
use mpi
implicit none
real*16 :: x
integer :: ierr, irank, type16
call MPI_Init(ierr)
call MPI_Comm_Rank(MPI_Comm_World, irank, ierr)
if (irank+1==1) x = 2.1
if (irank+1==8) x = 2.8
if (irank+1==7) x = 5.2
if (irank+1==4) x = 6.7
if (irank+1==6) x = 6.5
if (irank+1==3) x = 5.7
if (irank+1==2) x = 4.0
if (irank+1==5) x = 6.8
print '(a,i0,a,f3.1)', "rank+1: ",irank+1," x: ",x
call MPI_AllReduce(MPI_IN_PLACE, x, 1, MPI_REAL16, MPI_MAX, MPI_Comm_World, ierr)
if (irank==0) print '(i0,a,f3.1)', irank+1," max x: ", x
call MPI_Finalize(ierr)
end
I also tried real(16)
, real(kind(1.q0))
. real(real128)
is actually equivalent with real*10
for this compiler.
The result is:
> mpif90 reduce16.f90
> mpirun -n 8 ./a.out
rank+1: 1 x: 2.1
rank+1: 2 x: 4.0
rank+1: 3 x: 5.7
rank+1: 4 x: 6.7
rank+1: 5 x: 6.8
rank+1: 6 x: 6.5
rank+1: 7 x: 5.2
rank+1: 8 x: 2.8
1 max x: 2.8
The program finds the true maximum for real*10
keeping MPI_REAL16
. The MPI specification (3.1, pages 628 and 674) is not very clear if MPI_REAL16
corresponds to real*16
or real(real128)
if these differ.
Also, assuming MPI_REAL16
is actually real(real128)
and trying to use that in a program leads to a different problem:
Error: There is no specific subroutine for the generic 'mpi_recv' at (1)
Error: There is no specific subroutine for the generic 'mpi_send' at (1)
which does not happen for real*16
.
(disregarding that one should be able to pass any bit pattern, so this check is superfluous)
What is the right way to use 16 byte reals? Is the OpenMPI library in error?
While this should just work correctly in every MPI implementation, a straightforward workaround is to implement a user-defined reduction for this type that is written in Fortran, so there are no issues with implementing it in C (this is how MPICH and OpenMPI try to do everything, hence there are issues when C cannot reproduce the behavior of Fortran).
Below is an attempt to implement this. This is the user-defined reduction in Fortran. I am certain that experienced modern Fortran programmers can do it better.
This program returns without error with Intel 16 Fortran compiler and MPICH 3.2+. Apparently I am not using I/O correctly though, so my confidence in the correctness of this program is not as high as it would be if I could write all the results to stdout.