I am testing MPI I/O with the following subroutine:
subroutine save_vtk
  integer :: filetype, fh, unit
  integer(MPI_OFFSET_KIND) :: pos
  real(RP), allocatable :: buffer(:,:,:)
  integer :: ie

  ! The master process writes the ASCII header using ordinary stream I/O.
  if (master) then
    open(newunit=unit, file="out.vtk", &
         access='stream', status='replace', form="unformatted", action="write")
    ! write the header
    close(unit)
  end if

  call MPI_Barrier(mpi_comm, ie)

  ! All processes reopen the file and append the binary data after the header.
  call MPI_File_open(mpi_comm, "out.vtk", MPI_MODE_APPEND + MPI_MODE_WRONLY, &
                     MPI_INFO_NULL, fh, ie)

  ! Each process contributes its local nxyz block of the global ng array,
  ! displaced by off, laid out in Fortran order.
  call MPI_Type_create_subarray(3, int(ng), int(nxyz), int(off), &
                                MPI_ORDER_FORTRAN, MPI_RP, filetype, ie)
  call MPI_Type_commit(filetype, ie)

  call MPI_Barrier(mpi_comm, ie)

  ! The current position (the end of the header) becomes the view displacement.
  call MPI_File_get_position(fh, pos, ie)

  call MPI_Barrier(mpi_comm, ie)

  call MPI_File_set_view(fh, pos, MPI_RP, filetype, "native", MPI_INFO_NULL, ie)

  ! Copy the local nx*ny*nz part of Phi, converted to big endian for VTK.
  buffer = BigEnd(Phi(1:nx,1:ny,1:nz))

  call MPI_File_write_all(fh, buffer, nx*ny*nz, MPI_RP, MPI_STATUS_IGNORE, ie)
  call MPI_File_close(fh, ie)
end subroutine
The variables not declared here come from host association; some error checking has been removed for brevity.
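For reference, here is a minimal sketch of what the host scope provides; the names are the ones used above, but the kinds and definitions shown are my assumptions for illustration, not the actual declarations:

module host_sketch   ! hypothetical sketch, for illustration only
  use mpi
  implicit none
  integer, parameter :: RP = kind(1.0d0)              ! working real kind (assumed)
  integer, parameter :: MPI_RP = MPI_DOUBLE_PRECISION ! matching MPI datatype (assumed)
  logical :: master                                   ! .true. on rank 0
  integer :: mpi_comm                                 ! communicator used for the run
  integer :: nx, ny, nz                               ! local domain size
  integer(MPI_OFFSET_KIND) :: ng(3), nxyz(3), off(3)  ! global size, local size, local offset
  real(RP), allocatable :: Phi(:,:,:)                 ! local field array
  ! ... plus an elemental function BigEnd that converts a value to
  ! big-endian byte order, as required by legacy binary VTK.
end module

I receive the following error when running it on a national academic cluster: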
*** An error occurred in MPI_Isend
*** reported by process [3941400577,18036219417246826496]
*** on communicator MPI COMMUNICATOR 20 DUP FROM 0
*** MPI_ERR_BUFFER: invalid buffer pointer
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
The error is triggered by the call to MPI_File_write_all. I suspect it may be connected with the size of the buffer, which is the full nx*ny*nz and is on the order of 10^5 to 10^6 elements, but I cannot exclude a programming error on my side, as I have no prior experience with MPI I/O.
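One specific thing I want to rule out is the allocation of buffer: it is allocated only through the assignment, which relies on the Fortran 2003 (re)allocation-on-assignment semantics. Intel Fortran 14 applies those semantics only when compiled with -assume realloc_lhs, while gfortran applies them by default, so without that flag the buffer passed to MPI_File_write_all would be unallocated. An explicit allocation would exclude this possibility:

! Allocate explicitly rather than relying on allocation on assignment
! (Intel Fortran 14 needs -assume realloc_lhs for the F2003 semantics).
if (.not. allocated(buffer)) allocate(buffer(nx, ny, nz))
buffer = BigEnd(Phi(1:nx,1:ny,1:nz))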
The MPI implementation used is Open MPI 1.8.0 with Intel Fortran 14.0.2.
Do you know how to make it work and write the file?
--- Edit2 ---
Testing a simplified version (the important code remains the same; full source is here), I notice it works with gfortran but fails with different MPI implementations when compiled with Intel Fortran. I was not able to compile it with PGI. I was also wrong that it fails only across different nodes: it fails even in a single-process run.
>module add gcc-4.8.1
>module add openmpi-1.8.0-gcc
>mpif90 save.f90
>./a.out
Trying to decompose in 1 1 1 process grid.
>mpirun a.out
Trying to decompose in 1 1 2 process grid.
>module rm openmpi-1.8.0-gcc
>module add openmpi-1.8.0-intel
>mpif90 save.f90
>./a.out
Trying to decompose in 1 1 1 process grid.
ERROR write_all
MPI_ERR_IO: input/output error
>module rm openmpi-1.8.0-intel
>module add openmpi-1.6-intel
>mpif90 save.f90
>./a.out
Trying to decompose in 1 1 1 process grid.
ERROR write_all
MPI_ERR_IO: input/output error
[luna24.fzu.cz:24260] *** An error occurred in MPI_File_set_errhandler
[luna24.fzu.cz:24260] *** on a NULL communicator
[luna24.fzu.cz:24260] *** Unknown error
[luna24.fzu.cz:24260] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
--------------------------------------------------------------------------
An MPI process is aborting at a time when it cannot guarantee that all
of its peer processes in the job will be killed properly. You should
double check that everything has shut down cleanly.
Reason: After MPI_FINALIZE was invoked
Local host: luna24.fzu.cz
PID: 24260
--------------------------------------------------------------------------
>module rm openmpi-1.6-intel
>module add mpich2-intel
>mpif90 save.f90
>./a.out
Trying to decompose in 1 1 1 process grid.
ERROR write_all
Other I/O error , error stack:
ADIOI_NFS_WRITECONTIG(70): Other I/O error Bad address
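Following up on the allocation hypothesis above, the next thing to try is rebuilding with Intel's reallocating-assignment semantics enabled (the flag is passed through the mpif90 wrapper to ifort):

>mpif90 -assume realloc_lhs save.f90
>./a.out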