MPI Fortran compiler optimization error [duplicate]

Posted 2019-01-20 15:28

Question:

This question already has an answer here:

  • MPI_Recv overwrites parts of memory it should not access

Despite having written long, heavily parallelized codes with complicated sends/receives over three-dimensional arrays, this simple code with a two-dimensional array of integers has me at my wits' end. I combed Stack Overflow for possible solutions and found one that slightly resembles the issue I am having:

Boost.MPI: What's received isn't what was sent!

However, the solutions there point to the looping segment of code as the culprit for overwriting sections of memory. My case seems to behave even more strangely. Maybe it is a careless oversight of some simple detail on my part. The problem is with the code below:

program main
implicit none

include 'mpif.h'

integer :: i, j
integer :: counter, offset
integer :: rank, ierr, stVal
integer, dimension(10, 10) :: passMat, prntMat      !! passMat CONTAINS VALUES TO BE PASSED TO prntMat

call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

counter = 0
offset = (rank + 1)*300
do j = 1, 10
    do i = 1, 10
        prntMat(i, j) = 10                          !! prntMat OF BOTH RANKS CONTAIN 10
        passMat(i, j) = offset + counter            !! passMat OF rank=0 CONTAINS 300..399 AND rank=1 CONTAINS 600..699
        counter = counter + 1
    end do
end do

if (rank == 1) then
    call MPI_SEND(passMat(1:10, 1:10), 100, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, ierr)    !! SEND passMat OF rank=1 to rank=0
else
    call MPI_RECV(prntMat(1:10, 1:10), 100, MPI_INTEGER, 1, 1, MPI_COMM_WORLD, stVal, ierr)
    do i = 1, 10
        print *, prntMat(:, i)
    end do
end if

call MPI_FINALIZE(ierr)
end program main

When I compile the code with mpif90 with no flags and run it on my machine with mpirun -np 2, I get the following output with wrong values in the first four elements of the array:

0 0 400 0 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699

However, when I compile it with the same compiler but with the -O3 flag on, I get the correct output:

600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699

This error is machine dependent; it turns up only on my system running Ubuntu 14.04.2 with OpenMPI 1.6.5.

I tried this on other systems running Red Hat and CentOS, and the code ran well both with and without the -O3 flag. Curiously, those machines use an older version of OpenMPI, 1.4.

I am guessing that the -O3 flag is performing some odd optimization that is modifying the manner in which arrays are being passed between the processes.

I also tried other kinds of array declarations. The above code uses explicit-shape arrays. With assumed-shape and allocatable arrays I get equally bizarre, if not stranger, results, some of them seg-faulting. I tried using Valgrind to trace the origin of these seg-faults, but I still haven't gotten the hang of keeping Valgrind from giving false positives when running MPI programs.

I believe that resolving the difference in behavior of the above code will help me understand the tantrums of my other codes as well.

Any help would be greatly appreciated! This code has really gotten me questioning if all the other MPI codes I wrote are sound at all.

Answer 1:

Using the Fortran 90 interface to MPI reveals a mismatch in your call to MPI_RECV:

      call MPI_RECV(prntMat(1:10, 1:10), 100, MPI_INTEGER, 1, 1, MPI_COMM_WORLD, stVal, ierr)
                                                                                            1
Error: There is no specific subroutine for the generic ‘mpi_recv’ at (1)

This is because the status variable stVal is an integer scalar rather than an array of size MPI_STATUS_SIZE. Since the F77 interface performs no argument checking, MPI_RECV writes MPI_STATUS_SIZE integers of status information through that scalar, overwriting adjacent memory it should not touch, which is most likely what corrupts the first few printed elements. The F77 interface (include 'mpif.h') to MPI_RECV is:

INCLUDE 'mpif.h'
MPI_RECV(BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERROR)
<type>    BUF(*)
INTEGER    COUNT, DATATYPE, SOURCE, TAG, COMM
INTEGER    STATUS(MPI_STATUS_SIZE), IERROR

Changing

integer :: rank, ierr, stVal

to

integer :: rank, ierr, stVal(mpi_status_size)

produces a program that works as expected, tested with gfortran 5.1 and OpenMPI 1.8.5.

Using the F90 interface (use mpi instead of include 'mpif.h') lets the compiler detect such mismatched arguments at compile time rather than producing confusing runtime problems.
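For reference, here is a minimal sketch of the corrected program (my own reconstruction, assuming an MPI installation that provides the Fortran 90 mpi module), combining the status-array fix with the use mpi interface:

program main
use mpi                                             !! F90 INTERFACE: ARGUMENT CHECKING AT COMPILE TIME
implicit none

integer :: i, j
integer :: counter, offset
integer :: rank, ierr
integer :: stVal(MPI_STATUS_SIZE)                   !! STATUS MUST BE AN ARRAY OF SIZE MPI_STATUS_SIZE
integer, dimension(10, 10) :: passMat, prntMat

call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

counter = 0
offset = (rank + 1)*300
do j = 1, 10
    do i = 1, 10
        prntMat(i, j) = 10
        passMat(i, j) = offset + counter
        counter = counter + 1
    end do
end do

if (rank == 1) then
    call MPI_SEND(passMat, 100, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, ierr)        !! WHOLE ARRAY AS BUFFER
else
    call MPI_RECV(prntMat, 100, MPI_INTEGER, 1, 1, MPI_COMM_WORLD, stVal, ierr)
    do i = 1, 10
        print *, prntMat(:, i)
    end do
end if

call MPI_FINALIZE(ierr)
end program main

Passing the whole arrays (passMat, prntMat) rather than the sections passMat(1:10, 1:10) also avoids any question of the compiler creating temporary copies of the buffers, although for blocking calls on contiguous sections that is not the source of the problem here.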