Segmentation fault during MPI_FINALIZE() in Fortran

Posted 2019-01-27 04:55

Question:

I am getting a segmentation fault during a call to MPI_FINALIZE() in a Fortran 90 program. While the code is quite extensive, I will post the pseudocode and see if it raises any flags. I have a hunch (but have not yet tried this) that it could possibly be caused by not deallocating arrays? I'm not sure however - can failure to deallocate arrays in Fortran 90 cause segmentation faults during a call to MPI_FINALIZE?

if(<rank 0>) then
  do iat = 1,natoms
    do il = 0, LMAX
      do im = -il,il
        <mpi_recv "rank_rdy"> ! find out which rank is ready for (at,l,m)
        <mpi_send "(iat,il,im)"> ! send (at,l,m) to the rank asking for it
      enddo
    enddo
  enddo
else ! other ranks send a 'ready' signal and receive the (at,l,m) to optimize
  if(<rank 0 is not finished processing (at,l,m)'s>) then
    <mpi_send "my_rank"> ! tell rank 0 that i am ready to receive
    <mpi_recv "(iat,il,im)"> ! receive (at,l,m) from rank 0
    call optimize(iat,il,im) ! do work on (at,l,m)
  endif
endif

if(<rank 0>) then
  <read temp files created by other ranks>
  <write temp files to one master file>
endif

print*, 'calling finalize'

call MPI_BARRIER(MPI_COMM_WORLD, ierr)
call MPI_FINALIZE()

Now on output I get, among other information not pertaining to this problem, the following:

 calling finalize
 calling finalize
 calling finalize
 calling finalize
 calling finalize
 calling finalize

=====================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)

I get the same problem even if I do not call MPI_BARRIER, though I thought the barrier might help. Note that there are arrays used in every rank that I do not bother deallocating, because they are used throughout the entire program, so I am not worried about memory leaks. Is it possible that this segfault occurs because MPI_FINALIZE() is called without that memory being freed?

I am going to explore this more on my own, but I wanted to post this question for a few reasons:

  1. I want to know whether this is a known issue when calling MPI_FINALIZE()

  2. I want to know why this happens (if it is actually the problem) when calling MPI_FINALIZE(). Internally, what is going on that causes this segfault?

  3. I have searched high and low online and found nothing about this problem, so for posterity this could be a good question to have answered on the web.

Edit: I forgot to mention this, but I am not able to duplicate this problem when running in serial. Obviously, I do not do the distribution of (at,l,m) in serial; the only process simply runs through all combinations and optimizes them one by one. I still do not deallocate the arrays that I thought might be causing the problem under MPI, and yet I do not get a segfault.

Answer 1:

One should always use the Fortran 90 MPI interface, if available, instead of the old FORTRAN 77 interface. That is, you should always

USE mpi

instead of

INCLUDE 'mpif.h'

The difference between the two is that the Fortran 90 interface puts all MPI subroutines in a module, so explicit interfaces are generated for them. This allows the compiler to check the arguments at every call site and to signal an error if you, for example, omit an argument.
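For instance, here is a minimal sketch (the program name finalize_demo is made up for illustration) of what the module interface buys you:

program finalize_demo
  use mpi                  ! module interface: calls are checked at compile time
  implicit none
  integer :: ierr

  call MPI_INIT(ierr)
  ! call MPI_FINALIZE()    <- with "use mpi" this would be rejected at compile
  !                           time because the ierr argument is missing; with
  !                           include 'mpif.h' it compiles silently
  call MPI_FINALIZE(ierr)  ! correct: ierr receives the error code
end program finalize_demo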

In Fortran's calling convention all arguments are passed by address, irrespective of their type. This allows the compiler to generate proper calls to functions and subroutines without requiring prototypes as in C. But it also means that one can freely pass an INTEGER argument where an array of REAL is expected, or pass fewer or more arguments than expected, and virtually all FORTRAN 77 compilers will happily compile such code. There are external tools, usually called linters after the C tool lint, that parse the whole source tree and can pinpoint such errors and many others that the compiler would not care to find. One such static analysis tool for Fortran is flint. Fortran 90 added interfaces in order to compensate for this error-prone nature of Fortran.
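As an illustration, the names fill and implicit_hazard below are made up for this sketch; with no explicit interface in scope, a FORTRAN 77 style compiler accepts the mismatched call (recent gfortran rejects it when both units are in one file unless you pass -fallow-argument-mismatch):

subroutine fill(buf, n)
  implicit none
  integer, intent(in) :: n
  real, intent(out)   :: buf(n)
  buf = 0.0              ! writes n REALs starting at buf's address
end subroutine fill

program implicit_hazard
  implicit none
  integer :: x
  call fill(x, 100)      ! INTEGER scalar passed where REAL(100) is expected:
                         ! no interface means no check, so the call tramples
                         ! 400 bytes of memory at run time
  print *, x
end program implicit_hazard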

Calling a Fortran subroutine with fewer arguments than expected can have many different ill effects depending on the architecture, but in most cases it will result in a crash, especially if the omitted argument is an output one. The called subroutine doesn't know that fewer arguments have been passed - it simply looks where the argument's address should be and takes whatever it finds there. As ierr is an output argument, a write to that address occurs. There is a good chance that the address does not correspond to mapped virtual memory, in which case the OS delivers a hefty segmentation fault. Even if the address points somewhere into the user's allocated memory, the result could be the overwrite of an important value in some control structure. And even if that doesn't happen, there are calling conventions in which the callee cleans up the stack frame; in that case the stack pointer would be incorrectly incremented and the return address would be completely different from the right one, which would almost certainly lead to a jump into non-executable (or even non-mapped) memory and, again, to a segmentation fault.
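Applied to the pseudocode in the question, the immediate suspect is the final call, which omits the ierr output argument. A minimal sketch of the corrected ending (supplying ierr, and preferably switching to use mpi so the compiler catches such omissions for you):

call MPI_BARRIER(MPI_COMM_WORLD, ierr)
call MPI_FINALIZE(ierr)   ! ierr must always be passed with the mpif.h
                          ! and "use mpi" interfaces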