I'm currently using gfortran 4.9.2 and I was wondering if the compiler actually know hows to take advantage of the DO CONCURRENT construct (Fortran 2008). I know that the compiler "supports" it, but it is not clear what that entails. For example, if automatic parallelization is turned on (with some number of threads specified), does the compiler know how to parallelize a do concurrent loop?
Edit: As mentioned in the comment, this previous question on SO is very similar to mine, but it is from 2012, and only very recent versions of gfortran have implemented the newest features of modern Fortran, so I thought it was worth asking about the current state of the compiler in 2015.
Rather than explicitly enabling some new functionality, DO CONCURRENT
in gfortran seems to put restrictions on the programmer in order to implicitly allow parallelization of the loop when required (using the option -ftree-parallelize-loops=NPROC
).
While a DO
loop can contain any function call, the content of DO CONCURRENT
is restricted to PURE
functions (i.e., having no side effects). So when one attempts to use, e.g., RANDOM_NUMBER
(which is not PURE
as it needs to maintain the state of the generator) in DO CONCURRENT
, gfortran will protest:
prog.f90:25:29:
25 | call random_number(x)
| 1
Error: Subroutine call to intrinsic ‘random_number’ in DO CONCURRENT block at (1) is not PURE
Otherwise, DO CONCURRENT
behaves as normal DO
. It only enforces use of parallelizable code, so that -ftree-parallelize-loops=NPROC
succeeds. For instance, with gfortran 9.1 and -fopenmp -Ofast -ftree-parallelize-loops=4
, both the standard DO
and the F08 DO CONCURRENT
loops in the following program run in 4 threads and with virtually identical timing:
program test_do
use omp_lib, only: omp_get_wtime
integer, parameter :: n = 1000000, m = 10000
real, allocatable :: q(:)
integer :: i
real :: x, t0
allocate(q(n))
t0 = omp_get_wtime()
do i = 1, n
q(i) = i
do j = 1, m
q(i) = 0.5 * (q(i) + i / q(i))
end do
end do
print *, omp_get_wtime() - t0
t0 = omp_get_wtime()
do concurrent (i = 1:n)
q(i) = i
do j = 1, m
q(i) = 0.5 * (q(i) + i / q(i))
end do
end do
print *, omp_get_wtime() - t0
end program test_do