Does gfortran take advantage of DO CONCURRENT?

2020-08-25 06:03发布

问题:

I'm currently using gfortran 4.9.2 and I was wondering if the compiler actually know hows to take advantage of the DO CONCURRENT construct (Fortran 2008). I know that the compiler "supports" it, but it is not clear what that entails. For example, if automatic parallelization is turned on (with some number of threads specified), does the compiler know how to parallelize a do concurrent loop?

Edit: As mentioned in the comment, this previous question on SO is very similar to mine, but it is from 2012, and only very recent versions of gfortran have implemented the newest features of modern Fortran, so I thought it was worth asking about the current state of the compiler in 2015.

回答1:

Rather than explicitly enabling some new functionality, DO CONCURRENT in gfortran seems to put restrictions on the programmer in order to implicitly allow parallelization of the loop when required (using the option -ftree-parallelize-loops=NPROC).

While a DO loop can contain any function call, the content of DO CONCURRENT is restricted to PURE functions (i.e., having no side effects). So when one attempts to use, e.g., RANDOM_NUMBER (which is not PURE as it needs to maintain the state of the generator) in DO CONCURRENT, gfortran will protest:

prog.f90:25:29:

   25 |         call random_number(x)
      |                             1
Error: Subroutine call to intrinsic ‘random_number’ in DO CONCURRENT block at (1) is not PURE

Otherwise, DO CONCURRENT behaves as normal DO. It only enforces use of parallelizable code, so that -ftree-parallelize-loops=NPROC succeeds. For instance, with gfortran 9.1 and -fopenmp -Ofast -ftree-parallelize-loops=4, both the standard DO and the F08 DO CONCURRENT loops in the following program run in 4 threads and with virtually identical timing:

program test_do

    use omp_lib, only: omp_get_wtime

    integer, parameter :: n = 1000000, m = 10000
    real,  allocatable :: q(:)

    integer :: i
    real    :: x, t0

    allocate(q(n))

    t0 = omp_get_wtime()
    do i = 1, n
        q(i) = i
        do j = 1, m
            q(i) = 0.5 * (q(i) + i / q(i))
        end do
    end do
    print *, omp_get_wtime() - t0

    t0 = omp_get_wtime()
    do concurrent (i = 1:n)
        q(i) = i
        do j = 1, m
            q(i) = 0.5 * (q(i) + i / q(i))
        end do
    end do
    print *, omp_get_wtime() - t0

end program test_do