I am building some unit tests and find that my code gives a slightly different result when vectorized. In my example below, an array a is summed along one dimension and added to an initial value x. Most elements of a are too small to change x. The code is:
module datamod
  use ISO_FORTRAN_ENV, only : dp => REAL64
  implicit none

  ! -- Array dimensions are large enough for gfortran to vectorize
  integer, parameter :: N = 6
  integer, parameter :: M = 10

  real(dp) :: x(N), a(N,M)

contains

  subroutine init_ax
    ! -- Set a and x so the issue manifests
    x = 0.
    x(1) = 0.1e+03_dp

    a = 0.
    ! -- Each negative component is too small to individually change x(1)
    ! -- But the positive component is just big enough
    a( 1, 1) = -0.4e-14_dp
    a( 1, 2) = -0.4e-14_dp
    a( 1, 3) = -0.4e-14_dp
    a( 1, 4) =  0.8e-14_dp
    a( 1, 5) = -0.4e-14_dp
  end subroutine init_ax

end module datamod
program main
  use datamod, only : a, x, N, M, init_ax
  implicit none
  integer :: i, j

  call init_ax

  ! -- The loop in question
  do i=1,N
    do j=1,M
      x(i) = x(i) + a(i,j)
    enddo
  enddo

  write(*,'(a,e26.18)') 'x(1) is: ', x(1)
end program main
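As a side check on the magnitudes chosen in init_ax (this is only a small verification sketch of my own, separate from the test itself; the program name ulp_check is just illustrative): the standard spacing() intrinsic returns the gap between adjacent doubles at a given value, and 0.4e-14 is below half of that gap at 100 while 0.8e-14 is above it, which is why only the positive term can move x(1) on its own:

program ulp_check
  use ISO_FORTRAN_ENV, only : dp => REAL64
  implicit none
  ! -- Gap between neighbouring doubles at 100, and half of it:
  ! -- a term smaller in magnitude than the half-gap cannot change 100 by itself
  write(*,*) spacing(0.1e+03_dp)      ! roughly 1.42e-14
  write(*,*) spacing(0.1e+03_dp) / 2  ! roughly 7.11e-15
end program ulp_check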
The code gives the following results in gfortran without and with loop vectorization. Note that -ftree-vectorize is included in -O3, so the problem also manifests when using -O3.
mach5% gfortran -O2 main.f90 && ./a.out
x(1) is: 0.100000000000000014E+03
mach5% gfortran -O2 -ftree-vectorize main.f90 && ./a.out
x(1) is: 0.999999999999999858E+02
I know that certain compiler options can change the answer, such as -fassociative-math. However, none of those are included in the standard -O3 optimization level according to the gcc optimization options page.
It seems to me as though the vectorized code is summing all the components of a first and then adding the total to x. However, that is not what the code as written specifies: each component of a must be added to x in turn.
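To make my guess concrete (this is only a sketch of what I think the reordering amounts to, not what the compiler actually emits, and the names order_demo, terms, x_seq and x_presum are just illustrative), folding the terms of row 1 into x one at a time should match the -O2 result, while totalling the row first and then adding it should reproduce the vectorized result:

program order_demo
  use ISO_FORTRAN_ENV, only : dp => REAL64
  implicit none
  real(dp) :: terms(5), x_seq, x_presum
  integer :: j

  terms = [ -0.4e-14_dp, -0.4e-14_dp, -0.4e-14_dp, 0.8e-14_dp, -0.4e-14_dp ]

  ! -- Order written in the original loop: fold each term into x in turn
  x_seq = 0.1e+03_dp
  do j = 1, size(terms)
    x_seq = x_seq + terms(j)
  enddo

  ! -- Reassociated order: total the terms first, then add the total to x
  x_presum = 0.1e+03_dp + sum(terms)

  write(*,'(a,e26.18)') 'sequential: ', x_seq     ! expect 0.100000000000000014E+03
  write(*,'(a,e26.18)') 'pre-summed: ', x_presum  ! expect 0.999999999999999858E+02
end program order_demo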
What is going on here? Is loop vectorization allowed to change the answer in some cases? Gfortran versions 4.7 and 5.3 had the same problem, but Intel 16.0 and PGI 15.10 did not.