I am building some unit tests and find that my code gives a slightly different result when vectorized. In my example below, an array a is summed along one dimension and added to an initial value x. Most elements of a are too small to change x. The code is:
module datamod
  use ISO_FORTRAN_ENV, only : dp => REAL64
  implicit none

  ! -- Array dimensions are large enough for gfortran to vectorize
  integer, parameter :: N = 6
  integer, parameter :: M = 10

  real(dp) :: x(N), a(N,M)

contains

  subroutine init_ax
    ! -- Set a and x so the issue manifests
    x = 0.
    x(1) = 0.1e+03_dp

    a = 0.
    ! -- Each negative component is too small to individually change x(1)
    ! -- But the positive component is just big enough
    a( 1, 1) = -0.4e-14_dp
    a( 1, 2) = -0.4e-14_dp
    a( 1, 3) = -0.4e-14_dp
    a( 1, 4) =  0.8e-14_dp
    a( 1, 5) = -0.4e-14_dp
  end subroutine init_ax

end module datamod
program main
  use datamod, only : a, x, N, M, init_ax
  implicit none
  integer :: i, j

  call init_ax

  ! -- The loop in question
  do i=1,N
    do j=1,M
      x(i) = x(i) + a(i,j)
    enddo
  enddo

  write(*,'(a,e26.18)') 'x(1) is: ', x(1)
end program main
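As a side check on the magnitudes chosen in init_ax (this is only a small verification sketch of my own, separate from the test itself; the program name ulp_check is just illustrative): the standard spacing() intrinsic returns the gap between adjacent doubles at a given value, and 0.4e-14 is below half of that gap at 100 while 0.8e-14 is above it, which is why only the positive term can move x(1) on its own:

program ulp_check
  use ISO_FORTRAN_ENV, only : dp => REAL64
  implicit none
  ! -- Gap between neighbouring doubles at 100, and half of it:
  ! -- a term smaller in magnitude than the half-gap cannot change 100 by itself
  write(*,*) spacing(0.1e+03_dp)      ! roughly 1.42e-14
  write(*,*) spacing(0.1e+03_dp) / 2  ! roughly 7.11e-15
end program ulp_check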
The code gives the following results in gfortran without and with loop vectorization. Note that -ftree-vectorize is included in -O3, so the problem also manifests when using -O3.
mach5% gfortran -O2 main.f90 && ./a.out
x(1) is: 0.100000000000000014E+03
mach5% gfortran -O2 -ftree-vectorize main.f90 && ./a.out
x(1) is: 0.999999999999999858E+02
I know that certain compiler options can change the answer, such as -fassociative-math. However, none of those are included in the standard -O3 optimization level according to the gcc optimization options page.
It seems to me as though the vectorized code is summing all the components of a first and then adding the total to x. However, that is not what the code as written specifies: each component of a must be added to x in turn.
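To make my guess concrete (this is only a sketch of what I think the reordering amounts to, not what the compiler actually emits, and the names order_demo, terms, x_seq and x_presum are just illustrative), folding the terms of row 1 into x one at a time should match the -O2 result, while totalling the row first and then adding it should reproduce the vectorized result:

program order_demo
  use ISO_FORTRAN_ENV, only : dp => REAL64
  implicit none
  real(dp) :: terms(5), x_seq, x_presum
  integer :: j

  terms = [ -0.4e-14_dp, -0.4e-14_dp, -0.4e-14_dp, 0.8e-14_dp, -0.4e-14_dp ]

  ! -- Order written in the original loop: fold each term into x in turn
  x_seq = 0.1e+03_dp
  do j = 1, size(terms)
    x_seq = x_seq + terms(j)
  enddo

  ! -- Reassociated order: total the terms first, then add the total to x
  x_presum = 0.1e+03_dp + sum(terms)

  write(*,'(a,e26.18)') 'sequential: ', x_seq     ! expect 0.100000000000000014E+03
  write(*,'(a,e26.18)') 'pre-summed: ', x_presum  ! expect 0.999999999999999858E+02
end program order_demo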
What is going on here? Is loop vectorization allowed to change the answer in some cases? Gfortran versions 4.7 and 5.3 had the same problem, but Intel 16.0 and PGI 15.10 did not.