Intel Fortran vectorisation: vector loop cost high

Posted 2019-09-06 23:56

Question:

I'm testing and optimising a legacy code with Intel Fortran 15, and I have this simple loop:

do ir=1,N(lev)
  G1(lev)%D(ir) = 0.d0
  G2(lev)%D(ir) = 0.d0
enddo

where lev is some fixed integer.

The derived types and indexing are fairly complex for the compiler, but it can still vectorize them, as I can see for other lines. For the loop above, however, the compilation report gives:

LOOP BEGIN at MLFMATranslationProd.f90(38,2)
  remark #15399: vectorization support: unroll factor set to 4
  remark #15300: LOOP WAS VECTORIZED
  remark #15462: unmasked indexed (or gather) loads: 2
  remark #15475: --- begin vector loop cost summary ---
  remark #15476: scalar loop cost: 12
  remark #15477: vector loop cost: 20.000
  remark #15478: estimated potential speedup: 2.340
  remark #15479: lightweight vector operations: 5
  remark #15481: heavy-overhead vector operations: 1
  remark #15488: --- end vector loop cost summary ---
LOOP END

My question is: how can the vector loop cost be higher than the scalar one? And what can I do to get closer to the estimated potential speedup?

Answer 1:

The loop cost is an estimate of the duration of one loop iteration. A vectorized iteration takes somewhat longer than a scalar one, but it processes several array elements at once.

In your case the speedup is roughly 12 / 20 * 4 = 2.4, because one vectorized iteration processes 4 double-precision array elements (probably using 256-bit AVX instructions).
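The same arithmetic can be written as a simple cost model. The symbols below are introduced only for illustration: C_scalar and C_vector are the two costs from the report, and V = 4 is the number of doubles per vector iteration, which is the answer's assumption (AVX) rather than something the report states. Per element, the vector loop is cheaper even though each of its iterations costs more:

$$
\frac{C_{\text{vector}}}{V} = \frac{20}{4} = 5 \;<\; 12 = C_{\text{scalar}},
\qquad
\text{speedup} \approx \frac{C_{\text{scalar}} \cdot V}{C_{\text{vector}}} = \frac{12 \times 4}{20} = 2.4
$$

The compiler's own figure (2.340) is slightly below this back-of-the-envelope value. The report excerpt does not say where the difference comes from.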