I'm testing and optimising some legacy code with Intel Fortran 15, and I have this simple loop:
    do ir=1,N(lev)
      G1(lev)%D(ir) = 0.d0
      G2(lev)%D(ir) = 0.d0
    enddo
where lev is equal to some integer.
The data structures and indexing are quite complex for the compiler, but it does manage to vectorize this kind of code, as I can see from other lines in the report. For the loop above, I get this from the compilation report:
LOOP BEGIN at MLFMATranslationProd.f90(38,2)
remark #15399: vectorization support: unroll factor set to 4
remark #15300: LOOP WAS VECTORIZED
remark #15462: unmasked indexed (or gather) loads: 2
remark #15475: --- begin vector loop cost summary ---
remark #15476: scalar loop cost: 12
remark #15477: vector loop cost: 20.000
remark #15478: estimated potential speedup: 2.340
remark #15479: lightweight vector operations: 5
remark #15481: heavy-overhead vector operations: 1
remark #15488: --- end vector loop cost summary ---
LOOP END
My question is: how can the vector loop cost be higher than the scalar one? And what can I do to get closer to the estimated potential speedup?
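
For anyone who wants to reproduce the report, here is a minimal, self-contained sketch of the setup. The derived type name `level_data`, the allocatable attribute of `D`, the number of levels and the array sizes are placeholders chosen for illustration; the real declarations in the legacy code may differ (for example, `D` could be a pointer component), and that may change what the vectorizer reports.

```fortran
! Minimal sketch of the loop in question.
! ASSUMPTIONS: the type name, the allocatable attribute and all sizes
! are hypothetical; they are not the declarations from the real code.
program zero_loop
  implicit none

  type :: level_data                         ! hypothetical derived type
    double precision, allocatable :: D(:)
  end type level_data

  integer, parameter :: nlev = 3             ! assumed number of levels
  type(level_data) :: G1(nlev), G2(nlev)
  integer :: N(nlev)
  integer :: lev, ir

  N = [1000, 2000, 4000]                     ! assumed size of each level
  do lev = 1, nlev
    allocate(G1(lev)%D(N(lev)), G2(lev)%D(N(lev)))
  end do

  lev = 2
  ! The loop from the question, unchanged apart from indentation
  do ir = 1, N(lev)
    G1(lev)%D(ir) = 0.d0
    G2(lev)%D(ir) = 0.d0
  end do

  ! Use the result so the compiler cannot discard the loop
  print *, sum(G1(lev)%D) + sum(G2(lev)%D)
end program zero_loop
```

Compiling this with the same optimization-report options as the real code should show whether the indexed-load remark comes from the derived-type indirection itself or from something specific to the original declarations.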