当编译gcc -O3
,为什么下面的循环不是矢量化(自动):
#define SIZE (65536)
int a[SIZE], b[SIZE], c[SIZE];
int foo () {
int i, j;
for (i=0; i<SIZE; i++){
for (j=i; j<SIZE; j++) {
a[i] = b[i] > c[j] ? b[i] : c[j];
}
}
return a[0];
}
当以下一个呢?
#define SIZE (65536)
int a[SIZE], b[SIZE], c[SIZE];
int foov () {
int i, j;
for (i=0; i<SIZE; i++){
for (j=i; j<SIZE; j++) {
a[i] += b[i] > c[j] ? b[i] : c[j];
}
}
return a[0];
}
唯一的区别是在内部循环的表达式的结果是否被分配到一个[I]中,或添加到[i]中 。
为了参考-ftree-vectorizer-verbose=6
给出了第一(非向量化)环以下的输出。
v.c:8: note: not vectorized: inner-loop count not invariant.
v.c:9: note: Unknown alignment for access: c
v.c:9: note: Alignment of access forced using peeling.
v.c:9: note: not vectorized: live stmt not supported: D.2700_5 = c[j_20];
v.c:5: note: vectorized 0 loops in function.
而对于向量化循环相同的输出是:
v.c:8: note: not vectorized: inner-loop count not invariant.
v.c:9: note: Unknown alignment for access: c
v.c:9: note: Alignment of access forced using peeling.
v.c:9: note: vect_model_load_cost: aligned.
v.c:9: note: vect_model_load_cost: inside_cost = 1, outside_cost = 0 .
v.c:9: note: vect_model_simple_cost: inside_cost = 1, outside_cost = 1 .
v.c:9: note: vect_model_reduction_cost: inside_cost = 1, outside_cost = 6 .
v.c:9: note: cost model: prologue peel iters set to vf/2.
v.c:9: note: cost model: epilogue peel iters set to vf/2 because peeling for alignment is unknown .
v.c:9: note: Cost model analysis:
Vector inside of loop cost: 3
Vector outside of loop cost: 27
Scalar iteration cost: 3
Scalar outside cost: 7
prologue iterations: 2
epilogue iterations: 2
Calculated minimum iters for profitability: 8
v.c:9: note: Profitability threshold = 7
v.c:9: note: Profitability threshold is 7 loop iterations.
v.c:9: note: LOOP VECTORIZED.
v.c:5: note: vectorized 1 loops in function.