Consider the following code:
void foo(float* __restrict__ a)
{
int i; float val;
for (i = 0; i < 100; i++) {
val = 2 * i;
a[i] = val;
}
}
void bar(float* __restrict__ a)
{
int i; float val = 0.0;
for (i = 0; i < 100; i++) {
a[i] = val;
val += 2.0;
}
}
They're based on Examples 7.26a and 7.26b in Agner Fog's Optimizing software in C++ and should do the same thing; bar
is more "efficient" as written in the sense that we don't do an integer-to-float conversion at every iteration, but rather a float addition which is cheaper (on x86_64).
Here are the clang and gcc results on these two functions (with no vectorization and unrolling).
Question: It seems to me that the optimization of replacing a multiplication by the loop index with an addition of a constant value - when this is beneficial - should be carried out by compilers, even if (or perhaps especially if) there's a type conversion involved. Why is this not happening for these two functions?
Note that if we use int's rather than float's:
void foo(int* __restrict__ a)
{
int i; int val = 0;
for (i = 0; i < 100; i++) {
val = 2 * i;
a[i] = val;
}
}
void bar(int* __restrict__ a)
{
int i; int val = 0;
for (i = 0; i < 100; i++) {
a[i] = val;
val += 2;
}
}
Both clang and gcc perform the expected optimization, albeit not quite in the same way (see this question).