Computation is optimized only if the updated variable is local

Posted 2019-06-21 14:06

Question:

For the following function, an optimized build vectorizes the loop and keeps the computation in registers (the result is returned in eax). The generated machine code can be seen here, for example: https://godbolt.org/z/VQEBV4.

int sum(int *arr, int n) {
  int ret = 0;
  for (int i = 0; i < n; i++)
    ret += arr[i];
  return ret;
}

However, if I make ret a global variable (or a parameter of type int&), the loop is not vectorized and the compiler stores the updated ret to memory in every iteration. Machine code: https://godbolt.org/z/NAmX4t.

int ret = 0;

int sum(int *arr, int n) {
  for (int i = 0; i < n; i++)
    ret += arr[i];
  return ret;
}

I don't understand why these optimizations (vectorization, keeping the computation in registers) are prevented in the latter case. No threads are involved, and the increments are not even performed atomically. Moreover, this behavior seems to be consistent across compilers (GCC, Clang, Intel), so I believe there must be some reason for it.

Answer 1:

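If ret is global rather than local, arr might alias ret (that is, some arr[i] could refer to the same memory as ret), which reduces the opportunities for optimization. The compiler must then assume that each store to ret may change a value the loop reads later, so it writes ret back to memory in every iteration instead of keeping the sum in a register.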
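
To make the aliasing concern concrete, here is a minimal sketch in C. It uses a pointer parameter in place of the int& parameter mentioned in the question, and all function names (sum_through, sum_local, alias_demo) are hypothetical, introduced only for illustration. It shows that writing through the pointer on every iteration and accumulating in a local are not equivalent when the output overlaps the input, which is presumably why the compiler cannot swap one for the other on its own.

```c
/* Updates *out in every iteration, like the global / int& version. */
static void sum_through(int *out, const int *arr, int n) {
  for (int i = 0; i < n; i++)
    *out += arr[i];   /* each store may change a later arr[i] */
}

/* Accumulates in a local and stores once; the programmer thereby
   promises that out does not overlap arr (or does not care). */
static void sum_local(int *out, const int *arr, int n) {
  int acc = *out;
  for (int i = 0; i < n; i++)
    acc += arr[i];
  *out = acc;
}

/* Calls one of the two variants with out pointing INTO the array. */
static int alias_demo(int use_local) {
  int a[2] = {1, 2};
  if (use_local)
    sum_local(&a[1], a, 2);
  else
    sum_through(&a[1], a, 2);
  return a[1];
}
```

With the output pointing at a[1], alias_demo(0) returns 6 (the store in the first iteration changes the value read in the second), while alias_demo(1) returns 5. If you know the two never overlap, copying the global into a local before the loop and writing it back afterwards, as sum_local does, should let the compiler treat the loop like the first version in the question.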