Let's consider this trivial code:
    #include <atomic>

    std::atomic<int> a;

    void f() {
        for (int k = 0; k < 100; ++k)
            a.load(std::memory_order_relaxed);
    }
MSVC, Clang, and GCC all perform 100 loads of a, even though it seems obvious that the loads could have been optimized away, since their results are never used. I expected the function f to be a no-op (see the generated code here).
In fact, emitting all 100 loads is the code generation I would have expected only for a volatile atomic:
    volatile std::atomic<int> va;

    void g() {
        for (int k = 0; k < 100; ++k)
            va.load(std::memory_order_relaxed);
    }
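To make the expectation concrete, here is a minimal sketch that contrasts the atomic version with a plain int. The names `plain` and `f_plain` are hypothetical and only for illustration. With optimizations enabled, I would expect a compiler to typically reduce `f_plain` to a bare return, and that is the treatment I assumed `f` would get too:

```cpp
#include <atomic>

std::atomic<int> a{0};
int plain = 0;  // hypothetical non-atomic counterpart, for comparison only

// Discarded loads from a plain int: optimizers typically delete
// the whole loop, leaving an empty function.
void f_plain() {
    for (int k = 0; k < 100; ++k) {
        int unused = plain;
        (void)unused;  // silence unused-variable warnings
    }
}

// The function from the question: discarded relaxed atomic loads.
// The compilers I tried keep all 100 loads here.
void f() {
    for (int k = 0; k < 100; ++k) {
        a.load(std::memory_order_relaxed);
    }
}
```

Both functions have no observable effect on their own, which is why the difference in generated code surprised me.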
Why do compilers not optimize away unnecessary atomic loads?