This question already has an answer here: Why don't compilers merge redundant std::atomic writes? (9 answers)
Will the two loads be combined into one in scenarios like the following?
If this is architecture-dependent, what happens on, say, modern Intel processors? I believe atomic loads are equivalent to normal loads on Intel processors.
void run1() {
    auto a = atomic_var.load(std::memory_order_relaxed);
    auto b = atomic_var.load(std::memory_order_relaxed);
    // Some code using a and b;
}
void run2() {
    if (atomic_var.load(std::memory_order_relaxed) == 2 && /*some conditions*/ ...) {
        if (atomic_var.load(std::memory_order_relaxed) * somevar > 3) {
            /*...*/
        }
    }
}
run1() and run2() are simply two scenarios that load the same atomic variable twice. Can the compiler collapse the two loads into one and reuse the result?
Can the compiler optimize away atomic loads?
Your implementation of run1() can be safely optimized to:
void run1() {
    auto a = atomic_var.load(std::memory_order_relaxed);
    auto b = a;
    // Some code using a and b;
}
In the original program, the two loads could be adjacent to each other in the total order of accesses to atomic_var every time run1() is called. In that case, the adjacent load() operations would return the same result.
Since that possibility cannot be excluded, the compiler is allowed to optimize away the second load(). This holds for any memory order argument, not just for relaxed atomics.
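The argument above can be sketched as two functions that a caller cannot tell apart. This is only an illustration: the declaration of atomic_var as std::atomic<int> is an assumption (the question does not show it), and the names run1_original and run1_merged are hypothetical, returning the pair so that the values stay observable.

```cpp
#include <atomic>
#include <utility>

// Assumption: the question never shows this declaration.
std::atomic<int> atomic_var{0};

// The two loads exactly as written in the question.
std::pair<int, int> run1_original() {
    auto a = atomic_var.load(std::memory_order_relaxed);
    auto b = atomic_var.load(std::memory_order_relaxed);
    return {a, b};
}

// The merged form: one load whose result is reused.
// Every result this returns (a == b) is also a legal result of
// run1_original(), because no other thread is guaranteed to write
// between the two loads.
std::pair<int, int> run1_merged() {
    auto a = atomic_var.load(std::memory_order_relaxed);
    auto b = a;
    return {a, b};
}
```

A program can observe a != b from run1_original() only if another thread happens to store between the two loads, and nothing in the language guarantees that such a store ever lands there.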
For run2() it depends. You didn't specify /*some conditions*/. If it contains something that might have a visible side effect on the atomic variable (such as an opaque function call or an access to a volatile variable), then the second load cannot be optimized away. Otherwise it might be possible.
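As a rough sketch of the "visible side effect" case, suppose /*some conditions*/ is a function call that writes to the atomic. The function some_conditions(), the type of atomic_var, the initial values, and the bool return type of run2() are all assumptions made for this illustration; none of them come from the question.

```cpp
#include <atomic>

// Assumptions: type and initial values are not given in the question.
std::atomic<int> atomic_var{2};
int somevar = 1;

// Hypothetical stand-in for /*some conditions*/. In a real program this
// might live in another translation unit, where the compiler cannot see it.
bool some_conditions() {
    atomic_var.store(5, std::memory_order_relaxed);
    return true;
}

// Same shape as run2() in the question, but returning whether the
// inner branch was taken so the effect is observable.
bool run2() {
    if (atomic_var.load(std::memory_order_relaxed) == 2 && some_conditions()) {
        // This load must observe the store made by some_conditions() on
        // this thread: 5 * 1 > 3 is true. Reusing the first load's value
        // would compute 2 * 1 > 3, which is false.
        if (atomic_var.load(std::memory_order_relaxed) * somevar > 3) {
            return true;
        }
    }
    return false;
}
```

Here, reusing the first load's value would change the single-threaded behavior of the program, so a compiler that cannot rule out such a store has to keep the second load.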
Does the compiler optimize out two atomic loads?
That depends on your compiler, and possibly on the compiler options you passed and on your platform. There is an ongoing debate about whether compilers should optimize atomics; N4455, No Sane Compiler Would Optimize Atomics, and this video are good starting points on the topic.
Neither GCC (6.3) nor Clang (3.9) currently optimizes the two load() operations into one.
The only way to know is to look at the generated assembly: https://godbolt.org/g/nZ3Ekm
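For readers who want to repeat the check with their own compiler, a small test file along the following lines could be pasted into Compiler Explorer. It is not the contents of the link above; the function names and the std::atomic<int> declaration are assumptions for illustration. Returning the sum keeps both loaded values live, so the loads cannot be discarded as unused.

```cpp
#include <atomic>

// Assumption: the question does not show the declaration.
std::atomic<int> atomic_var{0};

// Two relaxed loads of the same variable.
int sum_relaxed() {
    int a = atomic_var.load(std::memory_order_relaxed);
    int b = atomic_var.load(std::memory_order_relaxed);
    return a + b;
}

// The same pattern with the default, sequentially consistent ordering.
int sum_seq_cst() {
    int a = atomic_var.load(std::memory_order_seq_cst);
    int b = atomic_var.load(std::memory_order_seq_cst);
    return a + b;
}
```

Compiling with optimizations enabled (for example -O2) and counting the reads of atomic_var in each function shows whether a given compiler version merged the loads. Results may differ between compiler versions and targets.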