I was reading "Preventing compiler optimizations while benchmarking", which describes how clobber() and escape() from Chandler Carruth's CppCon 2015 talk, "Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My!", affect the compiler.

From reading that, I assumed that if I have an input constraint like "g"(val), then the compiler wouldn't be able to optimize away val. But in g() below, no code is generated. Why?

How can doNotOptimize() be rewritten to ensure code is generated for g()?

template <typename T>
void doNotOptimize(T const& val) {
    asm volatile("" : : "g"(val) : "memory");
}

void f() {
    char x = 1;
    doNotOptimize(&x);  // x is NOT optimized away
}

void g() {
    char x = 1;
    doNotOptimize(x);  // x is optimized away
}

No code is generated for g() because the "g" constraint allows the input to be optimised to a constant.

What, exactly, would it mean to have code generated for g()? If you were writing it yourself, what code would you write? Seriously, this is a real question. You have to decide what output you're expecting before you can start cajoling it from the compiler.

Anyway, let's look at what you have now. In f(), you are taking the address of x, which prevents the optimizer from allocating it in a register. It has to be allocated in memory in order for it to have an address.

However, in g(), x is just a local variable, and any sane optimizer will allocate that in a register or, in this case, treat it as a constant. This is allowed, since you never take its address; you just use its value. So, for example, the compiler might generate a single instruction that moves the immediate value 1 into a register. Or, as in this case, it might not generate any code at all and simply substitute the variable's constant value wherever it is used.

Your doNotOptimize code uses the g constraint for the val parameter, which says that it can be held in a general-purpose register, in memory, or as a constant, whichever the optimizer finds most convenient. Since val is a constant, when this call is inlined, the optimizer leaves it as a constant. Your "memory" clobber specifier has no effect, because no memory is being read or modified here: x never has to live in memory at all.

So what can we do? Well, we can force the variable x to be allocated in memory, even though it doesn't need to be, by using the m constraint in place of g. Now the compiler can't optimize the store of x away and is forced to emit code that writes the value 1 to x's location in memory. Note that this is basically the same effect that declaring the x variable volatile would have.

Remember the question I asked at the beginning? Is that the output you wanted?

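The m-constraint variant described above can be sketched roughly as follows. This assumes GCC or Clang extended asm; the names doNotOptimizeMem and gMem are hypothetical and chosen only to avoid clashing with the question's code.

```cpp
// Sketch: same shape as the question's doNotOptimize, but with the "m"
// (memory operand) constraint instead of "g". Assumes GCC/Clang extended asm.
template <typename T>
void doNotOptimizeMem(T const& val) {
    // "m"(val): the operand must be addressable memory, so the compiler has
    // to store val somewhere before the (empty) asm statement.
    asm volatile("" : : "m"(val) : "memory");
}

// Hypothetical counterpart of g() from the question.
char gMem() {
    char x = 1;
    doNotOptimizeMem(x);  // the store of 1 into x can no longer be dropped
    return x;
}
```

On x86-64 with optimizations enabled, one would expect this to compile to a byte store of 1 to the stack, but the exact instructions depend on the compiler and flags, so it is worth checking the generated assembly yourself.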
Or maybe you want the compiler to emit that immediate-to-register move. If so, the r constraint will work, as will any of the x86-specific constraints that let you dictate a particular register. This forces the optimizer to allocate the value in a register, even though it doesn't need to be.

I cannot, however, see what the point of either of these would be.

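For comparison, here is a minimal sketch of the r-constraint variant mentioned above, again assuming GCC or Clang extended asm. The names doNotOptimizeReg and gReg are hypothetical. Note that "r" only makes sense for values that fit in a general-purpose register, such as the char used here.

```cpp
// Sketch: the "r" constraint asks for the operand in a general-purpose
// register. Assumes GCC/Clang extended asm and a register-sized type T.
template <typename T>
void doNotOptimizeReg(T const& val) {
    // "r"(val): the value must be materialised in a register before the
    // (empty) asm statement, even if it is a compile-time constant.
    asm volatile("" : : "r"(val) : "memory");
}

// Hypothetical counterpart of g() from the question.
char gReg() {
    char x = 1;
    doNotOptimizeReg(x);  // forces a load of the constant into a register
    return x;
}
```

The expected result is an immediate-to-register move in place of the memory store, though which register is used is left to the compiler unless a machine-specific constraint is chosen.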
If you wanted to craft a microbenchmark that tested the overhead of calling a function with a single const-reference parameter, then a better option would be to ensure that the definition of the function being called is not visible to the optimizer. Then it can't inline that function and has to arrange for the call to be made, including all necessary setup. This also works well if you're just studying how a compiler might emit that code. (Naturally, you can't use a template function, though. Well, unless you wanted to abuse C++11's extern templates.)

I would recommend to declare

But notice that the compiler is "right" to optimize the way you observe.
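
To make the earlier suggestion about hiding the callee from the optimizer more concrete, here is a single-file sketch. In a real benchmark the function would be defined in a separate translation unit (built without link-time optimization). As a stand-in, and this is a substitution rather than the technique described in the answer, the call below goes through a volatile function pointer, which the compiler must read at run time and therefore cannot inline through. All names (consume, opaqueConsume, gOpaque, g_seen) are hypothetical.

```cpp
// Hypothetical callee. In a real benchmark only its declaration would be
// visible here; the definition would sit in another translation unit.
int g_seen = 0;
void consume(char const& val) { g_seen += val; }

// Single-file stand-in for "definition not visible" (an assumption of this
// sketch): the pointer is volatile, so it has to be loaded at run time and
// the compiler cannot assume which function it points to.
void (*volatile opaqueConsume)(char const&) = &consume;

// Hypothetical counterpart of g() from the question.
void gOpaque() {
    char x = 1;
    opaqueConsume(x);  // passed by reference, so x needs a memory location
}
```

Because the call cannot be resolved at compile time, the compiler has to place x in memory, pass its address, and emit the call, which is the setup cost such a microbenchmark would be trying to observe.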