Chandler Carruth introduced two functions in his CppCon 2015 talk that can be used for fine-grained inhibition of the optimizer. They are useful for writing micro-benchmarks that the optimizer won't simply nuke into meaninglessness.
```cpp
void clobber() {
    asm volatile("" : : : "memory");
}

void escape(void* p) {
    asm volatile("" : : "g"(p) : "memory");
}
```
These use inline assembly statements to change the assumptions of the optimizer.
The assembly statement in `clobber` states that the assembly code in it can read and write anywhere in memory. The actual assembly code is empty, but the optimizer treats an asm statement as opaque and never looks inside it, and `volatile` stops it from discarding the statement. It believes us when we tell it the code might read and write anywhere in memory. This effectively prevents the optimizer from reordering or discarding memory writes prior to the call to `clobber`, and forces memory reads after the call to `clobber`†.
The one in `escape` additionally makes the pointer `p` visible to the assembly block. Again, because the optimizer won't look into the actual inline assembly code, that code can be empty, and the optimizer will still assume that the block uses the address held in the pointer `p`. This effectively forces whatever `p` points to to be in memory and not in a register, because the assembly block might perform a read from that address.
(This is important because the `clobber` function won't force reads or writes for anything that the compiler decides to put in a register, since the assembly statement in `clobber` doesn't state that anything in particular must be visible to the assembly.)
All of this happens without any additional code being generated directly by these "barriers". They are purely compile-time artifacts.
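To illustrate how the two barriers are usually combined in a micro-benchmark, here is a minimal sketch for GCC or Clang. The function name `benchPushBack` and the choice of `std::vector<int>` are assumptions made for this example only. It marks the buffer as escaped before the write and clobbers memory after it, so the store cannot be optimized away.

```cpp
#include <vector>

// Same definitions as above (GCC/Clang extended asm only).
static void clobber() {
    asm volatile("" : : : "memory");
}

static void escape(void* p) {
    asm volatile("" : : "g"(p) : "memory");
}

// Hypothetical benchmark body: the code we want the optimizer to keep.
int benchPushBack() {
    std::vector<int> v;
    v.reserve(1);
    escape(v.data());  // the optimizer must now assume the asm may access the buffer
    v.push_back(42);   // the store we actually want to measure
    clobber();         // "all memory may be read here", so the store cannot be dropped
    return v[0];
}
```

Without the `escape` call, the compiler would be free to conclude that nothing can observe the buffer, and `clobber` alone would not necessarily keep the store alive.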
These use language extensions supported in GCC and in Clang, though. Is there a way to have similar behaviour when using MSVC?
† To understand why the optimizer has to think this way, imagine if the assembly block were a loop adding 1 to every byte in memory.
I have used the following in place of `escape`.

It's not perfect but it's close enough, I think.

Sadly, I don't have a way to emulate `clobber`.

Given your approximation of `escape()`, you should also be fine with the following approximation of `clobber()` (note that this is a draft idea, deferring some of the solution to the implementation of the function `nextLocationToClobber()`):

UPDATE

Question: How would you ensure that `isClobberingEnabled` returns `false` "in an undeducible way"? Certainly it would be trivial to place the definition in another translation unit, but the minute you enable LTCG, that strategy is defeated. What did you have in mind?

Answer: We can take advantage of a hard-to-prove property from number theory, for example, Fermat's Last Theorem: