Preventing compiler optimizations while benchmarking

Posted 2019-03-27 08:25

Question:

I recently came across this brilliant CppCon 2015 talk: Chandler Carruth, "Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My!"

One of the techniques it mentions for preventing the compiler from optimizing code away is to use the functions below.

#include <vector>

static void escape(void *p) {
  asm volatile("" : : "g"(p) : "memory");
}

static void clobber() {
  asm volatile("" : : : "memory");
}

void benchmark()
{
  std::vector<int> v;
  v.reserve(1);
  escape(v.data());
  v.push_back(10);
  clobber();
}

I'm trying to understand this, and have the following questions.

1) What is the advantage of escape over clobber?

2) From the example above, it looks like clobber() prevents the previous statement (push_back) from being optimized away. If that's the case, why is the snippet below not correct?

 void benchmark()
 {
     std::vector<int> v;
     v.reserve(1);
     v.push_back(10);
     clobber();
 }

As if this weren't confusing enough, folly (Facebook's open-source C++ library) has an even stranger implementation.

Relevant snippet:

template <class T>
void doNotOptimizeAway(T&& datum) {
  asm volatile("" : "+r" (datum));
}

My understanding is that the snippet above tells the compiler that the assembly block writes to datum. But if the compiler finds that there is no consumer of datum, it can still optimize away whatever produces datum, right?

I assume this is not common knowledge, so any help is appreciated!

Answer 1:

tl;dr: doNotOptimizeAway creates an artificial "use".

A little bit of terminology first: a "def" ("definition") is a statement that assigns a value to a variable; a "use" is a statement that reads the value of a variable to perform some operation.

If no path from the point immediately after a def to the program exit encounters a use of that variable, the def is called dead, and the Dead Code Elimination (DCE) pass will remove it. This in turn may cause other defs to become dead (if the removed def was itself a use of them, by virtue of having variable operands), and so on.
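
A toy model may make the cascade concrete. The sketch below is hypothetical (the `Stmt` type and `eliminate_dead_code` function are made up for illustration, not taken from any real compiler): it treats a straight-line program as a list of defs with their operand uses, walks it backwards, and drops every def whose variable is not live at that point. Putting a variable into `live_at_exit` plays the role of an artificial use.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy IR for straight-line code: each statement defines one
// variable (`def`) from zero or more operands (`uses`).
struct Stmt {
  std::string def;
  std::vector<std::string> uses;
};

// Backward liveness-based DCE: a def is kept only if its variable is live
// (used later, or live at program exit). Dropping a dead def also drops its
// uses, which can make earlier defs dead in the same backward pass.
std::vector<Stmt> eliminate_dead_code(const std::vector<Stmt>& program,
                                      std::set<std::string> live_at_exit) {
  std::set<std::string> live = std::move(live_at_exit);
  std::vector<Stmt> kept;
  for (auto it = program.rbegin(); it != program.rend(); ++it) {
    if (live.count(it->def) == 0) continue;  // dead def: remove it
    live.erase(it->def);                     // this def satisfies later uses
    for (const auto& u : it->uses) live.insert(u);  // operands become live
    kept.push_back(*it);
  }
  return std::vector<Stmt>(kept.rbegin(), kept.rend());  // restore order
}
```

With the program `a = ...; b = f(a); c = g(b)` and nothing live at exit, every statement is removed; marking `b` live keeps `a` and `b` but still drops `c`.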

Imagine the program after the Scalar Replacement of Aggregates (SRA) pass, which splits the local std::vector into two scalar variables, len and ptr. At some point the program assigns a value to ptr; that statement is a def.

Now, the original program didn't do anything with the vector; in other words, there were no uses of either len or ptr. Hence all of their defs are dead, and DCE can remove them, effectively removing all the code and making the benchmark worthless.

Adding doNotOptimizeAway(ptr) creates an artificial use, which prevents DCE from removing the defs. (As a side note, I see no point in the "+"; "g" should have been enough.)
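
A minimal sketch of how that artificial use looks in practice, assuming GCC or Clang (the inline asm syntax is a compiler extension). `doNotOptimizeAway` mirrors the folly snippet quoted in the question; `square_loop` and `pass_through` are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Same shape as the folly snippet quoted in the question (GCC/Clang inline
// asm). The "+r" operand is both read and written by the (empty) asm body.
template <class T>
void doNotOptimizeAway(T&& datum) {
  asm volatile("" : "+r"(datum));
}

// Hypothetical benchmark body: nothing consumes `sq`, so without the
// artificial use every def in the loop would be dead and DCE could delete it.
void square_loop(int n) {
  for (int i = 0; i < n; ++i) {
    int sq = i * i;         // def of sq
    doNotOptimizeAway(sq);  // artificial use keeps this def (and i * i) alive
  }
}

// The asm body is empty, so at run time the value passes through unchanged.
int pass_through(int x) {
  doNotOptimizeAway(x);
  return x;
}
```

The asm statement emits no instructions; its only effect is on the optimizer, so `pass_through(42)` still returns 42.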

A similar line of reasoning applies to memory loads and stores: a store (a def) is dead iff no path from it to the end of the program contains a load (a use) from that location. As tracking arbitrary memory locations is much harder than tracking individual pseudo-register variables, the compiler reasons conservatively: a store is dead only if no path to the end of the program could possibly encounter a use of that store.

One such case is a store to a region of memory that is guaranteed not to be aliased: after that memory is deallocated, no use of the store is possible without triggering undefined behaviour. In other words, there are no such uses.

Thus a compiler could eliminate v.push_back(10). But this is where escape comes in: it causes v.data() to be considered arbitrarily aliased, as @Leon's answer describes.

The purpose of clobber() in the example is to create an artificial use of all aliased memory. We have a store (from push_back(10)) to a location that is globally aliased (due to escape(v.data())), so clobber() could potentially contain a use of that store (in other words, the store's side effect may be observable); therefore the compiler is not allowed to remove the store.
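
The same reasoning carries over to a repeated benchmark body. The following is a sketch, assuming GCC or Clang; `bench_push_back` is a hypothetical name, and the escape/clobber pair is simply placed inside the loop so that each iteration's allocation escapes and each iteration's store is followed by a clobber.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The two helpers from the talk (GCC/Clang inline asm).
static void escape(void* p) { asm volatile("" : : "g"(p) : "memory"); }
static void clobber() { asm volatile("" : : : "memory"); }

// Hypothetical benchmark body: every iteration allocates, escapes the
// allocation, stores into it, and then clobbers memory.
std::size_t bench_push_back(int iterations) {
  std::size_t pushed = 0;
  for (int i = 0; i < iterations; ++i) {
    std::vector<int> v;
    v.reserve(1);      // the allocation happens here
    escape(v.data());  // the heap block becomes globally aliased
    v.push_back(10);   // store into the escaped block
    clobber();         // may read any aliased memory, so the store is kept
    pushed += v.size();
  }
  return pushed;
}
```

A timing harness would call `bench_push_back` between two clock readings; the returned count exists only so the result can be checked. Note that escape() must come after reserve(), since that is when the block to be escaped is allocated.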

A few simpler examples:

Example I:

void f() {
  int v[1];
  v[0] = 42;
}

This does not generate any code.

Example II:

extern void g();

void f() {
  int v[1];
  v[0] = 42;
  g();
}

This generates just a call to g(), no memory store. The function g cannot possibly access v because v is not aliased.

Example III:

void clobber() {
  __asm__ __volatile__ ("" : : : "memory");
}

void f() {
  int v[1];
  v[0] = 42;
  clobber();
}

As in the previous example, no store is generated, because v is not aliased and the call to clobber is inlined to nothing.

Example IV:

template<typename T>
void use(T &&t) {
  __asm__ __volatile__ ("" :: "g" (t));
}

void f() {
  int v[1];
  use(v);
  v[0] = 42;
}

This time v escapes (i.e. it can potentially be accessed from other activation frames). However, the store is still removed, since after it there are no potential uses of that memory (without UB).

Example V:

template<typename T>
void use(T &&t) {
  __asm__ __volatile__ ("" :: "g" (t));
}

extern void g();

void f() {
  int v[1];
  use(v);
  v[0] = 42;
  g(); // same with clobber()
}

And finally we get the store, because v escapes and the compiler must conservatively assume that the call to g may access the stored value.

(For experiments: https://godbolt.org/g/rFviMI)



Answer 2:

1) What is the advantage of escape over clobber?

escape() doesn't have an advantage over clobber(). escape() complements clobber() in the following important way:

The effect of clobber() is limited to memory that is potentially accessible through an imaginary global root pointer. In other words, the compiler's model of allocated memory is a connected graph of blocks referring to each other through pointers, and this imaginary global root pointer serves as the entry point to that graph. (Memory leaks are not accounted for in this model, i.e. the compiler ignores the possibility that once-accessible blocks may become inaccessible because a pointer value was lost.) A newly allocated block is not part of such a graph, and is immune to any side effects of clobber(). escape() ensures that the passed-in address belongs to the globally accessible set of memory blocks. When applied to a newly allocated memory block, escape() has the effect of adding it to that graph.
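
The graph model can be sketched as a reachability check. This is a toy illustration under stated assumptions, not how any real compiler implements alias analysis: blocks are integer ids, the hypothetical `add_pointer` records that one block holds a pointer to another, `escape` links a block to the imaginary global root, and `visible_to_clobber` answers whether a store into a block could be observed by clobber().

```cpp
#include <cassert>
#include <queue>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Toy model (hypothetical) of memory reachable from the imaginary global root.
class ReachabilityModel {
 public:
  // Record that block `from` holds a pointer to block `to`.
  void add_pointer(int from, int to) { edges_[from].push_back(to); }

  // escape(p): link the block directly to the global root.
  void escape(int block) { roots_.push_back(block); }

  // A store into `block` survives clobber() only if the block is reachable
  // from the root, directly or through pointers stored in reachable blocks.
  bool visible_to_clobber(int block) const {
    std::unordered_set<int> seen;
    std::queue<int> work;
    for (int r : roots_) work.push(r);
    while (!work.empty()) {
      int b = work.front();
      work.pop();
      if (!seen.insert(b).second) continue;  // already visited
      if (b == block) return true;
      auto it = edges_.find(b);
      if (it == edges_.end()) continue;
      for (int next : it->second) work.push(next);
    }
    return false;
  }

 private:
  std::vector<int> roots_;
  std::unordered_map<int, std::vector<int>> edges_;
};
```

Before `escape(1)`, neither block 1 nor the block it points to is visible to clobber(); afterwards both are, while an unrelated block stays invisible.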

2) From the example above, it looks like clobber() prevents the previous statement (push_back) from being optimized away. If that's the case, why is the snippet below not correct?

 void benchmark()
 {
     std::vector<int> v;
     v.reserve(1);
     v.push_back(10);
     clobber();
 }

The allocation hidden inside v.reserve(1) is not visible to clobber() until it is registered via escape().