g++ c++11 constexpr evaluation performance

Posted 2019-02-08 10:03

Question:

g++ (4.7.2) and similar versions seem to evaluate constexpr surprisingly fast at compile time. On my machines it is in fact much faster than the compiled program doing the same work at runtime.

Is there a reasonable explanation for that behavior? Are there optimization techniques that are only applicable at compile time and can therefore be executed quicker than the actual compiled code? If so, which?

Here's my test program and the observed results.

#include <iostream>

constexpr int mc91(int n)
{
    return (n > 100) ? n - 10 : mc91(mc91(n + 11));
}

constexpr double foo(double n)
{
   return (n>2)? (0.9999)*((unsigned int)(foo(n-1)+foo(n-2))%100):1;
}

constexpr unsigned ack( unsigned m, unsigned n )
{
    return m == 0
        ? n + 1
        : n == 0
        ? ack( m - 1, 1 )
        : ack( m - 1, ack( m, n - 1 ) );
}

constexpr unsigned slow91(int n) {
   return mc91(mc91(foo(n))%100);
}

int main(void)
{
   constexpr unsigned int compiletime_ack=ack(3,14);
   constexpr int compiletime_91=slow91(49);
   static_assert( compiletime_ack == 131069, "Must be evaluated at compile-time" );
   static_assert( compiletime_91  == 91,     "Must be evaluated at compile-time" );
   std::cout << compiletime_ack << std::endl;
   std::cout << compiletime_91  << std::endl;
   std::cout << ack(3,14) << std::endl;
   std::cout << slow91(49) << std::endl;
   return 0;
}

compiletime:

time g++ constexpr.cpp -std=c++11 -fconstexpr-depth=10000000 -O3 

real    0m0.645s
user    0m0.600s
sys     0m0.032s

runtime:

time ./a.out 

131069
91
131069
91

real    0m43.708s
user    0m43.567s
sys     0m0.008s

Here mc91 is the usual McCarthy 91 function (as can be found on Wikipedia), and foo is just a useless function returning real values between about 1 and 100, with Fibonacci-like runtime complexity.

Both the slow calculation of 91 and the Ackermann function get evaluated with the same arguments by the compiler and by the compiled program.

Surprisingly, it is even faster to generate the code, letting the compiler do the calculation, than to execute the compiled code itself.

Answer 1:

At compile time, redundant (identical) constexpr calls can be memoized, while runtime recursion does not get this behavior for free.

If you change every recursive function such as...

constexpr unsigned slow91(int n) {
   return mc91(mc91(foo(n))%100);
}

... to a form that isn't constexpr, but does remember past calculations at runtime:

#include <unordered_map>
#include <boost/optional.hpp>

std::unordered_map< int, boost::optional<unsigned> > results4;
//     parameter(s) ^^^           result ^^^^^^^^

unsigned slow91(int n) {
     boost::optional<unsigned> &ret = results4[n];
     if ( !ret )
     {
         ret = mc91(mc91(foo(n))%100);
     }
     return *ret;
}

You will get less surprising results.
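
For completeness, the other recursive functions can be rewritten in the same spirit. Below is a sketch (not from the original answer) of a memoized ack, keyed on both arguments; it uses std::map because std::pair has no standard hash for std::unordered_map:

#include <map>
#include <utility>
#include <boost/optional.hpp>

std::map< std::pair<unsigned, unsigned>, boost::optional<unsigned> > results_ack;

unsigned ack( unsigned m, unsigned n )
{
    // references into a std::map stay valid across the recursive calls below
    boost::optional<unsigned> &ret = results_ack[ std::make_pair( m, n ) ];
    if ( !ret )
    {
        ret = m == 0
            ? n + 1
            : n == 0
            ? ack( m - 1, 1 )
            : ack( m - 1, ack( m, n - 1 ) );
    }
    return *ret;
}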

compiletime:

time g++ test.cpp -std=c++11 -O3

real    0m1.708s
user    0m1.496s
sys     0m0.176s

runtime:

time ./a.out

131069
91
131069
91

real    0m0.097s
user    0m0.064s
sys     0m0.032s


Answer 2:

Memoization

This is a very interesting "discovery", but the answer is probably simpler than you think.

Something can be evaluated at compile time when declared constexpr if all values involved are known at compile time (and if the variable where the value is supposed to end up is declared constexpr as well). With that said, imagine the following pseudo-code:

f(x)   = g(x)
g(x)   = x + h(x,x)
h(x,y) = x + y

Since every value is known at compile time, the compiler can rewrite the above into the equivalent form below:

f(x) = x + x + x

To put it in words, every function call has been removed and replaced with the expression itself. Also applicable is a technique called memoization, where the results of previously calculated expressions are stored away so the hard work only has to be done once.

If you know that g(5) = 15, why calculate it again? Instead just replace g(5) with 15 every time it is needed. This is possible because a function declared constexpr isn't allowed to have side effects.
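
As a minimal C++11 sketch of the idea (the names f, g, h and the call g(5) come from the pseudo-code above, not from the question), the compiler can fold the whole call chain down to a constant and prove it with static_assert:

// Minimal sketch of the f/g/h example: the constexpr evaluator
// collapses the entire call chain into a single constant.
constexpr int h(int x, int y) { return x + y; }
constexpr int g(int x)        { return x + h(x, x); }
constexpr int f(int x)        { return g(x); }

static_assert( f(5) == 15, "f(5) collapses to the constant 15" );
static_assert( g(5) == 15, "g(5) is known once, no need to recompute it" );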


Runtime

At runtime this is not happening (since we didn't tell the code to behave this way). The little guy running through your code will need to jump from f to g to h, and then jump back to g from h, before jumping from g back to f, all while storing the return value of each function and passing it along to the next one.

Even if this guy is very, very tiny, and doesn't need to jump very far, he still doesn't like jumping back and forth all the time. It takes a lot for him to do this, and with that, it takes time.


But in the OP's example, is it really calculated at compile time?

Yes, and for those who don't believe that the compiler actually calculates this and puts the results as constants in the finished binary, I will supply the relevant assembly instructions from the OP's code below (output of g++ -S -Wall -pedantic -fconstexpr-depth=1000000 -std=c++11):

main:
.LFB1200:
  .cfi_startproc
  pushq %rbp
  .cfi_def_cfa_offset 16
  .cfi_offset 6, -16
  movq  %rsp, %rbp
  .cfi_def_cfa_register 6
  subq  $16, %rsp
  movl  $131069, -4(%rbp)
  movl  $91, -8(%rbp)
  movl  $131069, %esi               # one of the values from constexpr
  movl  $_ZSt4cout, %edi
  call  _ZNSolsEj
  movl  $_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_, %esi
  movq  %rax, %rdi
  call  _ZNSolsEPFRSoS_E
  movl  $91, %esi                   # the other value from our constexpr
  movl  $_ZSt4cout, %edi
  call  _ZNSolsEi
  movl  $_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_, %esi
  movq  %rax, %rdi

  # ...
  # a lot of jumping is taking place down here
  # see the full output at http://codepad.org/Q8D7c41y


Tags: c++ constexpr