Beating the dead horse here. A typical (and fast) way of doing integer powers in C is this classic:
int64_t ipow(int64_t base, int exp){
int64_t result = 1;
while(exp){
if(exp & 1)
result *= base;
exp >>= 1;
base *= base;
}
return result;
}
However I needed a compile time integer power so I went ahead and made a recursive implementation using constexpr:
constexpr int64_t ipow_(int base, int exp){
return exp > 1 ? ipow_(base, (exp>>1) + (exp&1)) * ipow_(base, exp>>1) : base;
}
constexpr int64_t ipow(int base, int exp){
return exp < 1 ? 1 : ipow_(base, exp);
}
The second function is only to handle exponents less than 1 in a predictable way. Passing exp<0
is an error in this case.
The recursive version is 4 times slower
I generate a vector of 10E6 random valued bases and exponents in the range [0,15] and time both algorithms on the vector (after doing a non-timed run to try to remove any caching effects). Without optimization the recursice method is twice as fast as the loop. But with -O3 (GCC) the loop is 4 times faster than the recursice method.
My question to you guys is this: Can any one come up with a faster ipow() function that handles exponent and bases of 0 and can be used as a constexpr
?
(Disclaimer: I don't need a faster ipow, I'm just interested to see what the smart people here can come up with).
It seems that this is a standard problem with constexpr and template programming in C++. Due to compile time constraints, the constexpr version is slower than a normal version if executed at runtime. But overloading doesn't allows to chose the correct version. The standardization committee is working on this issue. See for example the following working document http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2013/n3583.pdf
A good optimizing compiler will transform tail-recursive functions to run as fast as imperative code. You can transform this function to be tail recursive with pumping. GCC 4.8.1 compiles this test program:
into a loop (See this at gcc.godbolt.org):
vs. your while loop implementation:
Instruction-by-instruction identical is good enough for me.