Question:
I know a power of 2 can be computed using the << operator. What about a power of 10, like 10^5? Is there any way faster than pow(10,5) in C++? It is a pretty straightforward computation by hand, but it does not seem easy for computers because of the binary representation of numbers. Let us assume I am only interested in integer powers, 10^n, where n is an integer.
Answer 1:
Something like this:
int quick_pow10(int n)
{
static int pow10[10] = {
1, 10, 100, 1000, 10000,
100000, 1000000, 10000000, 100000000, 1000000000
};
return pow10[n];
}
Obviously, you can do the same thing for long long.
This should be several times faster than any competing method. However, it is quite limited if you have lots of bases (although the number of table entries goes down quite dramatically with larger bases), so as long as there isn't a huge number of base/exponent combinations, it's still doable.
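A minimal sketch of the long long variant mentioned above, with a range check added. The name quick_pow10_64 and the assert are assumptions for illustration, not part of the original answer. The table stops at 10^18 because 10^19 does not fit in a signed 64-bit integer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 64-bit counterpart of quick_pow10 (the name is illustrative).
inline std::int64_t quick_pow10_64(unsigned n)
{
    static const std::int64_t table[19] = {
        1LL, 10LL, 100LL, 1000LL, 10000LL,
        100000LL, 1000000LL, 10000000LL, 100000000LL, 1000000000LL,
        10000000000LL, 100000000000LL, 1000000000000LL,
        10000000000000LL, 100000000000000LL, 1000000000000000LL,
        10000000000000000LL, 100000000000000000LL, 1000000000000000000LL
    };
    assert(n < 19);  // 10^19 would overflow a signed 64-bit integer
    return table[n];
}
```

A call such as quick_pow10_64(12) is then a single bounds check plus one array load.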
As a comparison:
#include <iostream>
#include <cstdlib>
#include <cmath>
static int quick_pow10(int n)
{
static int pow10[10] = {
1, 10, 100, 1000, 10000,
100000, 1000000, 10000000, 100000000, 1000000000
};
return pow10[n];
}
static int integer_pow(int x, int n)
{
int r = 1;
while (n--)
r *= x;
return r;
}
static int opt_int_pow(int n)
{
int r = 1;
const int x = 10;
while (n)
{
if (n & 1)
{
r *= x;
n--;
}
else
{
r *= x * x;
n -= 2;
}
}
return r;
}
int main(int argc, char **argv)
{
long long sum = 0;
int n = strtol(argv[1], 0, 0);
const long outer_loops = 1000000000;
if (argv[2][0] == 'a')
{
for(long i = 0; i < outer_loops / n; i++)
{
for(int j = 1; j < n+1; j++)
{
sum += quick_pow10(n);
}
}
}
if (argv[2][0] == 'b')
{
for(long i = 0; i < outer_loops / n; i++)
{
for(int j = 1; j < n+1; j++)
{
sum += integer_pow(10,n);
}
}
}
if (argv[2][0] == 'c')
{
for(long i = 0; i < outer_loops / n; i++)
{
for(int j = 1; j < n+1; j++)
{
sum += opt_int_pow(n);
}
}
}
std::cout << "sum=" << sum << std::endl;
return 0;
}
Compiled with g++ 4.6.3, using -Wall -O2 -std=c++0x, this gives the following results:
$ g++ -Wall -O2 -std=c++0x pow.cpp
$ time ./a.out 8 a
sum=100000000000000000
real 0m0.124s
user 0m0.119s
sys 0m0.004s
$ time ./a.out 8 b
sum=100000000000000000
real 0m7.502s
user 0m7.482s
sys 0m0.003s
$ time ./a.out 8 c
sum=100000000000000000
real 0m6.098s
user 0m6.077s
sys 0m0.002s
(I did have an option for using pow as well, but it took 1m22.56s when I first tried it, so I removed it when I decided to add the optimised loop variant.)
Answer 2:
There are certainly ways to compute integral powers of 10 faster than using std::pow()! The first realization is that pow(x, n) can be implemented in O(log n) time. The next realization is that x * 10 is the same as (x << 3) + (x << 1). Of course, the compiler knows the latter, i.e., when you are multiplying an integer by the integer constant 10, the compiler will do whatever is fastest to multiply by 10. Based on these two rules it is easy to create fast computations, even if x is a big integer type.
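The two observations above can be sketched as follows. The names times10 and ipow are hypothetical, and unsigned 64-bit arithmetic is an assumption chosen so that overflow wraps around instead of being undefined behaviour.

```cpp
#include <cassert>
#include <cstdint>

// x * 10 == x * 8 + x * 2, written with shifts (illustrative only;
// a compiler will normally pick the best sequence for "x * 10" itself).
inline std::uint64_t times10(std::uint64_t x)
{
    return (x << 3) + (x << 1);
}

// Exponentiation by squaring: O(log n) multiplications.
inline std::uint64_t ipow(std::uint64_t x, unsigned n)
{
    std::uint64_t result = 1;
    while (n != 0) {
        if (n & 1u)
            result *= x;   // this bit of the exponent is set
        n >>= 1;
        if (n != 0)
            x *= x;        // square only if more bits remain
    }
    return result;
}
```

For n = 19 the loop in ipow runs five times, once per bit of the exponent, instead of nineteen times for the naive loop.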
In case you are interested in games like this:
- A generic O(log n) version of power is discussed in Elements of Programming.
- Lots of interesting "tricks" with integers are discussed in Hacker's Delight.
Answer 3:
A solution for any base using template metaprogramming:
template<int E, int N>
struct pow {
enum { value = E * pow<E, N - 1>::value };
};
template <int E>
struct pow<E, 0> {
enum { value = 1 };
};
It can then be used to generate a lookup table that is used at runtime:
template<int E>
long long quick_pow(unsigned int n) {
static long long lookupTable[] = {
pow<E, 0>::value, pow<E, 1>::value, pow<E, 2>::value,
pow<E, 3>::value, pow<E, 4>::value, pow<E, 5>::value,
pow<E, 6>::value, pow<E, 7>::value, pow<E, 8>::value,
pow<E, 9>::value
};
return lookupTable[n];
}
This must be used with correct compiler flags in order to detect the possible overflows.
Usage example:
for(unsigned int n = 0; n < 10; ++n) {
std::cout << quick_pow<10>(n) << std::endl;
}
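With a C++14 (or later) compiler, a constexpr function is one possible alternative to the recursive template. This swaps in a different technique from the one above, and the names const_pow and quick_pow_cx are hypothetical. When the result is required in a constant expression, signed overflow makes the program ill-formed, so the compiler should reject it without any special flags.

```cpp
#include <cassert>

// Hypothetical constexpr replacement for pow<E, N>::value (requires C++14).
constexpr long long const_pow(long long base, unsigned exp)
{
    long long result = 1;
    for (unsigned i = 0; i < exp; ++i)
        result *= base;
    return result;
}

// Evaluated entirely at compile time.
static_assert(const_pow(10, 5) == 100000, "10^5");
static_assert(const_pow(2, 10) == 1024, "2^10");

// Lookup table built from the constexpr function, in the spirit of quick_pow.
template <int E>
long long quick_pow_cx(unsigned n)
{
    static constexpr long long table[] = {
        const_pow(E, 0), const_pow(E, 1), const_pow(E, 2), const_pow(E, 3),
        const_pow(E, 4), const_pow(E, 5), const_pow(E, 6), const_pow(E, 7),
        const_pow(E, 8), const_pow(E, 9)
    };
    assert(n < 10);  // the table only holds exponents 0..9
    return table[n];
}
```

A declaration such as constexpr long long too_big = const_pow(10, 19); should fail to compile, because the overflow would occur inside a constant expression.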
Answer 4:
An integer power function (which doesn't involve floating-point conversions and computations) may very well be faster than pow():
int integer_pow(int x, int n)
{
int r = 1;
while (n--)
r *= x;
return r;
}
Edit: benchmarked - the naive integer exponentiation method seems to outperform the floating-point one by about a factor of two:
h2co3-macbook:~ h2co3$ cat quirk.c
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <errno.h>
#include <string.h>
#include <math.h>
int integer_pow(int x, int n)
{
int r = 1;
while (n--)
r *= x;
return r;
}
int main(int argc, char *argv[])
{
int x = 0;
for (int i = 0; i < 100000000; i++) {
x += powerfunc(i, 5);
}
printf("x = %d\n", x);
return 0;
}
h2co3-macbook:~ h2co3$ clang -Wall -o quirk quirk.c -Dpowerfunc=integer_pow
h2co3-macbook:~ h2co3$ time ./quirk
x = -1945812992
real 0m1.169s
user 0m1.164s
sys 0m0.003s
h2co3-macbook:~ h2co3$ clang -Wall -o quirk quirk.c -Dpowerfunc=pow
h2co3-macbook:~ h2co3$ time ./quirk
x = -2147483648
real 0m2.898s
user 0m2.891s
sys 0m0.004s
h2co3-macbook:~ h2co3$
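The two runs above print different values of x, which suggests that intermediate values fall outside the range of int. As a hedged sketch, an integer power can report overflow instead of silently producing an unspecified result. The name checked_int_pow is hypothetical, and the code assumes long long is at least twice as wide as int (for example 32-bit int and 64-bit long long).

```cpp
#include <cassert>
#include <limits>

static_assert(sizeof(long long) >= 2 * sizeof(int),
              "sketch assumes long long is at least twice as wide as int");

// Hypothetical helper: stores x^n in out and returns true,
// or returns false if the result does not fit in an int.
inline bool checked_int_pow(int x, unsigned n, int& out)
{
    long long r = 1;
    for (unsigned i = 0; i < n; ++i) {
        r *= x;  // r fit in int before this step, so the product fits in long long
        if (r > std::numeric_limits<int>::max() ||
            r < std::numeric_limits<int>::min())
            return false;
    }
    out = static_cast<int>(r);
    return true;
}
```

The range check after every multiplication costs extra time, so this is a correctness aid rather than a faster variant.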
Answer 5:
Here is a stab at it:
#include <type_traits> // std::is_integral, std::make_unsigned, std::is_same
// specialize if you have a bignum integer-like type you want to work with:
template<typename T> struct is_integer_like:std::is_integral<T> {};
template<typename T> struct make_unsigned_like:std::make_unsigned<T> {};
template<typename T, typename U>
T powT( T base, U exponent ) {
static_assert( is_integer_like<U>::value, "exponent must be integer-like" );
static_assert( std::is_same< U, typename make_unsigned_like<U>::type >::value, "exponent must be unsigned" );
T retval = 1;
T& multiplicand = base;
if (exponent) {
while (true) {
// branch prediction will be awful here, you may have to micro-optimize:
retval *= (exponent&1)?multiplicand:1;
// or /2, whatever -- `>>1` is probably faster, esp for bignums:
exponent = exponent>>1;
if (!exponent)
break;
multiplicand *= multiplicand;
}
}
return retval;
}
A few things are going on above.
First, so that BigNum support is cheap, it is templatized. Out of the box, it supports any base type that supports *= own_type and either can be implicitly converted to int, or int can be implicitly converted to it (if both are true, problems will occur), and you need to specialize some templates to indicate that the exponent type involved is both unsigned and integer-like.
In this case, integer-like and unsigned means that it supports &1 returning bool and >>1 returning something it can be constructed from, and that eventually (after repeated >>1s) it reaches a point where evaluating it in a bool context returns false. I used traits classes to express the restriction, because naive use with a value like -1 would compile and (on some platforms) loop forever, while (on others) it would not.
Execution time for this algorithm, assuming multiplication is O(1), is O(lg(exponent)), where lg(exponent) is the number of times the exponent must be shifted with >>1 before it evaluates as false in a boolean context. For traditional integer types, this is the binary log of the exponent's value: so no more than 32 for a 32-bit exponent.
I also eliminated all branches within the loop (or, more precisely, made it obvious to existing compilers that no branch is needed), leaving just the control branch (which is uniformly true until it is false once). Eliminating even that branch might be worth it for high bases and low exponents...
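To make the running-time claim concrete, here is a small sketch that counts how many >>1 steps an exponent needs before it becomes zero, which is the number of loop iterations in a squaring scheme like the one above. The helper name shift_count is hypothetical and not part of the original answer.

```cpp
#include <cassert>

// Hypothetical helper: number of right shifts until the exponent is zero,
// i.e. the bit length of the exponent.
inline unsigned shift_count(unsigned long long exponent)
{
    unsigned steps = 0;
    while (exponent) {
        exponent >>= 1;
        ++steps;
    }
    return steps;
}
```

For example, an exponent of 5 (binary 101) needs three shifts, and the largest 32-bit value needs 32.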
Answer 6:
You can use a lookup table, which will be by far the fastest.
You can also consider using this:
template <typename T>
T expt(T p, unsigned q)
{
T r(1);
while (q != 0) {
if (q % 2 == 1) { // q is odd
r *= p;
q--;
}
p *= p;
q /= 2;
}
return r;
}
Answer 7:
No multiplication and no table version:
//Nx10^n
int Npow10(int N, int n){
N <<= n;
while(n--) N += N << 2;
return N;
}
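The reasoning behind the snippet above, as a commented sketch: shifting left by n multiplies by 2^n, and each N += N << 2 multiplies by 5, so the result is N * 2^n * 5^n = N * 10^n. The unsigned type and the name npow10_u are assumptions made here so that overflow wraps instead of being undefined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical unsigned variant of Npow10: computes N * 10^n (mod 2^32).
inline std::uint32_t npow10_u(std::uint32_t N, unsigned n)
{
    assert(n < 32);                  // shifting by 32 or more would be undefined
    N <<= n;                         // N * 2^n
    for (unsigned i = 0; i < n; ++i)
        N += N << 2;                 // N + 4N = 5N, repeated n times
    return N;
}
```

For N = 3 and n = 4 the shift gives 48, and the four multiplications by 5 give 240, 1200, 6000, and 30000.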
Answer 8:
For integer values, this function calculates x ^ y much faster than pow:
int pot(int x, int y){
int solution = 1;
while(y){
if(y&1)
solution*= x;
x *= x;
y >>= 1;
}
return solution;
}
Answer 9:
Based on Mats Petersson's approach, but with the cache generated at compile time.
#include <iostream>
#include <limits>
#include <array>
#include <cstdint> // uintmax_t
#include <cstdlib> // strtol
// digits
template <typename T>
constexpr T digits(T number) {
return number == 0 ? 0
: 1 + digits<T>(number / 10);
}
// pow
// https://stackoverflow.com/questions/24656212/why-does-gcc-complain-error-type-intt-of-template-argument-0-depends-on-a
// unfortunately we can't write `template <typename T, T N>` because of partial specialization `PowerOfTen<T, 1>`
template <typename T, uintmax_t N>
struct PowerOfTen {
enum { value = 10 * PowerOfTen<T, N - 1>::value };
};
template <typename T>
struct PowerOfTen<T, 1> {
enum { value = 1 };
};
// sequence
template<typename T, T...>
struct pow10_sequence { };
template<typename T, T From, T N, T... Is>
struct make_pow10_sequence_from
: make_pow10_sequence_from<T, From, N - 1, N - 1, Is...> {
//
};
template<typename T, T From, T... Is>
struct make_pow10_sequence_from<T, From, From, Is...>
: pow10_sequence<T, Is...> {
//
};
// base10list
template <typename T, T N, T... Is>
constexpr std::array<T, N> base10list(pow10_sequence<T, Is...>) {
return {{ PowerOfTen<T, Is>::value... }};
}
template <typename T, T N>
constexpr std::array<T, N> base10list() {
return base10list<T, N>(make_pow10_sequence_from<T, 1, N+1>());
}
template <typename T>
constexpr std::array<T, digits(std::numeric_limits<T>::max())> base10list() {
return base10list<T, digits(std::numeric_limits<T>::max())>();
};
// main pow function
template <typename T>
static T template_quick_pow10(T n) {
static auto values = base10list<T>();
return values[n];
}
// client code
int main(int argc, char **argv) {
long long sum = 0;
int n = strtol(argv[1], 0, 0);
const long outer_loops = 1000000000;
if (argv[2][0] == 't') {
for(long i = 0; i < outer_loops / n; i++) {
for(int j = 1; j < n+1; j++) {
sum += template_quick_pow10(n);
}
}
}
std::cout << "sum=" << sum << std::endl;
return 0;
}
The code above omits quick_pow10, integer_pow, and opt_int_pow for readability, but they were present in the code that was tested.
Compiled with gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5), using -Wall -O2 -std=c++0x, it gives the following results:
$ g++ -Wall -O2 -std=c++0x main.cpp
$ time ./a.out 8 a
sum=100000000000000000
real 0m0.438s
user 0m0.432s
sys 0m0.008s
$ time ./a.out 8 b
sum=100000000000000000
real 0m8.783s
user 0m8.777s
sys 0m0.004s
$ time ./a.out 8 c
sum=100000000000000000
real 0m6.708s
user 0m6.700s
sys 0m0.004s
$ time ./a.out 8 t
sum=100000000000000000
real 0m0.439s
user 0m0.436s
sys 0m0.000s
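With a C++14 or later compiler, a compile-time cache along the same lines can be sketched more compactly with a constexpr constructor. This is a different technique from the variadic-template sequence above, and Pow10Table and table_pow10 are hypothetical names. The table size comes from std::numeric_limits<T>::digits10, so it adapts to the integer type.

```cpp
#include <cassert>
#include <limits>

// Hypothetical table holding 10^0 .. 10^digits10 for an integer type T.
template <typename T>
struct Pow10Table {
    static constexpr int size = std::numeric_limits<T>::digits10 + 1;
    T values[size];

    constexpr Pow10Table() : values{}
    {
        T p = 1;
        for (int i = 0; i < size; ++i) {
            values[i] = p;
            if (i + 1 < size)
                p *= 10;  // skip the final multiply so p never overflows
        }
    }
};

// Runtime lookup into a table that is built at compile time.
template <typename T>
T table_pow10(unsigned n)
{
    static constexpr Pow10Table<T> table{};
    assert(n < static_cast<unsigned>(Pow10Table<T>::size));
    return table.values[n];
}
```

For a 32-bit int this gives ten entries (10^0 to 10^9), matching the hand-written table in the first answer.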