Any way faster than pow() to compute an integer po

Posted 2019-02-04 03:20


啃猪蹄的小仙女


Question:

I know powers of 2 can be computed with the << operator. What about powers of 10, such as 10^5? Is there any way faster than pow(10,5) in C++? It is a pretty straightforward computation by hand, but it does not seem easy for computers because of the binary representation of numbers. Assume I am only interested in integer powers, 10^n, where n is an integer.

Answer 1:

Something like this:

int quick_pow10(int n)
{
    // Powers of ten that fit in a 32-bit int; n must be in [0, 9] (not checked here).
    static const int pow10[10] = {
        1, 10, 100, 1000, 10000,
        100000, 1000000, 10000000, 100000000, 1000000000
    };

    return pow10[n];
}

Obviously, you can do the same thing for long long.
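As a minimal sketch of that long long variant (the name quick_pow10_ll and the assert-based bounds check are assumptions for illustration, not part of the original answer), the table simply grows to cover 10^0 through 10^18, the largest power of ten that fits in a signed 64-bit integer:

```cpp
#include <cassert>

// Hypothetical long long counterpart of quick_pow10.
// Covers 10^0 .. 10^18; 10^19 no longer fits in a signed 64-bit long long.
inline long long quick_pow10_ll(int n)
{
    static const long long table[19] = {
        1LL, 10LL, 100LL, 1000LL, 10000LL,
        100000LL, 1000000LL, 10000000LL, 100000000LL, 1000000000LL,
        10000000000LL, 100000000000LL, 1000000000000LL,
        10000000000000LL, 100000000000000LL, 1000000000000000LL,
        10000000000000000LL, 100000000000000000LL, 1000000000000000000LL
    };

    // Out-of-range exponents are a caller error; fail loudly in debug builds.
    assert(n >= 0 && n <= 18);
    return table[n];
}
```

For example, quick_pow10_ll(12) returns 1000000000000. The assert disappears when NDEBUG is defined, so the release-mode cost stays a single indexed load.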

This should be several times faster than any competing method. However, it is quite limited if you have lots of bases (although the number of table entries goes down quite dramatically with larger bases), so it is only practical when there isn't a huge number of base/exponent combinations.

As a comparison:

#include <iostream>
#include <cstdlib>
#include <cmath>

static int quick_pow10(int n)
{
    static int pow10[10] = {
        1, 10, 100, 1000, 10000, 
        100000, 1000000, 10000000, 100000000, 1000000000
    };

    return pow10[n]; 
}

static int integer_pow(int x, int n)
{
    int r = 1;
    while (n--)
       r *= x;

    return r; 
}

static int opt_int_pow(int n)
{
    int r = 1;
    const int x = 10;
    while (n)
    {
        if (n & 1) 
        {
           r *= x;
           n--;
        }
        else
        {
            r *= x * x;
            n -= 2;
        }
    }

    return r; 
}


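int main(int argc, char **argv)
{
    // Expects two arguments: the exponent n and the variant to run (a, b or c).
    if (argc < 3)
    {
        std::cerr << "usage: " << argv[0] << " n a|b|c" << std::endl;
        return 1;
    }

    long long sum = 0;
    int n = strtol(argv[1], 0, 0);
    const long outer_loops = 1000000000;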

    if (argv[2][0] == 'a')
    {
        for(long i = 0; i < outer_loops / n; i++)
        {
            for(int j = 1; j < n+1; j++)
            {
                sum += quick_pow10(n);
            }
        }
    }
    if (argv[2][0] == 'b')
    {
        for(long i = 0; i < outer_loops / n; i++)
        {
            for(int j = 1; j < n+1; j++)
            {
                sum += integer_pow(10,n);
            }
        }
    }

    if (argv[2][0] == 'c')
    {
        for(long i = 0; i < outer_loops / n; i++)
        {
            for(int j = 1; j < n+1; j++)
            {
                sum += opt_int_pow(n);
            }
        }
    }

    std::cout << "sum=" << sum << std::endl;
    return 0;
}

Compiled with g++ 4.6.3, using -Wall -O2 -std=c++0x, gives the following results:

$ g++ -Wall -O2 -std=c++0x pow.cpp
$ time ./a.out 8 a
sum=100000000000000000

real    0m0.124s
user    0m0.119s
sys     0m0.004s
$ time ./a.out 8 b
sum=100000000000000000

real    0m7.502s
user    0m7.482s
sys     0m0.003s

$ time ./a.out 8 c
sum=100000000000000000

real    0m6.098s
user    0m6.077s
sys     0m0.002s

(I originally had an option for using pow as well, but it took 1m22.56s when I first tried it, so I removed it when I decided to add the optimised loop variant.)

Answer 2:

There are certainly ways to compute integral powers of 10 faster than using std::pow()! The first realization is that pow(x, n) can be implemented in O(log n) time. The next realization is that multiplying x by 10 is the same as (x << 3) + (x << 1). Of course, the compiler knows the latter, i.e., when you multiply an integer by the integer constant 10, the compiler will do whatever is fastest to multiply by 10. Based on these two rules it is easy to create fast computations, even if x is a big integer type.

In case you are interested in games like this:

A generic O(log n) version of power is discussed in Elements of Programming.
Lots of interesting "tricks" with integers are discussed in Hacker's Delight.
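The two observations in this answer can be sketched as follows. The names times10 and ipow are hypothetical, and unsigned types are used here so that the shifts are well defined; this is one possible rendering, not the answerer's code:

```cpp
#include <cassert>
#include <limits>

// x * 10 written as shifts and one add: 8x + 2x.
// A compiler will normally pick the best sequence itself for `x * 10`.
inline unsigned times10(unsigned x)
{
    // Guard against silent wraparound of the unsigned result.
    assert(x <= std::numeric_limits<unsigned>::max() / 10);
    return (x << 3) + (x << 1);
}

// Exponentiation by squaring: O(log n) multiplications instead of n.
inline unsigned long long ipow(unsigned long long x, unsigned n)
{
    unsigned long long r = 1;
    while (n != 0) {
        if (n & 1u)      // lowest bit set: this power of x contributes
            r *= x;
        n >>= 1;
        if (n != 0)      // skip the final, unused squaring
            x *= x;
    }
    return r;
}
```

With these definitions, times10(7) gives 70 and ipow(10, 5) gives 100000. No overflow checking is done in ipow; results wrap modulo 2^64 for large inputs.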

Answer 3:

A solution for any base using template meta-programming:

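// Note: if <cmath> or <math.h> is included, this name can clash with the
// global ::pow function; wrapping these templates in a namespace avoids that.
template<int E, int N>
struct pow {
    enum { value = E * pow<E, N - 1>::value };
};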

template <int E>
struct pow<E, 0> {
    enum { value = 1 };
};

Then it can be used to generate a lookup table that can be used at runtime:

template<int E>
long long quick_pow(unsigned int n) {
    // Table of E^0 .. E^9, computed at compile time; n must be below 10.
    static const long long lookupTable[] = {
        pow<E, 0>::value, pow<E, 1>::value, pow<E, 2>::value,
        pow<E, 3>::value, pow<E, 4>::value, pow<E, 5>::value,
        pow<E, 6>::value, pow<E, 7>::value, pow<E, 8>::value,
        pow<E, 9>::value
    };

    return lookupTable[n];
}

This must be used with correct compiler flags in order to detect the possible overflows.

Usage example:

for(unsigned int n = 0; n < 10; ++n) {
    std::cout << quick_pow<10>(n) << std::endl;
}
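On the overflow point: one way to get overflow detection without relying on particular compiler flags, assuming C++11 or later is available, is to compute the table entries with a constexpr function. Signed overflow is not allowed in a constant expression, so initializing a constexpr table with an overflowing power has to be rejected at compile time. The names const_pow and quick_pow_ct below are hypothetical and this is only a sketch of that idea:

```cpp
#include <cassert>

// Compile-time power. When evaluated in a constant expression, a signed
// overflow here is a compile error rather than silent wraparound.
constexpr long long const_pow(long long base, unsigned n)
{
    return n == 0 ? 1 : base * const_pow(base, n - 1);
}

// Runtime lookup of Base^0 .. Base^9 from a table built at compile time.
template <long long Base>
long long quick_pow_ct(unsigned n)
{
    static constexpr long long table[10] = {
        const_pow(Base, 0), const_pow(Base, 1), const_pow(Base, 2),
        const_pow(Base, 3), const_pow(Base, 4), const_pow(Base, 5),
        const_pow(Base, 6), const_pow(Base, 7), const_pow(Base, 8),
        const_pow(Base, 9)
    };

    assert(n < 10);   // only ten entries exist
    return table[n];
}
```

quick_pow_ct<10>(9) yields 1000000000, while instantiating the template with a base whose ninth power does not fit in long long should fail to compile instead of producing a wrong table.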

Answer 4:

An integer power function (which doesn't involve floating-point conversions and computations) may very well be faster than pow():

int integer_pow(int x, int n)
{
    int r = 1;
    while (n--)
        r *= x;

    return r; 
}

Edit: benchmarked - the naive integer exponentiation method seems to outperform the floating-point one by about a factor of two:

h2co3-macbook:~ h2co3$ cat quirk.c
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <errno.h>
#include <string.h>
#include <math.h>

int integer_pow(int x, int n)
{
    int r = 1;
    while (n--)
        r *= x;

    return r; 
}

int main(int argc, char *argv[])
{
    int x = 0;

    for (int i = 0; i < 100000000; i++) {
        x += powerfunc(i, 5);
    }

    printf("x = %d\n", x);

    return 0;
}
h2co3-macbook:~ h2co3$ clang -Wall -o quirk quirk.c -Dpowerfunc=integer_pow
h2co3-macbook:~ h2co3$ time ./quirk
x = -1945812992

real    0m1.169s
user    0m1.164s
sys     0m0.003s
h2co3-macbook:~ h2co3$ clang -Wall -o quirk quirk.c -Dpowerfunc=pow
h2co3-macbook:~ h2co3$ time ./quirk
x = -2147483648

real    0m2.898s
user    0m2.891s
sys     0m0.004s
h2co3-macbook:~ h2co3$

Answer 5:

Here is a stab at it:

#include <type_traits>

// specialize if you have a bignum integer-like type you want to work with:
template<typename T> struct is_integer_like:std::is_integral<T> {};
template<typename T> struct make_unsigned_like:std::make_unsigned<T> {};

template<typename T, typename U>
T powT( T base, U exponent ) {
  static_assert( is_integer_like<U>::value, "exponent must be integer-like" );
  static_assert( std::is_same< U, typename make_unsigned_like<U>::type >::value, "exponent must be unsigned" );

  T retval = 1;
  T& multiplicand = base;
  if (exponent) {
    while (true) {
      // branch prediction will be awful here, you may have to micro-optimize:
      retval *= (exponent&1)?multiplicand:1;
      // or /2, whatever -- `>>1` is probably faster, esp for bignums:
      exponent = exponent>>1;
      if (!exponent)
        break;
      multiplicand *= multiplicand;
    }
  }
  return retval;
}

What is going on above is a few things.

First, it is templatized so that BigNum support is cheap. Out of the box, it supports any base type that supports *= with its own type and that either can be implicitly converted to int or can be implicitly constructed from int (if both are true, problems will occur). You need to specialize some templates to indicate that the exponent type involved is both unsigned and integer-like.

In this case, integer-like and unsigned means that it supports &1 returning bool and >>1 returning something it can be constructed from and eventually (after repeated >>1s) reaches a point where evaluating it in a bool context returns false. I used traits classes to express the restriction, because naive use by a value like -1 would compile and (on some platforms) loop forever, while (on others) would not.

Execution time for this algorithm, assuming multiplication is O(1), is O(lg(exponent)), where lg(exponent) is the number of times you have to >>1 the exponent before it evaluates as false in a boolean context. For traditional integer types, this is the binary log of the exponent's value: so no more than 32 for a 32-bit exponent.

I also eliminated all branches within the loop (or, made it obvious to existing compilers that no branch is needed, more precisely), with just the control branch (which is true uniformly until it is false once). Possibly eliminating even that branch might be worth it for high bases and low exponents...
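To make the O(lg(exponent)) claim concrete, here is a small instrumented sketch of the same square-and-multiply loop that counts its multiplications. PowResult, pow_by_squaring_counted and demo_counts are hypothetical names introduced for illustration, and the loop is written with a plain if rather than the branch-free form used in the answer:

```cpp
#include <cassert>

// Result of the power computation together with the work it took.
struct PowResult {
    unsigned long long value;
    unsigned multiplications;
};

// Square-and-multiply, counting every multiplication performed.
inline PowResult pow_by_squaring_counted(unsigned long long base, unsigned exponent)
{
    PowResult out = {1, 0};
    while (exponent != 0) {
        if (exponent & 1u) {          // this bit of the exponent is set
            out.value *= base;
            ++out.multiplications;
        }
        exponent >>= 1;               // one shift per loop iteration
        if (exponent != 0) {          // square only if more bits remain
            base *= base;
            ++out.multiplications;
        }
    }
    return out;
}

// Small self-check: 13 is 1101 in binary, so the loop runs 4 times.
inline void demo_counts()
{
    PowResult r = pow_by_squaring_counted(3, 13);
    assert(r.value == 1594323ULL);    // 3^13
    assert(r.multiplications == 6);   // versus 13 for the naive loop
}
```

The number of loop iterations equals the number of right shifts needed to bring the exponent to zero, which is what bounds the running time.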

Answer 6:

You can use the lookup table, which will be by far the fastest.

You can also consider using this:

template <typename T>
T expt(T p, unsigned q)
{
    T r(1);

    while (q != 0) {
        if (q % 2 == 1) {    // q is odd
            r *= p;
            q--;
        }
        p *= p;
        q /= 2;
    }

    return r;
}

Answer 7:

No multiplication and no table version:

// Computes N * 10^n using only shifts and additions
int Npow10(int N, int n){
  N <<= n;
  while(n--) N += N << 2;
  return N;
}
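Why this works: 10^n = 2^n * 5^n. The initial shift multiplies by 2^n, and each pass of the loop computes N + 4N = 5N, so n passes multiply by 5^n. A sketch of the same idea with unsigned arithmetic (to keep the shifts well defined) and a range check follows; the name n_times_pow10 and the limit of 9 are assumptions for a 32-bit unsigned:

```cpp
#include <cassert>

// N * 10^n with shifts and adds only, using 10^n = 2^n * 5^n.
inline unsigned n_times_pow10(unsigned N, unsigned n)
{
    assert(n <= 9);          // 10^10 no longer fits in 32 bits

    N <<= n;                 // N * 2^n
    while (n--)
        N += N << 2;         // N + 4N = 5N, repeated n times
    return N;
}
```

For instance, n_times_pow10(3, 2) returns 300. Note that the loop still runs n times, so this is linear in n, unlike the table lookup or the squaring approach.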

Answer 8:

This function calculates x^y much faster than pow for integer values.

int pot(int x, int y){
    int solution = 1;
    while(y){
        if(y&1)
            solution *= x;
        x *= x;
        y >>= 1;
    }
    return solution;
}

Answer 9:

Based on Mats Petersson's approach, but with the cache generated at compile time.

#include <iostream>
#include <limits>
#include <array>
#include <cstdint>   // uintmax_t
#include <cstdlib>   // strtol

// digits

template <typename T>
constexpr T digits(T number) {    
  return number == 0 ? 0 
                     : 1 + digits<T>(number / 10);
}

// pow

// https://stackoverflow.com/questions/24656212/why-does-gcc-complain-error-type-intt-of-template-argument-0-depends-on-a
// unfortunately we can't write `template <typename T, T N>` because of partial specialization `PowerOfTen<T, 1>`

template <typename T, uintmax_t N>
struct PowerOfTen {
  enum { value = 10 * PowerOfTen<T, N - 1>::value };
};

template <typename T>
struct PowerOfTen<T, 1> {
  enum { value = 1 };
};

// sequence

template<typename T, T...>
struct pow10_sequence { };

template<typename T, T From, T N, T... Is>
struct make_pow10_sequence_from 
: make_pow10_sequence_from<T, From, N - 1, N - 1, Is...> { 
  //  
};

template<typename T, T From, T... Is>
struct make_pow10_sequence_from<T, From, From, Is...> 
: pow10_sequence<T, Is...> { 
  //
};

// base10list

template <typename T, T N, T... Is>
constexpr std::array<T, N> base10list(pow10_sequence<T, Is...>) {
  return {{ PowerOfTen<T, Is>::value... }};
}

template <typename T, T N>
constexpr std::array<T, N> base10list() {    
  return base10list<T, N>(make_pow10_sequence_from<T, 1, N+1>());
}

template <typename T>
constexpr std::array<T, digits(std::numeric_limits<T>::max())> base10list() {    
  return base10list<T, digits(std::numeric_limits<T>::max())>();    
}

// main pow function

template <typename T>
static T template_quick_pow10(T n) {

  static auto values = base10list<T>();
  return values[n]; 
}

// client code

int main(int argc, char **argv) {

  long long sum = 0;
  int n = strtol(argv[1], 0, 0);
  const long outer_loops = 1000000000;

  if (argv[2][0] == 't') {

    for(long i = 0; i < outer_loops / n; i++) {

      for(int j = 1; j < n+1; j++) {

        sum += template_quick_pow10(n);
      }
    }
  }

  std::cout << "sum=" << sum << std::endl;
  return 0;
}

The listing above omits quick_pow10, integer_pow, and opt_int_pow for readability, but they were present in the code used for the tests.

Compiled with gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5), using -Wall -O2 -std=c++0x, gives the following results:

$ g++ -Wall -O2 -std=c++0x main.cpp

$ time ./a.out  8 a
sum=100000000000000000

real  0m0.438s
user  0m0.432s
sys   0m0.008s

$ time ./a.out  8 b
sum=100000000000000000

real  0m8.783s
user  0m8.777s
sys   0m0.004s

$ time ./a.out  8 c
sum=100000000000000000

real  0m6.708s
user  0m6.700s
sys   0m0.004s

$ time ./a.out  8 t
sum=100000000000000000

real  0m0.439s
user  0m0.436s
sys   0m0.000s
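For readers on newer compilers, the compile-time table in this answer can be written much more compactly. The following is a sketch that assumes C++14 or later (for loops inside constexpr constructors); Pow10Table and table_pow10 are hypothetical names, and this version has not been benchmarked against the results above:

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Table of all powers of ten representable in T, filled at compile time.
template <typename T>
struct Pow10Table {
    // digits10 is 9 for a 32-bit int, so the table holds 10^0 .. 10^9.
    static constexpr std::size_t size = std::numeric_limits<T>::digits10 + 1;
    T values[size];

    constexpr Pow10Table() : values()
    {
        T p = 1;
        for (std::size_t i = 0; i < size; ++i) {
            values[i] = p;
            if (i + 1 < size)   // avoid overflowing past the last entry
                p *= 10;
        }
    }
};

// Runtime lookup; n must be a valid index into the table for T.
template <typename T>
T table_pow10(std::size_t n)
{
    static constexpr Pow10Table<T> table{};
    assert(n < Pow10Table<T>::size);
    return table.values[n];
}
```

table_pow10<int>(9) returns 1000000000 and table_pow10<long long>(18) returns 1000000000000000000. The table size is derived from std::numeric_limits<T>::digits10 instead of the recursive digits helper.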