This is quite an odd problem I know, but I'm trying to get a copy of the current largest prime number in a file. Getting the number in integer form is fairly easy. I just run this.
prime = 2**74207281 - 1
It takes about half a second and it works just fine. Operations are fairly quick as well. Dividing it by 10 (without decimals) to shift the digits is quick. However, str(prime)
is taking a very long time. I reimplemented str
like this, and found it was processing about a hundred or so digits per second.
while prime > 0:
strprime += str(prime%10)
prime //= 10
Is there a way to do this more efficiently? I'm doing this in Python. Should I even try this with Python, or is there a better tool for this?
Repeated string concatenation is notoriously inefficient since Python strings are immutable. I would go for
strprime = str(prime)
In my benchmarks, this is consistently the fastest solution. Here's my little benchmark program:
import decimal
def f1(x):
''' Definition by OP '''
strprime = ""
while x > 0:
strprime += str(x%10)
x //= 10
return strprime
def digits(x):
while x > 0:
yield x % 10
x //= 10
def f2(x):
''' Using string.join() to avoid repeated string concatenation '''
return "".join((chr(48 + d) for d in digits(x)))
def f3(x):
''' Plain str() '''
return str(x)
def f4(x):
''' Using Decimal class'''
return decimal.Decimal(x).to_eng_string()
x = 2**100
if __name__ == '__main__':
import timeit
for i in range(1,5):
funcName = "f" + str(i)
print(funcName+ ": " + str(timeit.timeit(funcName + "(x)", setup="from __main__ import " + funcName + ", x")))
For me, this prints (using Python 2.7.10):
f1: 15.3430171013
f2: 20.8928260803
f3: 0.310356140137
f4: 2.80087995529
Python's integer to string conversion algorithm uses a simplistic algorithm with a running of O(n**2). As the length of the number doubles, the conversion time quadruples.
Some simple tests on my computer show the increase in running time:
$ time py35 -c "n=str(2**1000000)"
user 0m1.808s
$ time py35 -c "n=str(2**2000000)"
user 0m7.128s
$ time py35 -c "n=str(2**4000000)"
user 0m28.444s
$ time py35 -c "n=str(2**8000000)"
user 1m54.164s
Since the actual exponent is about 10 times larger than my last test value, it should take about 100 times longer. Or just over 3 hours.
Can it be done faster? Yes. There are several methods that are faster.
Method 1
It is faster to divide the very large number by a power-of-10 into two roughly equal-sized but smaller numbers. The process is repeated until the numbers are relatively small. Then str()
is used on each number and leading zeroes are used to pad the result to the same length as the last power-of-10. Then the strings are joined to form the final result. This method is used by the mpmath
library and the documentation implies it should be about 3x faster.
Method 2
Python's integers are stored in binary format. Binary is great for calculations but binary-to-decimal conversion is the bottleneck. It is possible to define your own integer type that stores the value in blocks of 100 (or some similar value) decimal digits. Operations (exponentiation, multiplication, division) will be slower but conversion to a string will be very fast.
Many years ago, I implemented such a class and used efficient algorithms for multiplication and division. The code is no longer available on the Internet but I did find a backup copy that I tested. The running time was reduced to ~14 seconds.
Update
I updated the DecInt code referenced above and it is now available at https://github.com/casevh/DecInt.
If Python's native integer type is used, the total running time is less than 14 seconds on my computer. If gmpy2
's integer type is used instead, the running time is ~3.5 seconds.
$ py35 DecInt.py
Calculating 2^74207281
Exponentiation time: 3.236
Conversion to decimal format: 0.304
Total elapsed time: 3.540
Length of result: 22338618 digits
Method 3
I maintain the gmpy2 library that provide easy access to the GMP library for fast integer arithmetic. GMP implements Method 1 in highly optimized C and assembly code and calculates the prime number and the string representation in ~5 seconds.
Method 4
The decimal
module in Python stores values as decimal digits. Recent versions of Python 3 include a C implementation of the decimal library that is much faster that the pure-Python implementation include with Python 2. The C implementation run in just over 3 seconds on my computer.
from decimal import *
getcontext().prec = 23000000
getcontext().Emin = -999999999
getcontext().Emax = 999999999
x=Decimal(2)**74207281 - 1
s=str(x)
Took about 32 seconds to output the file using WinGhci (Haskell language):
import System.IO
main = writeFile "prime.txt" (show (2^74207281 - 1))
The file was 21 megabytes; the last four digits, 6351.
There is gmp, the GNU Multiple Precision Arithmetic Library.
It is particularly designed at handling huge numbers fast.