A few weeks ago I saw a comment on Google+ in which someone demonstrated a straightforward computation of Fibonacci numbers that was not based on recursion and did not use memoization. He effectively just remembered the last 2 numbers and kept adding them. This is an O(N) algorithm, but he implemented it very cleanly. So I quickly pointed out that a quicker way is to take advantage of the fact that they can be computed as powers of the matrix [[0,1],[1,1]], which requires only O(log(N)) matrix multiplications.
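(For completeness, here is a minimal sketch of the iterative approach he described, as I understand it, written in Python 3 for this post:)

def fib_iter(n):
    # keep only the last two Fibonacci numbers; O(N) additions
    a, b = 0, 1          # (a, b) = (F(0), F(1))
    for _ in range(n):
        a, b = b, a + b
    return a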
The problem, of course, is that this is far from optimal past a certain point. It is efficient as long as the numbers are not too large, but the Nth Fibonacci number grows in length at the rate of N*log(phi)/log(10) digits, where phi is the golden ratio ((1+sqrt(5))/2 ~ 1.6). As it turns out, log(phi)/log(10) is very close to 1/5, so the Nth Fibonacci number can be expected to have roughly N/5 digits.
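(A quick sanity check of that estimate, as a two-line Python 3 sketch; since F(N) ~ phi^N / sqrt(5), the digit count grows by about log10(phi) per step:)

from math import log10, sqrt

phi = (1 + sqrt(5)) / 2
print(log10(phi))              # ~0.20898, i.e. just over 1/5
print(1000000 * log10(phi))    # so F(1,000,000) has roughly 209,000 digits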
Matrix multiplication, heck even number multiplication, gets very slow when the numbers start to have millions or billions of digits. So F(100,000) took about .03 seconds to compute (in Python), while F(1,000,000) took roughly 5 seconds. This is hardly O(log(N)) growth. My estimate was that this method, without improvements, only optimizes the computation to be O((log(N))^2.5) or so.
Computing a billionth Fibonacci number, at this rate, would be prohibitively slow (even though it would only have ~1,000,000,000 / 5 digits, so it fits comfortably into memory addressable with 32 bits).
Does anyone know of an implementation or algorithm which would allow a faster computation? Perhaps something which would allow calculation of a trillionth Fibonacci number.
And just to be clear, I am not looking for an approximation. I am looking for the exact computation (to the last digit).
Edit 1: I am adding the Python code to show what I believe is an O((log N)^2.5) algorithm.
from operator import mul as mul
from time import clock

class TwoByTwoMatrix:
    __slots__ = "rows"

    def __init__(self, m):
        self.rows = m

    def __imul__(self, other):
        self.rows = [[sum(map(mul, my_row, oth_col)) for oth_col in zip(*other.rows)]
                     for my_row in self.rows]
        return self

    def intpow(self, i):
        i = int(i)
        result = TwoByTwoMatrix([[long(1), long(0)], [long(0), long(1)]])
        if i <= 0:
            return result
        # strip the trailing zero bits of the exponent; the corresponding
        # squarings are applied to the final result at the very end
        k = 0
        while i % 2 == 0:
            k += 1
            i >>= 1
        multiplier = TwoByTwoMatrix(self.rows)
        # standard binary exponentiation over the remaining bits
        while i > 0:
            if i & 1:
                result *= multiplier
            multiplier *= multiplier  # square it
            i >>= 1
        for j in xrange(k):
            result *= result
        return result

m = TwoByTwoMatrix([[0, 1], [1, 1]])

t1 = clock()
print len(str(m.intpow(100000).rows[1][1]))
t2 = clock()
print t2 - t1

t1 = clock()
print len(str(m.intpow(1000000).rows[1][1]))
t2 = clock()
print t2 - t1
Edit 2:
It looks like I didn't account for the fact that len(str(...)) would make a significant contribution to the overall runtime of the test. Changing the tests to
from math import log as log
t1 = clock()
print log(m.intpow(100000).rows[1][1])/log(10)
t2 = clock()
print t2 - t1
t1 = clock()
print log(m.intpow(1000000).rows[1][1])/log(10)
t2 = clock()
print t2 - t1
shortened the runtimes to .008 seconds and .31 seconds (from .03 seconds and 5 seconds when len(str(...)) was used).
Because M=[[0,1],[1,1]] raised to the power N is [[F(N-2), F(N-1)], [F(N-1), F(N)]], the other obvious source of inefficiency was calculating the (0,1) and (1,0) elements of the matrix as if they were distinct. This code (I also switched to Python 3, but Python 2.7 times are similar):
class SymTwoByTwoMatrix:
    # elements (0,0), (0,1), (1,1) of a symmetric 2x2 matrix are a, b, c;
    # b is also the (1,0) element because the matrix is symmetric
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

    def __imul__(self, other):
        # this multiplication does work correctly because we
        # are multiplying powers of the same symmetric matrix
        self.a, self.b, self.c = \
            self.a * other.a + self.b * other.b, \
            self.a * other.b + self.b * other.c, \
            self.b * other.b + self.c * other.c
        return self

    def intpow(self, i):
        i = int(i)
        result = SymTwoByTwoMatrix(1, 0, 1)
        if i <= 0:
            return result
        k = 0
        while i % 2 == 0:
            k += 1
            i >>= 1
        multiplier = SymTwoByTwoMatrix(self.a, self.b, self.c)
        while i > 0:
            if i & 1:
                result *= multiplier
            multiplier *= multiplier  # square it
            i >>= 1
        for j in range(k):
            result *= result
        return result
calculated F(100,000) in .006 seconds, F(1,000,000) in .235 seconds, and F(10,000,000) in 9.51 seconds. This is to be expected: it produces results 45% faster for the fastest test, and the gain should asymptotically approach phi/(1+2*phi+phi*phi) ~ 23.6%.
The (0,0) element of M^N is actually the (N-2)nd Fibonacci number:
m = SymTwoByTwoMatrix(0, 1, 1)   # the symmetric form of [[0,1],[1,1]]
for i in range(15):
    x = m.intpow(i)
    print([x.a, x.b, x.c])
gives
[1, 0, 1]
[0, 1, 1]
[1, 1, 2]
[1, 2, 3]
[2, 3, 5]
[3, 5, 8]
[5, 8, 13]
[8, 13, 21]
[13, 21, 34]
[21, 34, 55]
[34, 55, 89]
[55, 89, 144]
[89, 144, 233]
[144, 233, 377]
[233, 377, 610]
I would expect that not having to calculate element (0,0) would produce an additional speed-up of 1/(1+phi+phi*phi) ~ 19%. But the lru_cache-of-F(2N)-and-F(2N-1) solution given by Eli Korvigo below actually gives a speed-up of 4 times (i.e., 75%). So, while I have not worked out a formal explanation, I am tempted to think that it caches the spans of 1's within the binary expansion of N and does the minimum number of multiplications necessary, which obviates the need to find those ranges, precompute them, and then multiply them at the right point in the expansion of N. lru_cache allows for a top-to-bottom computation of what would otherwise have been a more complicated bottom-to-top computation.
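For reference, here is a minimal sketch of a memoized fast-doubling recursion of that flavor (my own illustration, not necessarily Eli Korvigo's exact code), using the identities F(2k) = F(k)*(F(k) + 2*F(k-1)) and F(2k-1) = F(k)^2 + F(k-1)^2:

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # standard indexing: F(0) = 0, F(1) = 1
    if n < 2:
        return n
    k = (n + 1) // 2
    if n % 2:                                    # n = 2k - 1
        return fib(k) ** 2 + fib(k - 1) ** 2
    return fib(k) * (fib(k) + 2 * fib(k - 1))    # n = 2k

With the cache in place, the recursion only ever touches O(log N) distinct arguments.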
Both SymTwoByTwoMatrix and the lru_cache-of-F(2N)-and-F(2N-1) approach take roughly 40 times longer to compute every time N grows 10 times. I think that is possibly due to Python's implementation of multiplication of long ints. I think the multiplication of large numbers and their addition should be parallelizable, so a multi-threaded sub-O(N) solution should be possible, even though (as Daniel Fisher states in the comments) the F(N) solution is Theta(N).
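As an aside, if CPython's big-integer multiplication really is the bottleneck, one thing that might be worth trying (an untested assumption on my part) is running the same doubling arithmetic on GMP integers via the third-party gmpy2 package, since GMP uses much faster multiplication algorithms for very large operands:

# Sketch: iterative fast doubling over the bits of N, on gmpy2.mpz values
# (assumes gmpy2 is installed; plain Python ints also work, just more slowly)
from gmpy2 import mpz

def fib_gmp(n):
    a, b = mpz(0), mpz(1)                        # (a, b) = (F(0), F(1))
    for bit in bin(n)[2:]:                       # walk the bits of n, MSB first
        a, b = a * (2 * b - a), a * a + b * b    # (F(2k), F(2k+1))
        if bit == '1':
            a, b = b, a + b                      # advance to (F(2k+1), F(2k+2))
    return a

This is of course still Theta(N) in terms of the size of the output; the hope is only that the constant factor on the huge multiplications shrinks.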