I know this subject is well discussed but I've come around a case I don't really understand how the recursive method is "slower" than a method using "reduce,lambda,xrange".
def factorial2(x, rest=1):
if x <= 1:
return rest
else:
return factorial2(x-1, rest*x)
def factorial3(x):
if x <= 1:
return 1
return reduce(lambda a, b: a*b, xrange(1, x+1))
I know python doesn't optimize tail recursion so the question isn't about it. To my understanding, a generator should still generate n amount of numbers using the +1
operator. So technically, fact(n)
should add a number n
times just like the recursive one. The lambda
in the reduce
will be called n
times just as the recursive method... So since we don't have tail call optimization in both case, stacks will be created/destroyed and returned n
times. And a if
in the generator should check when to raise a StopIteration
exception.
This makes me wonder why does the recursive method still slowlier than the other one since the recursive one use simple arithmetic and doesn't use generators.
In a test, I replaced rest*x
by x
in the recursive method and the time spent got reduced on par with the method using reduce
.
Here are my timings for fact(400), 1000 times
factorial3 : 1.22370505333
factorial2 : 1.79896998405
Edit:
Making the method start from 1
to n
doesn't help either instead of n
to 1
. So not an overhead with the -1
.
Also, can we make the recursive method faster. I tried multiple things like global variables that I can change... Using a mutable context by placing variables in cells that I can modify like an array and keep the recursive method without parameters. Sending the method used for recursion as a parameter so we don't have to "dereference" it in our scope...?! But nothings makes it faster.
I'll point out that I have a version of the fact that use a forloop that is much faster than both of those 2 methods so there is clearly space for improvement but I wouldn't expect anything faster than the forloop.
The slowness of the recursive version comes from the need to resolve on each call the named (argument) variables. I have provided a different recursive implementation that has only one argument and it works slightly faster.
Since in your example very large numbers are involved, initially I suspected that the performance difference might be due to the order of multiplication. Multiplying on every iteration a large partial product by the next number is proportional to the number of digits/bits in the product, so the time complexity of such a method is O(n2), where n is the number of bits in the final product. Instead it is better to use a divide and conquer technique, where the final result is obtained as a product of two approximately equally long values each of which is computed recursively in the same manner. So I implemented that version too (see
factorial_divide_and_conquer(n)
in the above code) . As you can see below it still loses to thereduce()
-based version for small arguments (due to the same problem with named parameters) but outperforms it for large arguments.UPDATE
Trying to run the
factorial_recursive?()
versions withx=4000
hits the default recursion limit, so the limit must be increased: