Python recursive generators performance

2020-04-14 07:29发布

In python, when changing a purely recursive function into a recursive generator (not a plain generator) the performance seems to be degrading.

For example, here is a performance comparison between two functions which find all combinations of a list:

from datetime import datetime as dt

def rec_subsets(ms, i=0, s=[]):
    if i == len(ms):
        # do something with s
        return
    rec_subsets(ms, i+1, s)
    rec_subsets(ms, i+1, s + [ms[i]])

def gen_subsets(ms, i=0, s=[]):
    if i == len(ms):
        yield s
        return
    for a in gen_subsets(ms, i+1, s): yield a
    for a in gen_subsets(ms, i+1, s + [ms[i]]): yield a

t1 = dt.now()
rec_subsets(range(20))
t2 = dt.now()
print t2 - t1

t1 = dt.now()
for _ in gen_subsets(range(20)): pass
t2 = dt.now()
print t2 - t1

with the following output:

0:00:01.027000  # rec_subsets
0:00:02.860000  # gen_subsets

One would naturally expect gen_subsets to be approximately as fast as rec_subsets but this is not the case, it is much slower.

Is this normal or am I missing something?

1条回答
叼着烟拽天下
2楼-- · 2020-04-14 07:52

rec_subsets() is still faster (for range(20)) even if result.append(s) is added inplace of # do something with s and the results of both gen_subsets() and rec_subsets() are consumed.

It might be explained by the following quote from PEP 380 (yield from syntax support):

Using a specialised syntax opens up possibilities for optimisation when there is a long chain of generators. Such chains can arise, for instance, when recursively traversing a tree structure. The overhead of passing __next__() calls and yielded values down and up the chain can cause what ought to be an O(n) operation to become, in the worst case, O(n**2).

You could generate a powerset using itertools.combinations():

from itertools import combinations

def subsets_comb(lst):
    return (comb for r in range(len(lst)+1) for comb in combinations(lst, r))

It is faster for range(20) on my machine:

name                    time ratio comment
subsets_comb        227 msec  1.00 [range(0, 20)]
subsets_ipowerset   476 msec  2.10 [range(0, 20)]
subsets_rec         957 msec  4.22 [range(0, 20)]
subsets_gen_pep380 2.34  sec 10.29 [range(0, 20)]
subsets_gen        2.63  sec 11.59 [range(0, 20)]

To reproduce the results, run time-subsets.py.

查看更多
登录 后发表回答