Python: any() unexpected performance

2020-07-17 07:06发布

问题:

I am comparing the performance of the any() built-in function with the actual implementation the docs suggest:

I am looking for an element greater than 0 in the following list:

lst = [0 for _ in range(1000000)] + [1]

This is the supposedly equivalent function:

def gt_0(lst):
    for elm in lst:
        if elm > 0:
            return True
    return False

And these are the results of the performance tests:

>> %timeit any(elm > 0 for elm in lst)
>> 10 loops, best of 3: 35.9 ms per loop

>> %timeit gt_0(lst)
>> 100 loops, best of 3: 16 ms per loop

I would expect both of the to have the exact same performance, however any() if two times slower. Why?

回答1:

The reason is that you've passed a generator expression to the any() function. Python needs to convert your generator expression to a generator function and that's why it performs slower. Because a generator function needs to call the __next__() method each time for generating the item and passing it to the any. This is while in a manual defined function you are passing the whole list to your function which has all the items prepared already.

You can see the difference better by using a list comprehension rather than a generator expression:

In [4]: %timeit any(elm > 0 for elm in lst)
10 loops, best of 3: 66.8 ms per loop

In [6]: test_list = [elm > 0 for elm in lst]

In [7]: %timeit any(test_list)
100 loops, best of 3: 4.93 ms per loop

Also another bottleneck in your code which has more cost than extra calls on next is the way you do the comparison. As mentioned in comment the better equivalent of your manual function is:

any(True for elm in lst if elm > 0)

In this case you're doing the comparison with the generator expression and it'll perform almost in an equal time as your manual defined function (the slightest difference is because of the generator, I guess.) For a deeper understanding of the underlying reasons read the Ashwini's answer.



回答2:

Surely a loop over a generator expression is slower compared to a list. But in this case the iteration within the generator is basically a loop over the list itself, hence the next() calls on generator basically delegate to list's next() method.

For example in this case there is no 2x performance difference.

>>> lst = list(range(10**5))

>>> %%timeit
... sum(x for x in lst)
...
100 loops, best of 3: 6.39 ms per loop

>>> %%timeit
... c = 0
... for x in lst: c += x
...

100 loops, best of 3: 6.69 ms per loop

First let's check the byte codes of both the approaches:

def gt_0(lst):
    for elm in lst:
        if elm > 0:
            return True
    return False


def any_with_ge(lst):
    return any(elm > 0 for elm in lst)

Bytecodes:

>>> dis.dis(gt_0)
 10           0 SETUP_LOOP              30 (to 33)
              3 LOAD_FAST                0 (lst)
              6 GET_ITER
        >>    7 FOR_ITER                22 (to 32)
             10 STORE_FAST               1 (elm)

 11          13 LOAD_FAST                1 (elm)
             16 LOAD_CONST               1 (0)
             19 COMPARE_OP               4 (>)
             22 POP_JUMP_IF_FALSE        7

 12          25 LOAD_GLOBAL              0 (True)
             28 RETURN_VALUE
             29 JUMP_ABSOLUTE            7
        >>   32 POP_BLOCK

 13     >>   33 LOAD_GLOBAL              1 (False)
             36 RETURN_VALUE
>>> dis.dis(any_with_ge.func_code.co_consts[1])
 17           0 LOAD_FAST                0 (.0)
        >>    3 FOR_ITER                17 (to 23)
              6 STORE_FAST               1 (elm)
              9 LOAD_FAST                1 (elm)
             12 LOAD_CONST               0 (0)
             15 COMPARE_OP               4 (>)
             18 YIELD_VALUE
             19 POP_TOP
             20 JUMP_ABSOLUTE            3
        >>   23 LOAD_CONST               1 (None)
             26 RETURN_VALUE

As you can see there's no jump condition in the any() version, it basically gets the value of the > comparison and then again checks its truthy value using PyObject_IsTrue again. On the other hand the gt_0 checks the truthy value of the condition once and returns True or False based on that.

Now let's add another any() based version that has an if-condition like in the for-loop.

def any_with_ge_and_condition(lst):
    return any(True for elm in lst if elm > 0)

Bytecode:

>>> dis.dis(any_with_ge_and_condition.func_code.co_consts[1])
 21           0 LOAD_FAST                0 (.0)
        >>    3 FOR_ITER                23 (to 29)
              6 STORE_FAST               1 (elm)
              9 LOAD_FAST                1 (elm)
             12 LOAD_CONST               0 (0)
             15 COMPARE_OP               4 (>)
             18 POP_JUMP_IF_FALSE        3
             21 LOAD_GLOBAL              0 (True)
             24 YIELD_VALUE
             25 POP_TOP
             26 JUMP_ABSOLUTE            3
        >>   29 LOAD_CONST               1 (None)
             32 RETURN_VALUE

Now we have reduced the work done by any() by adding the condition(check last section for more details) and it will have to check truthy twice only once when the condition is going to be True, else it will basically skip to next item.


Now let's compare the timings of these 3:

>>> %timeit gt_0(lst)
10 loops, best of 3: 26.1 ms per loop
>>> %timeit any_with_ge(lst)
10 loops, best of 3: 57.7 ms per loop
>>> %timeit any_with_ge_and_condition(lst)
10 loops, best of 3: 26.8 ms per loop

Let's modify gt_0 to include two checks as in the simple any() version and check its timing.

from operator import truth
# This calls `PyObject_IsTrue` internally
# https://github.com/python/cpython/blob/master/Modules/_operator.c#L30


def gt_0_truth(lst, truth=truth): # truth=truth to prevent global lookups
    for elm in lst:
        condition = elm > 0
        if truth(condition):
            return True
    return False

Timing:

>>> %timeit gt_0_truth(lst)
10 loops, best of 3: 56.6 ms per loop

Now, let's see what happens when we try to check truthy value of an item twice using operator.truth.

>> %%timeit t=truth
... [t(i) for i in xrange(10**5)]
...
100 loops, best of 3: 5.45 ms per loop
>>> %%timeit t=truth
[t(t(i)) for i in xrange(10**5)]
...
100 loops, best of 3: 9.06 ms per loop
>>> %%timeit t=truth
[t(i) for i in xrange(10**6)]
...
10 loops, best of 3: 58.8 ms per loop
>>> %%timeit t=truth
[t(t(i)) for i in xrange(10**6)]
...
10 loops, best of 3: 87.8 ms per loop

That's quite a difference even though we are simply calling truth()(i.e PyObject_IsTrue) on an already boolean object, I guess that sort of explains the slowness of basic any() version.


You may argue that if condition in any() will also result in two truthiness check, but that's not the case when the comparison operation returns Py_True or Py_False. POP_JUMP_IF_FALSE simply jumps to the next OP code and no call to PyObject_IsTrue is made.



回答3:

The main chunk of performance boils down to the for loops.

In your any, there are two for loops: for elm in lst and the for loop carried out by any. So, any iterates over a generator that looks like False, False, False, ..., True

In your gt_0, there is only one for loop.

If you change it to check if the element is truthy at all, so they both only loop once:

def _any(lst):
    for elm in lst:
        if elm:
            return True
    return False

_any(lst)
any(lst)

There is a clear winner:

$ python2 -m timeit "from test import lst, _any" "any(lst)"
100 loops, best of 3: 5.68 msec per loop

$ python2 -m timeit "from test import lst, _any" "_any(lst)"
10 loops, best of 3: 17 msec per loop


回答4:

print(timeit('any(True for elm in lst if elm > 0)',setup='lst = [0 for _ in range(1000000)] + [1]', number=10))
print(timeit('any([elm > 0 for elm in lst])',setup='lst = [0 for _ in range(1000000)] + [1]', number=10))
print(timeit('any(elm > 0 for elm in lst)',setup='lst = [0 for _ in range(1000000)] + [1]', number=10))

produces:

2.1382904349993623
3.1172365920028824
4.580027656000311

As explained by Kasramvd, the last version is slowest because it is using a generator expression; a list comprehension is a bit faster, but - surprisingly - using a generator expression with a conditional clause as proposed by Ashwini Chaudhary is even faster.