I am comparing the performance of the any()
built-in function with the actual implementation the docs suggest:
I am looking for an element greater than 0 in the following list:
lst = [0 for _ in range(1000000)] + [1]
This is the supposedly equivalent function:
def gt_0(lst):
for elm in lst:
if elm > 0:
return True
return False
And these are the results of the performance tests:
>> %timeit any(elm > 0 for elm in lst)
>> 10 loops, best of 3: 35.9 ms per loop
>> %timeit gt_0(lst)
>> 100 loops, best of 3: 16 ms per loop
I would expect both of the to have the exact same performance, however any()
if two times slower. Why?
The reason is that you've passed a generator expression to the any()
function. Python needs to convert your generator expression to a generator function and that's why it performs slower. Because a generator function needs to call the __next__()
method each time for generating the item and passing it to the any
. This is while in a manual defined function you are passing the whole list to your function which has all the items prepared already.
You can see the difference better by using a list comprehension rather than a generator expression:
In [4]: %timeit any(elm > 0 for elm in lst)
10 loops, best of 3: 66.8 ms per loop
In [6]: test_list = [elm > 0 for elm in lst]
In [7]: %timeit any(test_list)
100 loops, best of 3: 4.93 ms per loop
Also another bottleneck in your code which has more cost than extra calls on next
is the way you do the comparison. As mentioned in comment the better equivalent of your manual function is:
any(True for elm in lst if elm > 0)
In this case you're doing the comparison with the generator expression and it'll perform almost in an equal time as your manual defined function (the slightest difference is because of the generator, I guess.) For a deeper understanding of the underlying reasons read the Ashwini's answer.
Surely a loop over a generator expression is slower compared to a list. But in this case the iteration within the generator is basically a loop over the list itself, hence the next()
calls on generator basically delegate to list's next()
method.
For example in this case there is no 2x performance difference.
>>> lst = list(range(10**5))
>>> %%timeit
... sum(x for x in lst)
...
100 loops, best of 3: 6.39 ms per loop
>>> %%timeit
... c = 0
... for x in lst: c += x
...
100 loops, best of 3: 6.69 ms per loop
First let's check the byte codes of both the approaches:
def gt_0(lst):
for elm in lst:
if elm > 0:
return True
return False
def any_with_ge(lst):
return any(elm > 0 for elm in lst)
Bytecodes:
>>> dis.dis(gt_0)
10 0 SETUP_LOOP 30 (to 33)
3 LOAD_FAST 0 (lst)
6 GET_ITER
>> 7 FOR_ITER 22 (to 32)
10 STORE_FAST 1 (elm)
11 13 LOAD_FAST 1 (elm)
16 LOAD_CONST 1 (0)
19 COMPARE_OP 4 (>)
22 POP_JUMP_IF_FALSE 7
12 25 LOAD_GLOBAL 0 (True)
28 RETURN_VALUE
29 JUMP_ABSOLUTE 7
>> 32 POP_BLOCK
13 >> 33 LOAD_GLOBAL 1 (False)
36 RETURN_VALUE
>>> dis.dis(any_with_ge.func_code.co_consts[1])
17 0 LOAD_FAST 0 (.0)
>> 3 FOR_ITER 17 (to 23)
6 STORE_FAST 1 (elm)
9 LOAD_FAST 1 (elm)
12 LOAD_CONST 0 (0)
15 COMPARE_OP 4 (>)
18 YIELD_VALUE
19 POP_TOP
20 JUMP_ABSOLUTE 3
>> 23 LOAD_CONST 1 (None)
26 RETURN_VALUE
As you can see there's no jump condition in the any()
version, it basically gets the value of the >
comparison and then again checks its truthy value using PyObject_IsTrue
again. On the other hand the gt_0
checks the truthy value of the condition once and returns True
or False
based on that.
Now let's add another any()
based version that has an if-condition like in the for-loop.
def any_with_ge_and_condition(lst):
return any(True for elm in lst if elm > 0)
Bytecode:
>>> dis.dis(any_with_ge_and_condition.func_code.co_consts[1])
21 0 LOAD_FAST 0 (.0)
>> 3 FOR_ITER 23 (to 29)
6 STORE_FAST 1 (elm)
9 LOAD_FAST 1 (elm)
12 LOAD_CONST 0 (0)
15 COMPARE_OP 4 (>)
18 POP_JUMP_IF_FALSE 3
21 LOAD_GLOBAL 0 (True)
24 YIELD_VALUE
25 POP_TOP
26 JUMP_ABSOLUTE 3
>> 29 LOAD_CONST 1 (None)
32 RETURN_VALUE
Now we have reduced the work done by any()
by adding the condition(check last section for more details) and it will have to check truthy twice only once when the condition is going to be True
, else it will basically skip to next item.
Now let's compare the timings of these 3:
>>> %timeit gt_0(lst)
10 loops, best of 3: 26.1 ms per loop
>>> %timeit any_with_ge(lst)
10 loops, best of 3: 57.7 ms per loop
>>> %timeit any_with_ge_and_condition(lst)
10 loops, best of 3: 26.8 ms per loop
Let's modify gt_0
to include two checks as in the simple any()
version and check its timing.
from operator import truth
# This calls `PyObject_IsTrue` internally
# https://github.com/python/cpython/blob/master/Modules/_operator.c#L30
def gt_0_truth(lst, truth=truth): # truth=truth to prevent global lookups
for elm in lst:
condition = elm > 0
if truth(condition):
return True
return False
Timing:
>>> %timeit gt_0_truth(lst)
10 loops, best of 3: 56.6 ms per loop
Now, let's see what happens when we try to check truthy value of an item twice using operator.truth
.
>> %%timeit t=truth
... [t(i) for i in xrange(10**5)]
...
100 loops, best of 3: 5.45 ms per loop
>>> %%timeit t=truth
[t(t(i)) for i in xrange(10**5)]
...
100 loops, best of 3: 9.06 ms per loop
>>> %%timeit t=truth
[t(i) for i in xrange(10**6)]
...
10 loops, best of 3: 58.8 ms per loop
>>> %%timeit t=truth
[t(t(i)) for i in xrange(10**6)]
...
10 loops, best of 3: 87.8 ms per loop
That's quite a difference even though we are simply calling truth()
(i.e PyObject_IsTrue
) on an already boolean object, I guess that sort of explains the slowness of basic any()
version.
You may argue that if
condition in any()
will also result in two truthiness check, but that's not the case when the comparison operation returns Py_True
or Py_False
. POP_JUMP_IF_FALSE
simply jumps to the next OP code and no call to PyObject_IsTrue
is made.
The main chunk of performance boils down to the for
loops.
In your any
, there are two for loops: for elm in lst
and the for loop carried out by any
. So, any iterates over a generator that looks like False, False, False, ..., True
In your gt_0
, there is only one for loop.
If you change it to check if the element is truthy at all, so they both only loop once:
def _any(lst):
for elm in lst:
if elm:
return True
return False
_any(lst)
any(lst)
There is a clear winner:
$ python2 -m timeit "from test import lst, _any" "any(lst)"
100 loops, best of 3: 5.68 msec per loop
$ python2 -m timeit "from test import lst, _any" "_any(lst)"
10 loops, best of 3: 17 msec per loop
print(timeit('any(True for elm in lst if elm > 0)',setup='lst = [0 for _ in range(1000000)] + [1]', number=10))
print(timeit('any([elm > 0 for elm in lst])',setup='lst = [0 for _ in range(1000000)] + [1]', number=10))
print(timeit('any(elm > 0 for elm in lst)',setup='lst = [0 for _ in range(1000000)] + [1]', number=10))
produces:
2.1382904349993623
3.1172365920028824
4.580027656000311
As explained by Kasramvd, the last version is slowest because it is using a generator expression; a list comprehension is a bit faster, but - surprisingly - using a generator expression with a conditional clause as proposed by Ashwini Chaudhary is even faster.