I am looking in to the performance issues of the loop like structures in Python and found the following statements:
Besides the syntactic benefit of list comprehensions, they are often
as fast or faster than equivalent use of map.
(Performance Tips)
List comprehensions run a bit faster than equivalent for-loops (unless
you're just going to throw away the result).
(Python Speed)
I am wondering what difference under the hood gives list comprehension this advantage. Thanks.
Test one: throwing away the result.
Here's our dummy function:
def examplefunc(x):
pass
And here are our challengers:
def listcomp_throwaway():
[examplefunc(i) for i in range(100)]
def forloop_throwaway():
for i in range(100):
examplefunc(i)
I won't do an analysis of its raw speed, only why, per the OP's question. Lets take a look at the diffs of the machine code.
--- List comprehension
+++ For loop
@@ -1,15 +1,16 @@
- 55 0 BUILD_LIST 0
+ 59 0 SETUP_LOOP 30 (to 33)
3 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (100)
9 CALL_FUNCTION 1
12 GET_ITER
- >> 13 FOR_ITER 18 (to 34)
+ >> 13 FOR_ITER 16 (to 32)
16 STORE_FAST 0 (i)
- 19 LOAD_GLOBAL 1 (examplefunc)
+
+ 60 19 LOAD_GLOBAL 1 (examplefunc)
22 LOAD_FAST 0 (i)
25 CALL_FUNCTION 1
- 28 LIST_APPEND 2
- 31 JUMP_ABSOLUTE 13
- >> 34 POP_TOP
- 35 LOAD_CONST 0 (None)
- 38 RETURN_VALUE
+ 28 POP_TOP
+ 29 JUMP_ABSOLUTE 13
+ >> 32 POP_BLOCK
+ >> 33 LOAD_CONST 0 (None)
+ 36 RETURN_VALUE
The race is on. Listcomp's first move is to build an empty list, while for loop's is to setup a loop. Both of them then proceed to load global range(), the constant 100, and call the range function for a generator. Then they both get the current iterator and get the next item, and store it into the variable i. Then they load examplefunc and i and call examplefunc. Listcomp appends it to the list and starts the loop over again. For loop does the same in three instructions instead of two. Then they both load None and return it.
So who seems better in this analysis? Here, list comprehension does some redundant operations such as building the list and appending to it, if you don't care about the result. For loop is pretty efficient too.
If you time them, using a for loop is about one-third faster than a list comprehension. (In this test, examplefunc divided its argument by five and threw it away instead of doing nothing at all.)
Test two: Keeping the result like normal.
No dummy function this test. So here are our challengers:
def listcomp_normal():
l = [i*5 for i in range(100)]
def forloop_normal():
l = []
for i in range(100):
l.append(i*5)
The diff isn't any use to us today. It's just the two machine codes in two blocks.
List comp's machine code:
55 0 BUILD_LIST 0
3 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (100)
9 CALL_FUNCTION 1
12 GET_ITER
>> 13 FOR_ITER 16 (to 32)
16 STORE_FAST 0 (i)
19 LOAD_FAST 0 (i)
22 LOAD_CONST 2 (5)
25 BINARY_MULTIPLY
26 LIST_APPEND 2
29 JUMP_ABSOLUTE 13
>> 32 STORE_FAST 1 (l)
35 LOAD_CONST 0 (None)
38 RETURN_VALUE
For loop's machine code:
59 0 BUILD_LIST 0
3 STORE_FAST 0 (l)
60 6 SETUP_LOOP 37 (to 46)
9 LOAD_GLOBAL 0 (range)
12 LOAD_CONST 1 (100)
15 CALL_FUNCTION 1
18 GET_ITER
>> 19 FOR_ITER 23 (to 45)
22 STORE_FAST 1 (i)
61 25 LOAD_FAST 0 (l)
28 LOAD_ATTR 1 (append)
31 LOAD_FAST 1 (i)
34 LOAD_CONST 2 (5)
37 BINARY_MULTIPLY
38 CALL_FUNCTION 1
41 POP_TOP
42 JUMP_ABSOLUTE 19
>> 45 POP_BLOCK
>> 46 LOAD_CONST 0 (None)
49 RETURN_VALUE
As you can probably already tell, the list comprehension has fewer instructions than for loop does.
List comprehension's checklist:
- Build an anonymous empty list.
- Load
range
.
- Load
100
.
- Call
range
.
- Get the iterator.
- Get the next item on that iterator.
- Store that item onto
i
.
- Load
i
.
- Load the integer five.
- Multiply times five.
- Append the list.
- Repeat steps 6-10 until range is empty.
- Point
l
to the anonymous empty list.
For loop's checklist:
- Build an anonymous empty list.
- Point
l
to the anonymous empty list.
- Setup a loop.
- Load
range
.
- Load
100
.
- Call
range
.
- Get the iterator.
- Get the next item on that iterator.
- Store that item onto
i
.
- Load the list
l
.
- Load the attribute
append
on that list.
- Load
i
.
- Load the integer five.
- Multiply times five.
- Call
append
.
- Go to the top.
- Go to absolute.
(Not including these steps: Load None
, return it.)
The list comprehension doesn't have to do these things:
- Load append of the list every time, since it's pre-bound as a local variable.
- Load
i
twice per loop
- Spend two instructions going to the top
- Directly append to the list instead of calling a wrapper that appens the list
In conclusion, listcomp is a lot faster if you are going to use the values, but if you don't it's pretty slow.
Real speeds
Test one: for loop is faster by about one-third*
Test two: list comprehension is faster by about two-thirds*
*About -> second decimal place acurrate