In Python 3, is a list comprehension simply syntactic sugar for a generator expression fed into the list
function?
e.g. is the following code:
squares = [x**2 for x in range(1000)]
actually converted in the background into the following?
squares = list(x**2 for x in range(1000))
I know the output is identical, and Python 3 fixes the surprising side-effects to surrounding namespaces that list comprehensions had, but in terms of what the CPython interpreter does under the hood, is the former converted to the latter, or are there any difference in how the code gets executed?
Background
I found this claim of equivalence in the comments section to this question, and a quick google search showed the same claim being made here.
There was also some mention of this in the What's New in Python 3.0 docs, but the wording is somewhat vague:
Also note that list comprehensions have different semantics: they are closer to syntactic sugar for a generator expression inside a list() constructor, and in particular the loop control variables are no longer leaked into the surrounding scope.
Both work differently, the list comprehension version takes the advantage of special bytecode LIST_APPEND
which calls PyList_Append directly for us. Hence it avoids attribute lookup to list.append
and function call at Python level.
>>> def func_lc():
[x**2 for x in y]
...
>>> dis.dis(func_lc)
2 0 LOAD_CONST 1 (<code object <listcomp> at 0x10d3c6780, file "<ipython-input-42-ead395105775>", line 2>)
3 LOAD_CONST 2 ('func_lc.<locals>.<listcomp>')
6 MAKE_FUNCTION 0
9 LOAD_GLOBAL 0 (y)
12 GET_ITER
13 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
16 POP_TOP
17 LOAD_CONST 0 (None)
20 RETURN_VALUE
>>> lc_object = list(dis.get_instructions(func_lc))[0].argval
>>> lc_object
<code object <listcomp> at 0x10d3c6780, file "<ipython-input-42-ead395105775>", line 2>
>>> dis.dis(lc_object)
2 0 BUILD_LIST 0
3 LOAD_FAST 0 (.0)
>> 6 FOR_ITER 16 (to 25)
9 STORE_FAST 1 (x)
12 LOAD_FAST 1 (x)
15 LOAD_CONST 0 (2)
18 BINARY_POWER
19 LIST_APPEND 2
22 JUMP_ABSOLUTE 6
>> 25 RETURN_VALUE
On the other hand the list()
version simply passes the generator object to list's __init__
method which then calls its extend
method internally. As the object is not a list or tuple CPython then gets its iterator first and then simply adds the items to the list until the iterator is exhausted:
>>> def func_ge():
list(x**2 for x in y)
...
>>> dis.dis(func_ge)
2 0 LOAD_GLOBAL 0 (list)
3 LOAD_CONST 1 (<code object <genexpr> at 0x10cde6ae0, file "<ipython-input-41-f9a53483f10a>", line 2>)
6 LOAD_CONST 2 ('func_ge.<locals>.<genexpr>')
9 MAKE_FUNCTION 0
12 LOAD_GLOBAL 1 (y)
15 GET_ITER
16 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
19 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
22 POP_TOP
23 LOAD_CONST 0 (None)
26 RETURN_VALUE
>>> ge_object = list(dis.get_instructions(func_ge))[1].argval
>>> ge_object
<code object <genexpr> at 0x10cde6ae0, file "<ipython-input-41-f9a53483f10a>", line 2>
>>> dis.dis(ge_object)
2 0 LOAD_FAST 0 (.0)
>> 3 FOR_ITER 15 (to 21)
6 STORE_FAST 1 (x)
9 LOAD_FAST 1 (x)
12 LOAD_CONST 0 (2)
15 BINARY_POWER
16 YIELD_VALUE
17 POP_TOP
18 JUMP_ABSOLUTE 3
>> 21 LOAD_CONST 1 (None)
24 RETURN_VALUE
>>>
Timing comparisons:
>>> %timeit [x**2 for x in range(10**6)]
1 loops, best of 3: 453 ms per loop
>>> %timeit list(x**2 for x in range(10**6))
1 loops, best of 3: 478 ms per loop
>>> %%timeit
out = []
for x in range(10**6):
out.append(x**2)
...
1 loops, best of 3: 510 ms per loop
Normal loops are slightly slow due to slow attribute lookup. Cache it and time again.
>>> %%timeit
out = [];append=out.append
for x in range(10**6):
append(x**2)
...
1 loops, best of 3: 467 ms per loop
Apart from the fact that list comprehension don't leak the variables anymore one more difference is that something like this is not valid anymore:
>>> [x**2 for x in 1, 2, 3] # Python 2
[1, 4, 9]
>>> [x**2 for x in 1, 2, 3] # Python 3
File "<ipython-input-69-bea9540dd1d6>", line 1
[x**2 for x in 1, 2, 3]
^
SyntaxError: invalid syntax
>>> [x**2 for x in (1, 2, 3)] # Add parenthesis
[1, 4, 9]
>>> for x in 1, 2, 3: # Python 3: For normal loops it still works
print(x**2)
...
1
4
9
Both forms create and call an anonymous function. However, the list(...)
form creates a generator function and passes the returned generator-iterator to list
, while with the [...]
form, the anonymous function builds the list directly with LIST_APPEND
opcodes.
The following code gets decompilation output of the anonymous functions for an example comprehension and its corresponding genexp-passed-to-list
:
import dis
def f():
[x for x in []]
def g():
list(x for x in [])
dis.dis(f.__code__.co_consts[1])
dis.dis(g.__code__.co_consts[1])
The output for the comprehension is
4 0 BUILD_LIST 0
3 LOAD_FAST 0 (.0)
>> 6 FOR_ITER 12 (to 21)
9 STORE_FAST 1 (x)
12 LOAD_FAST 1 (x)
15 LIST_APPEND 2
18 JUMP_ABSOLUTE 6
>> 21 RETURN_VALUE
The output for the genexp is
7 0 LOAD_FAST 0 (.0)
>> 3 FOR_ITER 11 (to 17)
6 STORE_FAST 1 (x)
9 LOAD_FAST 1 (x)
12 YIELD_VALUE
13 POP_TOP
14 JUMP_ABSOLUTE 3
>> 17 LOAD_CONST 0 (None)
20 RETURN_VALUE
You can actually show that the two can have different outcomes to prove they are inherently different:
>>> list(next(iter([])) if x > 3 else x for x in range(10))
[0, 1, 2, 3]
>>> [next(iter([])) if x > 3 else x for x in range(10)]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <listcomp>
StopIteration
The expression inside the comprehension is not treated as a generator since the comprehension does not handle the StopIteration
, whereas the list
constructor does.
They aren't the same, list()
will evaluate what ever is given to it after what is in the parentheses has finished executing, not before.
The []
in python is a bit magical, it tells python to wrap what ever is inside it as a list, more like a type hint for the language.