Are list comprehensions syntactic sugar for `list(

2019-01-15 11:45发布

问题:

In Python 3, is a list comprehension simply syntactic sugar for a generator expression fed into the list function?

e.g. is the following code:

squares = [x**2 for x in range(1000)]

actually converted in the background into the following?

squares = list(x**2 for x in range(1000))

I know the output is identical, and Python 3 fixes the surprising side-effects to surrounding namespaces that list comprehensions had, but in terms of what the CPython interpreter does under the hood, is the former converted to the latter, or are there any difference in how the code gets executed?

Background

I found this claim of equivalence in the comments section to this question, and a quick google search showed the same claim being made here.

There was also some mention of this in the What's New in Python 3.0 docs, but the wording is somewhat vague:

Also note that list comprehensions have different semantics: they are closer to syntactic sugar for a generator expression inside a list() constructor, and in particular the loop control variables are no longer leaked into the surrounding scope.

回答1:

Both work differently, the list comprehension version takes the advantage of special bytecode LIST_APPEND which calls PyList_Append directly for us. Hence it avoids attribute lookup to list.append and function call at Python level.

>>> def func_lc():
    [x**2 for x in y]
...
>>> dis.dis(func_lc)
  2           0 LOAD_CONST               1 (<code object <listcomp> at 0x10d3c6780, file "<ipython-input-42-ead395105775>", line 2>)
              3 LOAD_CONST               2 ('func_lc.<locals>.<listcomp>')
              6 MAKE_FUNCTION            0
              9 LOAD_GLOBAL              0 (y)
             12 GET_ITER
             13 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             16 POP_TOP
             17 LOAD_CONST               0 (None)
             20 RETURN_VALUE

>>> lc_object = list(dis.get_instructions(func_lc))[0].argval
>>> lc_object
<code object <listcomp> at 0x10d3c6780, file "<ipython-input-42-ead395105775>", line 2>
>>> dis.dis(lc_object)
  2           0 BUILD_LIST               0
              3 LOAD_FAST                0 (.0)
        >>    6 FOR_ITER                16 (to 25)
              9 STORE_FAST               1 (x)
             12 LOAD_FAST                1 (x)
             15 LOAD_CONST               0 (2)
             18 BINARY_POWER
             19 LIST_APPEND              2
             22 JUMP_ABSOLUTE            6
        >>   25 RETURN_VALUE

On the other hand the list() version simply passes the generator object to list's __init__ method which then calls its extend method internally. As the object is not a list or tuple CPython then gets its iterator first and then simply adds the items to the list until the iterator is exhausted:

>>> def func_ge():
    list(x**2 for x in y)
...
>>> dis.dis(func_ge)
  2           0 LOAD_GLOBAL              0 (list)
              3 LOAD_CONST               1 (<code object <genexpr> at 0x10cde6ae0, file "<ipython-input-41-f9a53483f10a>", line 2>)
              6 LOAD_CONST               2 ('func_ge.<locals>.<genexpr>')
              9 MAKE_FUNCTION            0
             12 LOAD_GLOBAL              1 (y)
             15 GET_ITER
             16 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             19 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             22 POP_TOP
             23 LOAD_CONST               0 (None)
             26 RETURN_VALUE
>>> ge_object = list(dis.get_instructions(func_ge))[1].argval
>>> ge_object
<code object <genexpr> at 0x10cde6ae0, file "<ipython-input-41-f9a53483f10a>", line 2>
>>> dis.dis(ge_object)
  2           0 LOAD_FAST                0 (.0)
        >>    3 FOR_ITER                15 (to 21)
              6 STORE_FAST               1 (x)
              9 LOAD_FAST                1 (x)
             12 LOAD_CONST               0 (2)
             15 BINARY_POWER
             16 YIELD_VALUE
             17 POP_TOP
             18 JUMP_ABSOLUTE            3
        >>   21 LOAD_CONST               1 (None)
             24 RETURN_VALUE
>>>

Timing comparisons:

>>> %timeit [x**2 for x in range(10**6)]
1 loops, best of 3: 453 ms per loop
>>> %timeit list(x**2 for x in range(10**6))
1 loops, best of 3: 478 ms per loop
>>> %%timeit
out = []
for x in range(10**6):
    out.append(x**2)
...
1 loops, best of 3: 510 ms per loop

Normal loops are slightly slow due to slow attribute lookup. Cache it and time again.

>>> %%timeit
out = [];append=out.append
for x in range(10**6):
    append(x**2)
...
1 loops, best of 3: 467 ms per loop

Apart from the fact that list comprehension don't leak the variables anymore one more difference is that something like this is not valid anymore:

>>> [x**2 for x in 1, 2, 3] # Python 2
[1, 4, 9]
>>> [x**2 for x in 1, 2, 3] # Python 3
  File "<ipython-input-69-bea9540dd1d6>", line 1
    [x**2 for x in 1, 2, 3]
                    ^
SyntaxError: invalid syntax

>>> [x**2 for x in (1, 2, 3)] # Add parenthesis
[1, 4, 9]
>>> for x in 1, 2, 3: # Python 3: For normal loops it still works
    print(x**2)
...
1
4
9


回答2:

Both forms create and call an anonymous function. However, the list(...) form creates a generator function and passes the returned generator-iterator to list, while with the [...] form, the anonymous function builds the list directly with LIST_APPEND opcodes.

The following code gets decompilation output of the anonymous functions for an example comprehension and its corresponding genexp-passed-to-list:

import dis

def f():
    [x for x in []]

def g():
    list(x for x in [])

dis.dis(f.__code__.co_consts[1])
dis.dis(g.__code__.co_consts[1])

The output for the comprehension is

  4           0 BUILD_LIST               0
              3 LOAD_FAST                0 (.0)
        >>    6 FOR_ITER                12 (to 21)
              9 STORE_FAST               1 (x)
             12 LOAD_FAST                1 (x)
             15 LIST_APPEND              2
             18 JUMP_ABSOLUTE            6
        >>   21 RETURN_VALUE

The output for the genexp is

  7           0 LOAD_FAST                0 (.0)
        >>    3 FOR_ITER                11 (to 17)
              6 STORE_FAST               1 (x)
              9 LOAD_FAST                1 (x)
             12 YIELD_VALUE
             13 POP_TOP
             14 JUMP_ABSOLUTE            3
        >>   17 LOAD_CONST               0 (None)
             20 RETURN_VALUE


回答3:

You can actually show that the two can have different outcomes to prove they are inherently different:

>>> list(next(iter([])) if x > 3 else x for x in range(10))
[0, 1, 2, 3]

>>> [next(iter([])) if x > 3 else x for x in range(10)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
StopIteration

The expression inside the comprehension is not treated as a generator since the comprehension does not handle the StopIteration, whereas the list constructor does.



回答4:

They aren't the same, list() will evaluate what ever is given to it after what is in the parentheses has finished executing, not before.

The [] in python is a bit magical, it tells python to wrap what ever is inside it as a list, more like a type hint for the language.