Why do tuples in a list comprehension need parenth

2020-04-04 16:48发布

问题:

It is well known that tuples are not defined by parentheses, but commas. Quote from documentation:

A tuple consists of a number of values separated by commas

Therefore:

myVar1 = 'a', 'b', 'c'
type(myVar1)
# Result:
<type 'tuple'>

Another striking example is this:

myVar2 = ('a')
type(myVar2)
# Result:
<type 'str'>  

myVar3 = ('a',)
type(myVar3)
# Result:
<type 'tuple'>

Even the single-element tuple needs a comma, and parentheses are always used just to avoid confusion. My question is: Why can't we omit parentheses of arrays in a list comprehension? For example:

myList1 = ['a', 'b']
myList2 = ['c', 'd']

print([(v1,v2) for v1 in myList1 for v2 in myList2])
# Works, result:
[('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd')]

print([v1,v2 for v1 in myList1 for v2 in myList2])
# Does not work, result:
SyntaxError: invalid syntax

Isn't the second list comprehension just syntactic sugar for the following loop, which does work?

myTuples = []
for v1 in myList1:
    for v2 in myList2:
        myTuple = v1,v2
        myTuples.append(myTuple)
print myTuples
# Result:
[('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd')]

回答1:

Python's grammar is LL(1), meaning that it only looks ahead one symbol when parsing.

[(v1, v2) for v1 in myList1 for v2 in myList2]

Here, the parser sees something like this.

[ # An opening bracket; must be some kind of list
[( # Okay, so a list containing some value in parentheses
[(v1
[(v1,
[(v1, v2
[(v1, v2)
[(v1, v2) for # Alright, list comprehension

However, without the parentheses, it has to make a decision earlier on.

[v1, v2 for v1 in myList1 for v2 in myList2]

[ # List-ish thing
[v1 # List containing a value; alright
[v1, # List containing at least two values
[v1, v2 # Here's the second value
[v1, v2 for # Wait, what?

A parser which backtracks tends to be notoriously slow, so LL(1) parsers do not backtrack. Thus, the ambiguous syntax is forbidden.



回答2:

As I felt "because the grammar forbids it" to be a little too snarky, I came up with a reason.

It begins parsing the expression as a list/set/tuple and is expecting a , and instead encounters a for token.

For example:

$ python3.6 test.py
  File "test.py", line 1
    [a, b for a, b in c]
            ^
SyntaxError: invalid syntax

tokenizes as follows:

$ python3.6 -m tokenize test.py
0,0-0,0:            ENCODING       'utf-8'        
1,0-1,1:            OP             '['            
1,1-1,2:            NAME           'a'            
1,2-1,3:            OP             ','            
1,4-1,5:            NAME           'b'            
1,6-1,9:            NAME           'for'          
1,10-1,11:          NAME           'a'            
1,11-1,12:          OP             ','            
1,13-1,14:          NAME           'b'            
1,15-1,17:          NAME           'in'           
1,18-1,19:          NAME           'c'            
1,19-1,20:          OP             ']'            
1,20-1,21:          NEWLINE        '\n'           
2,0-2,0:            ENDMARKER      ''     


回答3:

There was no parser issue that motivated this restriction. Contrary to Silvio Mayolo's answer, an LL(1) parser could have parsed the no-parentheses syntax just fine. The parentheses were optional in early versions of the original list comprehension patch; they were only made mandatory to make the meaning clearer.

Quoting Guido van Rossum back in 2000, in a response to someone worried that [x, y for ...] would cause parser issues,

Don't worry. Greg Ewing had no problem expressing this in Python's own grammar, which is about as restricted as parsers come. (It's LL(1), which is equivalent to pure recursive descent with one lookahead token, i.e. no backtracking.)

Here's Greg's grammar:

atom: ... | '[' [testlist [list_iter]] ']' | ...
  list_iter: list_for | list_if
  list_for: 'for' exprlist 'in' testlist [list_iter]
  list_if: 'if' test [list_iter]

Note that before, the list syntax was '[' [testlist] ']'. Let me explain it in different terms:

The parser parses a series comma-separated expressions. Previously, it was expecting ']' as the sole possible token following this. After the change, 'for' is another possible following token. This is no problem at all for any parser that knows how to parse matching parentheses!

If you'd rather not support [x, y for ...] because it's ambiguous (to the human reader, not to the parser!), we can change the grammar to something like:

'[' test [',' testlist | list_iter] ']'

(Note that | binds less than concatenation, and [...] means an optional part.)

Also see the next response in the thread, where Greg Ewing runs

>>> seq = [1,2,3,4,5]
>>> [x, x*2 for x in seq]
[(1, 2), (2, 4), (3, 6), (4, 8), (5, 10)]

on an early version of the list comprehension patch, and it works just fine.