I found this comprehension that works perfectly for flattening a list of lists:
>>> list_of_lists = [(1,2,3),(2,3,4),(3,4,5)]
>>> [item for sublist in list_of_lists for item in sublist]
[1, 2, 3, 2, 3, 4, 3, 4, 5]
I like this better than using itertools.chain(), but I just can't understand it. I've tried surrounding parts with parentheses to see if I could reduce the complexity, but now I'm just more confused:
>>> [(item for sublist in list_of_lists) for item in sublist]
[<generator object <genexpr> at 0x7ff919fdfd20>, <generator object <genexpr> at 0x7ff919fdfd70>, <generator object <genexpr> at 0x7ff919fdfdc0>]
>>> [item for sublist in (list_of_lists for item in sublist)]
[5, 5, 5]
I get the feeling that I'm having a hard time because I don't quite understand how generators work... I mean, I thought I did, but now I'm seriously in doubt. Like I said, I love how compact this idiom is, and it's exactly what I need, but I'm loath to use code that I don't understand.
Can anyone explain what exactly is happening here?
The list comprehension works like this: <what I want> comes first, followed by the for loops that produce it.
In this case, <what I want> is every item in every sublist. To get those items, you loop over the sublists in the original list and save/yield each item in the sublist. Thus, the for loops in the list comprehension appear in the same order you would have written them without a list comprehension. The only confusing part is that <what I want> comes first, and not inside the body of the last loop.
Read the for loops as if they were nested, from left to right. The expression on the left is the one that produces each value in the final list.
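The nested reading described above can be sketched as ordinary loops. This is only a minimal illustration using the list from the question; the names flattened and comprehension are chosen here for the example and do not come from the original answers.

```python
list_of_lists = [(1, 2, 3), (2, 3, 4), (3, 4, 5)]

# The for clauses, read left to right, become nested loops;
# the leftmost expression (item) ends up in the innermost body.
flattened = []
for sublist in list_of_lists:
    for item in sublist:
        flattened.append(item)

# The comprehension from the question, for comparison.
comprehension = [item for sublist in list_of_lists for item in sublist]

print(flattened)                   # [1, 2, 3, 2, 3, 4, 3, 4, 5]
print(flattened == comprehension)  # True
```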
List comprehensions also support if tests to filter which elements are used; these can also be seen as nested statements, in the same way as the for loops.
By adding parentheses, you changed the expression; everything in the parentheses, (item for sublist in list_of_lists), is now the left-hand expression to add.
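Spelled out as a loop, the first parenthesized attempt looks roughly like the sketch below. It assumes sublist is already bound to (3, 4, 5), as it apparently was in the question's session; the name result is made up for the illustration.

```python
import types

list_of_lists = [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
sublist = (3, 4, 5)  # assumption: left over from an earlier statement

result = []
for item in sublist:
    # The whole parenthesized part is the value that gets added:
    # a generator object, not the items it would produce.
    result.append((item for sublist in list_of_lists))

print(len(result))                                               # 3
print(all(isinstance(g, types.GeneratorType) for g in result))  # True
```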
A parenthesized for loop like (item for sublist in list_of_lists) is a generator expression. It works exactly like a list comprehension except that it doesn't build a list. The elements are instead produced on demand. You can ask a generator expression for the next value, then the next value, and so on.
In this case, there must be a pre-existing sublist object for this to work at all; the outer loop is not over list_of_lists anymore, after all.
Your last attempt, [item for sublist in (list_of_lists for item in sublist)], translates to a loop over a generator expression.
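As a rough sketch, the last attempt can be written out as the loop below. It assumes sublist and item were already bound to (3, 4, 5) and 5, which is what the output in the question suggests; result and one_liner are names invented for the example.

```python
list_of_lists = [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
sublist = (3, 4, 5)  # assumption: pre-existing name
item = 5             # assumption: pre-existing name

result = []
# The generator expression iterates over the old sublist (3 elements)
# and produces list_of_lists each time; its own item stays private to it.
for sublist in (list_of_lists for item in sublist):
    result.append(item)  # the pre-existing item, not the generator's

print(result)  # [5, 5, 5]

# The one-line form from the question behaves the same way here.
sublist = (3, 4, 5)
one_liner = [item for sublist in (list_of_lists for item in sublist)]
print(one_liner)  # [5, 5, 5]
```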
Here list_of_lists is the loop element of a generator expression that iterates with for item in sublist. Again, sublist must exist already for this to work. The loop then adds a pre-existing item to the final list output.
In your case, apparently sublist is a sequence with 3 items in it; your final list produced 3 elements. item was bound to 5, so you got 3 times 5 in your output.
List Comprehension
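When I first started with list comprehensions, I read them like English sentences, and I was able to understand them easily. For example, [item for sublist in list_of_lists for item in sublist] can be read as "for each sublist in list_of_lists, and for each item in sublist, add item".
Also, the filtering part can be read as "add item only if it passes the test", and the corresponding comprehension would carry an if clause at the end.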
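To make the filtering case concrete, here is a small sketch. The condition (keep only even items) is a hypothetical one picked for illustration and is not taken from the original answer.

```python
list_of_lists = [(1, 2, 3), (2, 3, 4), (3, 4, 5)]

# "For each sublist in list_of_lists, and for each item in sublist,
#  add item only if it is even."
evens = [item for sublist in list_of_lists for item in sublist if item % 2 == 0]

# The same thing as nested statements: the if nests inside the loops.
evens_loop = []
for sublist in list_of_lists:
    for item in sublist:
        if item % 2 == 0:
            evens_loop.append(item)

print(evens)                # [2, 2, 4, 4]
print(evens == evens_loop)  # True
```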
Generators
They are like land mines, triggered only when invoked with the next protocol. They are similar to functions, but until an exception is raised or the end of the function is reached, they are not exhausted and can be invoked again and again. The important thing is that they retain their state between the previous invocation and the current one.
The difference between a generator and a function is that generators use the yield keyword to give a value back to the invoker. Generator expressions are similar to list comprehensions: the first expression is the actual value being "yielded".
With this basic understanding, we can look at the expressions in your question.
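The generator behaviour described above can be sketched as follows. The function name flatten and the small input are assumptions made for the example; the point is only that each next call resumes where the previous one stopped.

```python
def flatten(list_of_lists):
    """Generator function: hands back one item at a time, on demand."""
    for sublist in list_of_lists:
        for item in sublist:
            yield item


gen = flatten([(1, 2), (3,)])
first = next(gen)              # runs until the first yield
second = next(gen)             # resumes where it left off
rest = list(gen)               # consumes whatever is left
after_end = next(gen, "done")  # exhausted, so the default is returned

print(first, second, rest, after_end)  # 1 2 [3] done

# The equivalent generator expression: the first expression (item)
# is the value being "yielded".
gen_expr = (item for sublist in [(1, 2), (3,)] for item in sublist)
print(next(gen_expr))  # 1
```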
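In your first attempt, [(item for sublist in list_of_lists) for item in sublist], you are mixing a list comprehension with a generator expression. It reads as "for each item in sublist, add the generator expression (item for sublist in list_of_lists)", which is not what you had in mind. Since the generator expression is never iterated, the generator object itself is added to the list as it is. Generator expressions are not evaluated until they are invoked with the next protocol, so errors inside them (other than syntax errors) do not surface when the list is built. In this case, though, the outer for item in sublist runs immediately, so it produces a runtime error (NameError) if sublist is not already defined.
The last case is [item for sublist in (list_of_lists for item in sublist)].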
The for loop will iterate over any iterable using the next protocol, so the generator expression is evaluated. However, the item you add to the list is not the generator's own loop variable; it is whatever item was bound to beforehand, which in your session was the last element of the last sublist. This will also produce a runtime error if sublist is not defined yet.
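The runtime errors mentioned above can be reproduced with a sketch like the one below, which evaluates both attempts in a namespace that only knows list_of_lists. Using eval is just a convenience for the demonstration, and the names errors and attempt are invented here. Presumably the question's session did not fail because Python 2 lets list comprehension loop variables leak into the enclosing scope, so sublist and item were left over from the first comprehension; Python 3 no longer does this.

```python
list_of_lists = [(1, 2, 3), (2, 3, 4), (3, 4, 5)]

errors = []
for attempt in (
    "[(item for sublist in list_of_lists) for item in sublist]",
    "[item for sublist in (list_of_lists for item in sublist)]",
):
    try:
        # Evaluate in a fresh namespace where sublist and item do not exist.
        eval(attempt, {"list_of_lists": list_of_lists})
    except NameError as exc:
        errors.append(str(exc))

# Both attempts fail on the undefined sublist.
print(errors)
```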