Generator comprehension gives different output from list comprehension

Posted 2020-01-30 11:59

I get different output when using a list comprehension versus a generator comprehension. Is this expected behavior or a bug?

Consider the following setup:

all_configs = [
    {'a': 1, 'b':3},
    {'a': 2, 'b':2}
]
unique_keys = ['a','b']

If I then run the following code, I get:

print(list(zip(*( [c[k] for k in unique_keys] for c in all_configs))))
>>> [(1, 2), (3, 2)]
# note the ( vs [
print(list(zip(*( (c[k] for k in unique_keys) for c in all_configs))))
>>> [(2, 2), (2, 2)]

This is on python 3.6.0:

Python 3.6.0 (default, Dec 24 2016, 08:01:42)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin

Tags: python
4 answers
我想做一个坏孩纸
#2 · 2020-01-30 12:37

Both are generator objects. The first is a generator that yields lists; the second is a generator that yields generators:

print(list([c[k] for k in unique_keys] for c in all_configs))
[[1, 3], [2, 2]]
print(list((c[k] for k in unique_keys) for c in all_configs))
[<generator object <genexpr>.<genexpr> at 0x000000000364A750>, <generator object <genexpr>.<genexpr> at 0x000000000364A798>]

When you use zip(* on the first expression, the outer generator hands over lists that were already built, one per value of c, so you get the output you would expect. With the second expression, zip receives two inner generators that have not produced anything yet. They only run when zip pulls values from them, and by then the outer loop has finished, so they give a different result than the lists in the first expression.
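To make the timing visible, here is a minimal sketch (the helper name make_outer is only for illustration and is not from the question). It builds the same nested generator twice, consuming each inner generator as soon as it is yielded in one case, and collecting all of them first in the other, which is effectively what star-unpacking into zip does:

```python
all_configs = [{'a': 1, 'b': 3}, {'a': 2, 'b': 2}]
unique_keys = ['a', 'b']

def make_outer():
    # Hypothetical helper: returns a fresh copy of the nested generator.
    return ((c[k] for k in unique_keys) for c in all_configs)

# Each inner generator is drained while c still refers to "its" dict.
consumed_immediately = [list(g) for g in make_outer()]
print(consumed_immediately)  # [[1, 3], [2, 2]]

# tuple() exhausts the outer generator first, as zip(*...) does,
# so both inner generators see the last value of c.
collected_first = [list(g) for g in tuple(make_outer())]
print(collected_first)  # [[2, 2], [2, 2]]
```

The inner generators are identical in both runs; only the moment at which they are consumed changes.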

This would be a single flat list comprehension:

   print([c[k] for k in unique_keys for c in all_configs])
   [1, 2, 3, 2]
在下西门庆
#3 · 2020-01-30 12:43

To see what's going on, replace c[k] with a function with a side effect:

def f(c,k):
    print(c,k)
    return c[k]
print("listcomp")
print(list(zip(*( [f(c,k) for k in unique_keys] for c in all_configs))))
print("gencomp")
print(list(zip(*( (f(c,k) for k in unique_keys) for c in all_configs))))

output:

listcomp
{'a': 1, 'b': 3} a
{'a': 1, 'b': 3} b
{'a': 2, 'b': 2} a
{'a': 2, 'b': 2} b
[(1, 2), (3, 2)]
gencomp
{'a': 2, 'b': 2} a
{'a': 2, 'b': 2} a
{'a': 2, 'b': 2} b
{'a': 2, 'b': 2} b
[(2, 2), (2, 2)]

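In the generator expression, c is looked up only when the inner generators are consumed, which happens after the outer loop has completed, so c holds the last value it took in the outer loop.

In the list comprehension case, c is evaluated immediately, while the outer loop is still on the current item.

(Note the call order as well: a, b, a, b for the list comprehension, because each inner list is built at once, versus a, a, b, b for the generators, because zip pulls one item from each generator in turn.)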

Note that you can keep the "generator" way of doing it (no temporary list) by passing c.get to map, so the current value of c is bound at that moment:

print(list(zip(*( map(c.get,unique_keys) for c in all_configs))))

In Python 3, map does not create a list, but the result is still correct: [(1, 2), (3, 2)]
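One caveat with c.get, as a hedged aside: it returns None for a missing key, whereas c[k] raises KeyError. If the original strictness matters, a possible variant is to pass the bound method c.__getitem__ instead. The dict named incomplete below is made up for illustration:

```python
all_configs = [{'a': 1, 'b': 3}, {'a': 2, 'b': 2}]
unique_keys = ['a', 'b']

# The bound method is created while c is the current dict, so the
# lookup stays lazy but is tied to the right dict.
strict = list(zip(*(map(c.__getitem__, unique_keys) for c in all_configs)))
print(strict)  # [(1, 2), (3, 2)]

# Illustrative dict with a missing key: get() silently yields None ...
incomplete = {'a': 5}
print(list(map(incomplete.get, unique_keys)))  # [5, None]

# ... while __getitem__ raises, just like c[k] would.
try:
    list(map(incomplete.__getitem__, unique_keys))
except KeyError as exc:
    print("missing key:", exc)  # missing key: 'b'
```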

神经病院院长
#4 · 2020-01-30 12:54

This happens because the zip(*...) call fully evaluates the outer generator, and the outer generator returns two more generators:

((c[k], print(c)) for k in unique_keys)

Evaluating the outer generator has already moved c to the second dict: {'a': 2, 'b':2}.

Now, when these inner generators are evaluated individually, they look up c, and since its value is now {'a': 2, 'b':2}, you get [(2, 2), (2, 2)] as the output.

Demo:

>>> def my_zip(*args):
...     print(args)
...     for arg in args:
...         print(list(arg))
...
>>> my_zip(*((c[k] for k in unique_keys) for c in all_configs))

Output:

# We have two generators now, which means the loop over `all_configs` has already finished.
(<generator object <genexpr>.<genexpr> at 0x104415c50>, <generator object <genexpr>.<genexpr> at 0x10416b1a8>)
[2, 2]
[2, 2]

The list comprehension, on the other hand, is evaluated right away, so it uses the current value of c rather than its last value.


How to force it to use the correct value of c?

Use an inner function inside a generator function. The inner function remembers c's value through a default argument.

>>> def solve():
...     for c in all_configs:
...         def func(c=c):
...             return (c[k] for k in unique_keys)
...         yield func()
...
>>> list(zip(*solve()))
[(1, 2), (3, 2)]
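A possibly simpler variant of the same idea, sketched here under the assumption that a small helper is acceptable (values_for is a made-up name, not part of the answer above): pass c as an ordinary argument to a generator function. The argument is bound when the function is called, so each inner generator keeps its own dict while remaining lazy.

```python
all_configs = [{'a': 1, 'b': 3}, {'a': 2, 'b': 2}]
unique_keys = ['a', 'b']

def values_for(config, keys):
    # config is bound at call time, so later changes to the
    # caller's loop variable do not affect this generator.
    for key in keys:
        yield config[key]

result = list(zip(*(values_for(c, unique_keys) for c in all_configs)))
print(result)  # [(1, 2), (3, 2)]
```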
The star\"
#5 · 2020-01-30 13:00

In a list comprehension, expressions are evaluated eagerly. In a generator expression, they are only looked up as needed.

Thus, as the outer generator expression iterates over all_configs, it creates inner generators that refer to c[k] but only look up c after the outer loop is done, so both of them use the last value of c. By contrast, the list comprehension is evaluated immediately, so it creates one list with the first value of c and another list with the second value of c.

Consider this small example:

>>> r = range(3)
>>> i = 0
>>> a = [i for _ in r]
>>> b = (i for _ in r)
>>> i = 3
>>> print(*a)
0 0 0
>>> print(*b)
3 3 3

When creating a, the interpreter created that list immediately, looking up the value of i as soon as it was evaluated. When creating b, the interpreter just set up that generator and didn't actually iterate over it and look up the value of i. The print calls told the interpreter to evaluate those objects. a already existed as a full list in memory with the old value of i, but b was evaluated at that point, and when it looked up the value of i, it found the new value.
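One refinement worth knowing, shown as a small sketch with made-up names (data, factor): the iterable in the first for clause of a generator expression is evaluated as soon as the generator is created, while everything else in the expression is looked up lazily.

```python
data = [1, 2, 3]
factor = 10

gen = (x * factor for x in data)

# Rebinding data has no effect: the generator already holds an
# iterator over the original list.
data = [100]
# factor, however, is looked up each time a value is produced.
factor = 2

print(list(gen))  # [2, 4, 6]
```

This is why the r in the example above is fixed at creation time, while i is not.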
