I submitted a pull request with this code:
my_sum = sum([x for x in range(10)])
One of the reviewers suggested this instead:
my_sum = sum(x for x in range(10))
(The only difference is that the square brackets are missing.)
I was surprised that the second form seems to behave identically. But when I tried to use it in other contexts where the first one works, it fails:
y = x for x in range(10)
^ SyntaxError !!!
Are the two forms identical? Is there any important reason why the square brackets aren't necessary in the function call? Or is this just something that I have to know?
This is a generator expression. To get it to work in the standalone case, wrap it in parentheses:
y = (x for x in range(10))
and y becomes a generator. You can iterate over generators, so it works where an iterable is expected, such as the sum function.
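As a rough illustration of "works where an iterable is expected", here is a small sketch that feeds generator expressions to a few built-ins. The variable names are only for this example.

```python
# Each call below receives a generator expression as its only argument,
# so the call's own parentheses are enough.
total = sum(x * x for x in range(5))          # 0 + 1 + 4 + 9 + 16
largest = max(x * x for x in range(5))        # largest square produced
joined = ", ".join(str(x) for x in range(5))  # join accepts any iterable of strings

print(total, largest, joined)  # 30 16 0, 1, 2, 3, 4
```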
Usage examples and pitfalls:
>>> y = (x for x in range(10))
>>> y
<generator object <genexpr> at 0x0000000001E15A20>
>>> sum(y)
45
Be careful when keeping generators around: you can only go through them once. So after the above, if you try to use sum again, this will happen:
>>> sum(y)
0
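The same one-pass behaviour can be reproduced in a script. This is just a minimal sketch; the names first_total, second_total and leftover are made up for the example.

```python
y = (x for x in range(10))

first_total = sum(y)    # consumes every value: 0 + 1 + ... + 9
second_total = sum(y)   # nothing is left, so sum() returns its start value 0

# next() on an exhausted generator raises StopIteration unless a default is given
leftover = next(y, None)

print(first_total, second_total, leftover)  # 45 0 None
```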
So if you pass a generator where a list, a set or something similar is actually expected, you have to be careful. If the function or class stores the argument and tries to iterate over it multiple times, you will run into problems. For example, consider this:
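from functools import reduce  # reduce is not a builtin in Python 3

def foo(numbers):
    s = sum(numbers)                            # first pass over numbers
    p = reduce(lambda x, y: x * y, numbers, 1)  # second pass over numbers
    print("The sum is:", s, "and the product:", p)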
It gives the wrong product if you hand it a generator, because sum has already consumed all the values by the time reduce runs:
>>> foo(x for x in range(1, 10))
The sum is: 45 and the product: 1
You can easily get a list from the values a generator produces:
>>> y = (x for x in range(10))
>>> list(y)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
You can use this to fix the previous example:
>>> foo(list(x for x in range(1, 10)))
The sum is: 45 and the product: 362880
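Another option, if you control the function, is to make the copy inside it so callers can pass any iterable. A minimal sketch, assuming a hypothetical variant called foo_safe that returns the values instead of printing them:

```python
from functools import reduce
from operator import mul

def foo_safe(numbers):
    # Hypothetical variant of foo: copy the input once so it can be read twice.
    numbers = list(numbers)
    s = sum(numbers)             # first pass over the list
    p = reduce(mul, numbers, 1)  # second pass still sees every value
    return s, p

print(foo_safe(x for x in range(1, 10)))  # (45, 362880)
```

The trade-off is the same as with list(...) at the call site: every value is held in memory at once.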
However, keep in mind that if you build a list from a generator, you will need to store every value. This can use a lot more memory when there are many items.
Why use a generator in your situation?
The much lower memory consumption is the reason why sum(generator expression) is better than sum(list): the generator version only has to hold one value at a time, while the list variant has to store all N values. So prefer a generator wherever the values are consumed only once and you don't risk the side effects described above.
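If you want to see the difference yourself, one way is to compare peak allocations with the standard tracemalloc module. This is only a sketch: peak_bytes is a helper made up for this example, and the exact numbers depend on your Python version and platform.

```python
import tracemalloc

def peak_bytes(func):
    # Hypothetical helper: peak memory traced by tracemalloc while func() runs.
    tracemalloc.start()
    func()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

n = 100000
list_peak = peak_bytes(lambda: sum([x for x in range(n)]))  # builds the whole list first
gen_peak = peak_bytes(lambda: sum(x for x in range(n)))     # produces one value at a time

print(list_peak > gen_peak)  # True
```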
They are not identical.
The first form,
[x for x in l]
is a list comprehension. The other is a generator expression and written thus:
(x for x in l)
It returns a generator, not a list.
If the generator expression is the only argument in a function call, its parentheses can be skipped.
See PEP 289
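To illustrate the "only argument" rule, here is a small sketch. It uses compile() so the invalid form can be shown without breaking the script; the variable names are just for the example.

```python
# Sole argument: the call's own parentheses are enough.
a = sum(x for x in range(10))

# A second argument is present (here sum's start value), so the
# generator expression needs its own parentheses.
b = sum((x for x in range(10)), 100)

# Without them the source does not even compile.
try:
    compile("sum(x for x in range(10), 100)", "<example>", "eval")
    compiled = True
except SyntaxError:
    compiled = False

print(a, b, compiled)  # 45 145 False
```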
The first one is a list comprehension, whereas the second one is a generator expression:
>>> (x for x in range(10))
<generator object at 0x01C38580>
>>> a = (x for x in range(10))
>>> sum(a)
45
>>>
Use parentheses for generators:
>>> y = (x for x in range(10))
>>> y
<generator object at 0x01C3D2D8>
>>>
Read PEP 289:
For instance, the following summation code will build a full list of squares in memory, iterate over those values, and, when the reference is no longer needed, delete the list:
sum([x*x for x in range(10)])
Memory is conserved by using a generator expression instead:
sum(x*x for x in range(10))
As the data volumes grow larger, generator expressions tend to perform better because they do not exhaust cache memory and they allow Python to re-use objects between iterations.
Using parentheses produces a generator:
>>> y = (x for x in range(10))
>>> y
<generator object <genexpr> at 0x00AC3AA8>