In Python, is there any difference between creating a generator object through a generator expression versus using the yield statement?
Using yield:
def Generator(x, y):
    for i in xrange(x):
        for j in xrange(y):
            yield (i, j)
Using generator expression:
def Generator(x, y):
    return ((i, j) for i in xrange(x) for j in xrange(y))
Both functions return generator objects, which produce tuples, e.g. (0,0), (0,1) etc.
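A quick check of that claim, written for Python 3 (range replaces Python 2's xrange). The function names gen_yield and gen_expr are chosen here for illustration. Both calls produce objects of the same generator type that yield the same tuples in the same order:

```python
def gen_yield(x, y):
    # Generator function: the body runs lazily, one yield at a time.
    for i in range(x):
        for j in range(y):
            yield (i, j)


def gen_expr(x, y):
    # Ordinary function that returns a generator expression.
    return ((i, j) for i in range(x) for j in range(y))


a, b = gen_yield(2, 2), gen_expr(2, 2)
print(type(a).__name__, type(b).__name__)  # generator generator
print(list(a) == list(b))                  # True
```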
Any advantages of one or the other? Thoughts?
Thanks everybody! There is a lot of great information and further references in these answers!
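When thinking about iterators, look at the itertools module, which provides functions that create iterators for efficient looping. For performance, consider itertools.product(*iterables[, repeat]), which returns the Cartesian product of the input iterables and is roughly equivalent to nested for-loops in a generator expression.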
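A minimal sketch, assuming Python 3, of the question's nested loop rewritten with itertools.product; generator_product is a name chosen here. product() yields tuples in the same order as the nested loops, with the rightmost iterable advancing fastest:

```python
from itertools import product


def generator_expr(x, y):
    # The question's version, ported to Python 3.
    return ((i, j) for i in range(x) for j in range(y))


def generator_product(x, y):
    # Same sequence of (i, j) tuples, produced by itertools.product.
    return product(range(x), range(y))


print(list(generator_product(2, 3)))
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```

One design difference worth knowing: the itertools documentation notes that product() consumes its input iterables completely before yielding anything, so it suits finite inputs rather than lazy or infinite ones.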
Using yield is nice if the expression is more complicated than just nested loops. Among other things, you can yield a special first or last value.

There is a difference that could be important in some contexts that hasn't been pointed out yet. Using yield prevents you from using return for anything other than ending the generator, which implicitly raises StopIteration (plus coroutine-related uses). This means that a function which contains yield and tries to return an ordinary object on some code path is ill-formed: calling it always produces a generator, so accessing an attribute of the object you expected gives you an AttributeError. On the other hand, returning a generator expression on the looping path, with no yield in the function body, works like a charm.
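A minimal sketch of the return-value pitfall, assuming Python 3.3 or later; Tea, purse_with_yield and purse_with_genexp are hypothetical names. Because purse_with_yield contains yield anywhere in its body, every call returns a generator, even on the path that executes return:

```python
class Tea:
    """Hypothetical object returned on the special code path."""

    def __init__(self, temperature):
        self.temperature = temperature


def purse_with_yield(tea_time=False):
    # The body contains `yield`, so this is a generator function.
    # `return Tea(355)` only ends the generator and stores the Tea in
    # StopIteration.value (Python 3.3+; in Python 2 it is a SyntaxError).
    if tea_time:
        return Tea(355)
    for item in ["lamp", "mirror", "coat rack"]:
        yield item


def purse_with_genexp(tea_time=False):
    # No `yield` here: an ordinary function that returns either a Tea
    # or a generator object.
    if tea_time:
        return Tea(355)
    return (item for item in ["lamp", "mirror", "coat rack"])


try:
    purse_with_yield(True).temperature
except AttributeError as exc:
    print("yield version:", exc)  # the generator object has no 'temperature'

print(purse_with_genexp(True).temperature)  # 355
```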
Yes, there is a difference.

For the generator expression (x for var in expr), iter(expr) is called when the expression is created.

When a generator is created with def and yield instead, iter(expr) is not yet called when the generator object g is created. It will be called only when iterating on g (and might not be called at all).
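A minimal sketch of this timing difference, assuming Python 3. CountDown is a hypothetical iterator whose __iter__ appends to a list instead of printing, so the moment iter() runs can be inspected:

```python
events = []  # records each call to CountDown.__iter__


class CountDown:
    """Hypothetical iterator yielding n-1 down to 0 and logging __iter__."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        events.append("ITER")
        return self

    def __next__(self):
        if self.n == 0:
            raise StopIteration
        self.n -= 1
        return self.n


# Generator expression: iter() on the outermost iterable runs right here.
g1 = (i ** 2 for i in CountDown(3))
after_genexp = list(events)   # ["ITER"]

events.clear()


def squares():
    for i in CountDown(3):    # iter() runs only once the body starts
        yield i ** 2


g2 = squares()
after_call = list(events)     # []: creating g2 executed none of the body
first = next(g2)              # 4
after_next = list(events)     # ["ITER"]
```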
Since most iterators do not do much work in __iter__, it is easy to miss this behavior. A real-world example is Django's QuerySet, which fetches its data in __iter__: data = (f(x) for x in qs) might take a long time, while def g(): for x in qs: yield f(x) followed by data = g() would return immediately.

For more info and the formal definition, refer to PEP 289 -- Generator Expressions.
There are only slight differences between the two. You can use the dis module to examine this sort of thing for yourself.

Edit: My first version disassembled the generator expression created at module scope in the interactive prompt. That's slightly different from the OP's version, where it is used inside a function. I've modified this to match the actual case in the question.
Comparing the two disassemblies, the "yield" generator (first case) has three extra instructions in the setup, but from the first FOR_ITER they differ in only one respect: the "yield" approach uses a LOAD_FAST in place of a LOAD_DEREF inside the loop. LOAD_DEREF is "rather slower" than LOAD_FAST, which makes the "yield" version slightly faster than the generator expression for large enough values of x (the outer loop), because the value of y is loaded slightly faster on each pass. For smaller values of x it would be slightly slower because of the extra overhead of the setup code.

It might also be worth pointing out that the generator expression would usually be used inline in the code, rather than wrapped in a function like that. That would remove a bit of the setup overhead and keep the generator expression slightly faster for smaller loop values, even if LOAD_FAST gave the "yield" version an advantage otherwise.

In neither case would the performance difference be enough to justify choosing one over the other. Readability counts far more, so use whichever feels most readable for the situation at hand.
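A sketch for checking the LOAD_FAST versus LOAD_DEREF point on your own interpreter, assuming Python 3 (range in place of xrange). Exact opcodes and setup instructions vary between CPython versions, and newer releases add specialized LOAD_FAST variants, so the hypothetical helper loads_of only collects instructions whose names start with LOAD and whose argument is the given variable:

```python
import dis
import types


def gen_yield(x, y):
    for i in range(x):
        for j in range(y):
            yield (i, j)


def gen_expr(x, y):
    return ((i, j) for i in range(x) for j in range(y))


# The generator expression's body is compiled into its own code object,
# stored among the constants of gen_expr.
genexp_code = next(c for c in gen_expr.__code__.co_consts
                   if isinstance(c, types.CodeType))


def loads_of(code, name):
    """Opnames of the LOAD* instructions in `code` that load `name`."""
    return [ins.opname for ins in dis.get_instructions(code)
            if ins.opname.startswith("LOAD") and ins.argval == name]


print(loads_of(gen_yield.__code__, "y"))  # a LOAD_FAST variant: y is a local
print(loads_of(genexp_code, "y"))         # LOAD_DEREF: y comes from a closure

# For the full listings, call dis.dis(gen_yield) and dis.dis(gen_expr).
```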
There is no difference for the kind of simple loops that you can fit into a generator expression. However, yield can be used to create generators that do much more complex processing. A simple example is a generator for the Fibonacci sequence.
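A minimal sketch of such a Fibonacci generator, assuming Python 3. The state (a, b) persists between yields, which a single generator expression cannot easily carry. The generator is infinite here, and itertools.islice is used to take a finite prefix:

```python
from itertools import islice


def fibonacci():
    """Yield the Fibonacci numbers 0, 1, 1, 2, 3, 5, ... indefinitely."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b  # state carried over to the next iteration


print(list(islice(fibonacci(), 10)))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```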