For the following code:
import sys

x = (i for i in range(1, 11))
print x
print 'Before starting iterating generator size is', sys.getsizeof(x)
print 'For first time'
for i in x:
    print i
print 'For second time , does not print anything'
for i in x:
    print i  # does not print anything
print 'After iterating generator size is', sys.getsizeof(x)
the output is:
<generator object <genexpr> at 0x014C1A80>
Before starting iterating generator size is 40
For first time
1
2
3
4
5
6
7
8
9
10
For second time
After iterating generator size is 40
The size of the generator object is 40 before iteration, and it is still 40 after I have finished iterating over it, even though the second loop yields no elements.
Why does the generator object take the same amount of memory after it has been exhausted as it did when it was created?
The generator x is basically a function that will provide the next value of i whenever it is called. It doesn't calculate all values in advance. It waits until it is called and then it calculates and provides just the next value. So each call will result in the next value.
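As a small sketch of that behaviour, the snippet below pulls values out of the same generator expression one at a time with the built-in next(). The variable names first, second, remaining, and leftover are only illustrative, and the single-argument print() form is used so the snippet should run on both Python 2 and Python 3.

```python
# Same generator expression as in the question.
x = (i for i in range(1, 11))

first = next(x)       # runs the generator only far enough to produce 1
second = next(x)      # resumes where it left off and produces 2
remaining = list(x)   # drains whatever is left: 3 through 10
leftover = list(x)    # an exhausted generator produces nothing more

print(first)
print(second)
print(remaining)
print(leftover)
```

The empty leftover list corresponds to the second for loop in the question, which prints nothing.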
Why doesn't the size of x change? Well, it's because x isn't a list of numbers. At the beginning and the end of the process, it's still the same function.

This is the advantage of using generators. You don't have to load everything into memory at the start (so it saves memory), and (if done correctly) you don't have to calculate anything until it's actually needed (so this can save computational time if some of the values aren't needed).
To see this:
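A minimal sketch of such a demonstration follows. It assumes a lazy range: xrange on Python 2, or the built-in range on Python 3, which is already lazy. The names lazy_range, big, and first_five are illustrative and not taken from the answer.

```python
import itertools
import sys

try:
    lazy_range = xrange   # Python 2: xrange produces numbers on demand
except NameError:
    lazy_range = range    # Python 3: range is already lazy

# Nothing is computed here; the generator only remembers how to continue.
big = (i * 2 for i in lazy_range(10**10))
size_before = sys.getsizeof(big)

# Take just the first five values; the rest are never generated.
first_five = list(itertools.islice(big, 5))
size_after = sys.getsizeof(big)

print(first_five)
print(size_before == size_after)
```

Only five values are ever computed, and the size reported for the generator object is the same before and after they are taken.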
(Note the xrange, not range; using range causes the advance calculation to happen.) Imagine how long it would take to actually generate the integers from 0 to 10**10 and how much memory that would take. Compare how quickly this code runs.

The space a generator takes in memory is just bookkeeping info. It keeps a reference to the frame object (administration for the running Python code, such as locals), a flag for whether or not it is running right now, and a reference to the code object. Nothing more.
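One way to look at that bookkeeping from Python is to list the attributes of the generator object whose names start with gi_. This is only a sketch: gi_names is an illustrative variable, and newer CPython versions may list a few attributes beyond the three described here.

```python
x = (i for i in range(1, 11))

# Collect the generator's bookkeeping attributes by their gi_ prefix.
gi_names = [name for name in dir(x) if name.startswith('gi_')]
print(gi_names)

print(x.gi_code.co_name)    # name of the code object behind the generator
print(bool(x.gi_running))   # False: the generator is not executing right now
print(x.gi_frame is None)   # False: a frame exists until the generator finishes
```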
That's just 3 references, plus the usual Python object type info (think reference counting) and a weak-references list; so that's about 4 pointers, an integer and a struct, which on your system take 40 bytes (on my system, 64-bit OS X, it is 80 bytes).
sys.getsizeof() reports on the size of just that structure as implemented in C, and it doesn't recurse over pointers.

As such, that amount of memory won't change when you have run through the generator. The referenced frame may change in how much memory is used (if the generator expression references large objects towards one end or the other) but you won't see that with the result of sys.getsizeof() on the generator object; look at the frame locals (x.gi_frame.f_locals) instead.

There, the .0 object is the range() iterator that the generator is using in the for loop, and i is the for loop target. The listiterator is another iterable object that has a private reference to the list range() produced as well as a position counter so it can yield the next element each time you ask it to.

You cannot query for an element size of a generator; they produce elements as needed anyway, so you cannot know a priori how much they'll produce, nor how much they have produced after running. sys.getsizeof() certainly won't tell you; it is a tool to measure memory footprint anyway, and you'd have to recursively measure all referenced objects if you want to know the total footprint.

You can see that the generator has completed its run from the frame; it is cleared once it is done.
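Both observations, the frame locals while the generator is still running and the cleared frame once it is done, can be sketched as follows. This assumes CPython; locals_while_running is an illustrative name, and the exact objects shown in the locals differ between Python 2 (a listiterator) and Python 3 (a range iterator).

```python
x = (i for i in range(1, 11))
next(x)  # start the generator so the loop target i has a value

# Copy the frame's local namespace while the generator is suspended.
# On CPython this typically holds the hidden iterator argument '.0'
# alongside the loop target i.
locals_while_running = dict(x.gi_frame.f_locals)
print(sorted(locals_while_running))
print(locals_while_running['i'])

# Exhaust the generator; afterwards it no longer references a frame.
for _ in x:
    pass
print(x.gi_frame is None)
```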
So in the end, the memory used for the generator resides in structures in the frame (locals, and possibly globals, with each object in those namespaces possibly referencing other objects again), and when the generator is done the frame is cleared and the generator .gi_frame pointer is altered to point to the None singleton, leaving the frame to be cleared if the reference count has dropped to 0.

All this only applies to generators, not to iterables in general; generators are Python code and thus can be introspected this deeply.