Size of generator object in Python

Posted 2019-02-19 12:53

For the following code:

import sys
x=(i for i in range(1,11))
print x


print 'Before starting iterating generator size is' ,sys.getsizeof(x)

print 'For first time'
for i in x:
    print i

print 'For second time , does not print anything'    
for i in x:
    print i # does not print anything

print 'After iterating generator size is' ,sys.getsizeof(x)

the output is:

<generator object <genexpr> at 0x014C1A80>
Before starting iterating generator size is 40
For first time
1
2
3
4
5
6
7
8
9
10
For second time , does not print anything
After iterating generator size is 40

The size of the generator object is 40 before iteration, and it is still 40 after I have finished iterating over it, even though the second loop no longer yields any elements.

Why does the generator object take the same amount of memory after it has been exhausted as it did when it was created?

2 Answers
祖国的老花朵
Reply #2 · 2019-02-19 13:00

The generator x is basically a function that will provide the next value of i whenever it is called. It doesn't calculate all values in advance. It waits until it is called and then it calculates and provides just the next value.

So each call will result in the next value.

Why doesn't the size of x change? Because x isn't a list of numbers. It only holds the state needed to produce the next value, so at the beginning and at the end of the process it's still the same small object.

This is the advantage of using generators. You don't have to load everything into memory at the start (so it saves memory), and (if done correctly) you don't have to calculate anything until it's actually needed (so this can save computational time if some of the values aren't needed).

To see this:

x = (i for i in xrange(10**10))
for i in x:
    print i
    if i>10:
        break
print 'intermission'
for i in x:
    print i
    if i>20:
        break

(Note the xrange, not range: in Python 2, range builds the entire list in advance.) Imagine how long it would take to actually generate the integers from 0 to 10**10 and how much memory that would take, then compare that with how quickly this code runs.
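
For readers on Python 3, where print is a function and range is already lazy, the same point can be sketched as follows. This is an illustrative rewrite, not the answerer's code. The names first, second, size_before and size_after are made up for the example, and the exact byte counts depend on the interpreter version and platform, so only the comparisons are meaningful.

```python
import sys

# Python 3 sketch: range() is lazy here, like xrange() in Python 2.
numbers = (i for i in range(10**10))

size_before = sys.getsizeof(numbers)

# Consume the first few values; nothing beyond them is computed.
first = []
for i in numbers:
    first.append(i)
    if i >= 4:
        break

# A second loop resumes where the first one stopped.
second = []
for i in numbers:
    second.append(i)
    if i >= 9:
        break

size_after = sys.getsizeof(numbers)

print(first)   # [0, 1, 2, 3, 4]
print(second)  # [5, 6, 7, 8, 9]

# The generator object itself did not grow while it ran.
print(size_before == size_after)

# A list, by contrast, has to hold a reference to every element up front.
as_list = list(range(1000))
print(sys.getsizeof(as_list) > size_after)
```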

祖国的老花朵
Reply #3 · 2019-02-19 13:13

The space a generator takes in memory is just bookkeeping info. It keeps a reference to the frame object (administration for the running Python code, such as locals), a flag for whether or not it is running right now, and a reference to the code object. Nothing more:

>>> x=(i for i in range(1,11))
>>> dir(x)
['__class__', '__delattr__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__iter__', '__name__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'close', 'gi_code', 'gi_frame', 'gi_running', 'next', 'send', 'throw']
>>> x.gi_frame
<frame object at 0x1053b4ad0>
>>> x.gi_running
0
>>> x.gi_code
<code object <genexpr> at 0x1051af5b0, file "<stdin>", line 1>

That's just 3 fields, plus the usual Python object header (type pointer and reference count) and a weak-references list; so that's about 4 pointers, an integer and a struct, which on your system take 40 bytes (on my system, 64-bit OS X, it is 80 bytes). sys.getsizeof() reports the size of just that structure as implemented in C, and it doesn't recurse over pointers.
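
The "doesn't recurse over pointers" part can be checked with a small sketch, written here for Python 3. The generator function walk and the names small and large are hypothetical and exist only for this illustration. Newer CPython releases lay out the generator structure differently from what is described above, so the absolute numbers will not match, but two generators running the same code should still report the same size.

```python
import sys

def walk(seq):
    # Hypothetical helper: a plain generator function over any sequence.
    for item in seq:
        yield item

small = walk([1, 2, 3])
large = walk(list(range(100000)))

# Both generator objects run the same code, so their own size matches,
# even though one of them keeps a much larger list alive.
print(sys.getsizeof(small) == sys.getsizeof(large))

# Still the same after the generators have started running.
next(small)
next(large)
print(sys.getsizeof(small) == sys.getsizeof(large))
```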

As such, that amount of memory won't change when you have run through the generator. The referenced frame may change in how much memory is used (if the generator expression references large objects towards one end or the other) but you won't see that with the result of sys.getsizeof() on the generator object; look at the frame locals instead:

>>> next(x)
1
>>> x.gi_frame.f_locals
{'i': 1, '.0': <listiterator object at 0x105339dd0>}

The .0 object is the iterator over the range() result that the generator is using in the for loop, and i is the for loop target. The listiterator is itself an iterator object that holds a private reference to the list range() produced, as well as a position counter, so it can yield the next element each time you ask for it.

You cannot query a generator for the size of its elements; it produces them as needed, so you cannot know in advance how many it will produce, nor tell afterwards how many it has produced. sys.getsizeof() certainly won't tell you; it is a tool to measure the memory footprint of a single object, and you'd have to recursively measure all referenced objects if you want to know the total footprint.
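
As a rough illustration of what "recursively measure all referenced objects" could look like, here is a minimal Python 3 sketch. The helper deep_size is hypothetical, not a standard library function, and it only follows the built-in containers it explicitly checks for, so treat its result as an estimate.

```python
import sys

def deep_size(obj, seen=None):
    """Rough recursive footprint of built-in containers (a sketch, not exact)."""
    if seen is None:
        seen = set()
    if id(obj) in seen:          # count shared objects only once
        return 0
    seen.add(id(obj))
    total = sys.getsizeof(obj)   # shallow size of this object alone
    if isinstance(obj, dict):
        for key, value in obj.items():
            total += deep_size(key, seen) + deep_size(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            total += deep_size(item, seen)
    return total

data = [[0] * 100 for _ in range(10)]
print(sys.getsizeof(data))  # the outer list only
print(deep_size(data))      # outer list plus the inner lists and their items
```

The seen set keeps an object that is referenced several times from being counted more than once.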

You can see that the generator has completed its run from the frame; it is cleared once it is done:

>>> x.gi_frame
<frame object at 0x1053b4ad0>
>>> list(x)
[2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> x.gi_frame is None
True

So in the end, the memory used by the generator resides in structures in the frame (locals, and possibly globals, with each object in those namespaces possibly referencing other objects again). When the generator is done, the frame is cleared and the generator's .gi_frame pointer is set to the None singleton, leaving the frame to be freed once its reference count has dropped to 0.
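
On Python 3, the same life cycle can also be observed through inspect.getgeneratorstate() from the standard library. The sketch below is an addition for illustration, with gen and rest as made-up names; it shows the frame reference disappearing once the generator has finished.

```python
import inspect

gen = (i for i in range(1, 4))

print(inspect.getgeneratorstate(gen))  # GEN_CREATED
next(gen)
print(inspect.getgeneratorstate(gen))  # GEN_SUSPENDED
print(gen.gi_frame is None)            # False: the frame is still alive

rest = list(gen)                       # run the generator to completion
print(rest)                            # [2, 3]
print(inspect.getgeneratorstate(gen))  # GEN_CLOSED
print(gen.gi_frame is None)            # True: the frame has been released
```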

All this only applies to generators, not to iterables in general; generators are Python code and thus can be introspected this deeply.
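
A quick way to see that difference, sketched for Python 3 with made-up names gen and list_iter: a list iterator is implemented in C and exposes none of the gi_* attributes.

```python
import inspect

gen = (i for i in range(3))
list_iter = iter([0, 1, 2])

# Only the generator exposes gi_* introspection attributes.
print(inspect.isgenerator(gen), hasattr(gen, "gi_frame"))              # True True
print(inspect.isgenerator(list_iter), hasattr(list_iter, "gi_frame"))  # False False
```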
