Say I have a class A:
class A(object):
    def __init__(self, x):
        self.x = x

    def __str__(self):
        # __str__ must return a string, so convert non-string values
        return str(self.x)
And I use sys.getsizeof to see how many bytes an instance of A takes:
>>> import sys
>>> sys.getsizeof(A(1))
64
>>> sys.getsizeof(A('a'))
64
>>> sys.getsizeof(A('aaa'))
64
As illustrated in the experiment above, the size of an A object is the same no matter what self.x is.
So I wonder how Python stores an object internally?
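In the case of a class instance, getsizeof() returns the size of the instance struct itself (for classic classes, the PyObject created by the C function PyInstance_New()). That struct holds only references to the instance's attributes, so its size does not depend on what self.x is.
If you want a list of all the object sizes, check this.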
It depends on what kind of object, and also which Python implementation :-)
In CPython, which is what most people use when they use python, all Python objects are represented by a C struct, PyObject. Everything that 'stores an object' really stores a PyObject *. The PyObject struct holds the bare minimum information: the object's type (a pointer to another PyObject) and its reference count (an ssize_t-sized integer). Types defined in C extend this struct with extra information they need to store in the object itself, and sometimes allocate extra data separately.

For example, tuples (implemented as a
PyTupleObject "extending" a PyObject struct) store their length and the PyObject pointers they contain inside the struct itself (the struct contains a 1-length array in the definition, but the implementation allocates a block of memory of the right size to hold the PyTupleObject struct plus exactly as many items as the tuple should hold). The same way, strings (PyStringObject) store their length, their cached hash value, some string-caching ("interning") bookkeeping, and the actual character data (a char array at the end of the struct). Tuples and strings are thus single blocks of memory.

On the other hand, lists (
PyListObject) store their length, a PyObject ** for their data and another ssize_t to keep track of how much room they allocated for the data. Because Python stores PyObject pointers everywhere, you can't grow a PyObject struct once it's allocated -- doing so may require the struct to move, which would mean finding all pointers and updating them. Because a list may need to grow, it has to allocate the data separately from the PyObject struct. Tuples and strings cannot grow, and so they don't need this. Dicts (PyDictObject) work the same way, although they store the key, the value and the cached hash value of the key, instead of just the items. Dicts also have some extra overhead to accommodate small dicts and specialized lookup functions.

But these are all types in C, and you can usually see how much memory they would use just by looking at the C source. Instances of classes defined in Python rather than C are not so easy. The simplest case, instances of classic classes, is not so difficult: it's a
PyObject that stores a PyObject * to its class (which is not the same thing as the type stored in the PyObject struct already), a PyObject * to its __dict__ attribute (which holds all other instance attributes) and a PyObject * to its weakreflist (which is used by the weakref module, and only initialized if necessary). The instance's __dict__ is usually unique to the instance, so when calculating the "memory size" of such an instance you usually want to count the size of the attribute dict as well. But it doesn't have to be specific to the instance! __dict__ can be assigned to just fine.

New-style classes complicate matters. Unlike with classic classes, the type of a new-style instance is its class, so the instance does not need to store the object's class separately. They do have room for the
__dict__ and weakreflist reference, but unlike classic instances they don't require the __dict__ attribute for arbitrary attributes. If the class (and all its base classes) use __slots__ to define a strict set of attributes, and none of those attributes is named __dict__, the instance does not allow arbitrary attributes and no dict is allocated. On the other hand, attributes defined by __slots__ have to be stored somewhere. This is done by storing the PyObject pointers for the values of those attributes directly in the PyObject struct, much like is done with types written in C. Each entry in __slots__ will thus take up a PyObject *, regardless of whether the attribute is set or not.

All that said, the problem remains that since everything in Python is an object and everything that holds an object just holds a reference, it's sometimes very difficult to draw the line between objects. Two objects can refer to the same bit of data. They may hold the only two references to that data. Getting rid of both objects also gets rid of the data. Do they both own the data? Does only one of them, but if so, which one? Or would you say they own half the data, even though getting rid of one object doesn't release half the data? Weakrefs can make this even more complicated: two objects can refer to the same data, but deleting one of the objects may cause the other object to also get rid of its reference to that data, causing the data to be cleaned up after all.
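Going back to how instance attributes are stored, the points about __dict__ and __slots__ can be checked from Python itself. This is only a minimal sketch, assuming CPython 3 (where every class is new-style); the class names WithDict and WithSlots are made up for illustration, and exact byte counts differ between versions and platforms, so the code compares relationships rather than absolute numbers:

```python
import struct
import sys

POINTER_SIZE = struct.calcsize("P")  # size of a C pointer on this build


class WithDict(object):
    """Ordinary class: attributes live in a per-instance __dict__."""

    def __init__(self, x):
        self.x = x


class WithSlots(object):
    """Class with __slots__: attribute pointers live in the object struct."""

    __slots__ = ("x", "y")

    def __init__(self, x):
        self.x = x  # 'y' is deliberately left unset


small = WithDict(1)
large = WithDict("a" * 10000)

# getsizeof() counts only the instance struct, not what its attributes refer to.
same_size = sys.getsizeof(small) == sys.getsizeof(large)

# The attribute dict is a separate object with its own size.
dict_size = sys.getsizeof(small.__dict__)

# A slotted instance has no __dict__ at all.
slotted = WithSlots(1)
has_dict = hasattr(slotted, "__dict__")

# Each slot adds one pointer to the struct, whether or not it is set.
slot_bytes = WithSlots.__basicsize__ - object.__basicsize__

print(same_size, dict_size, has_dict, slot_bytes == 2 * POINTER_SIZE)
```

The comparison of __basicsize__ values is used here instead of getsizeof() because getsizeof() may also include garbage-collector bookkeeping that has nothing to do with the slots.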
Fortunately the common case is fairly easy to figure out. There are memory debuggers for Python that do a reasonable job of keeping track of these things, like heapy. And as long as your class (and its base classes) is reasonably simple, you can make an educated guess at how much memory it would take up -- especially in large numbers. If you really want to know the exact sizes of your data structures, consult the CPython source; most built-in types are simple structs described in Include/<type>object.h and implemented in Objects/<type>object.c. The PyObject struct itself is described in Include/object.h. Just keep in mind: it's pointers all the way down; those take up room too.
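For an educated guess without reading the C source, one option is to follow those pointers yourself. The helper below, total_size, is a hypothetical name and only a rough sketch, assuming CPython 3: it follows instance __dict__s and the built-in containers, ignores __slots__ and anything allocated privately by C extensions, and counts each object once by identity, which is one (arbitrary) answer to the "who owns shared data" question.

```python
import sys


def total_size(obj, seen=None):
    """Roughly sum sys.getsizeof() over obj and what it refers to.

    Illustrative helper, not a standard API. Objects are tracked by
    id() so that shared data is counted only the first time it is met.
    """
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0  # shared object, already counted
    seen.add(id(obj))

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += total_size(key, seen) + total_size(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += total_size(item, seen)
    elif hasattr(obj, "__dict__"):
        # Instance of a Python-level class: include its attribute dict.
        size += total_size(vars(obj), seen)
    return size


class A(object):
    def __init__(self, x):
        self.x = x


shared = list(range(1000))
first = A(shared)
second = A(shared)

alone = total_size(first)  # instance + its dict + the list and its items

seen = set()
both = total_size(first, seen) + total_size(second, seen)

print(sys.getsizeof(first), alone, both)
```

Because the list is shared, measuring both instances with one seen set gives far less than twice the size of one instance measured alone; the second instance only contributes its own struct and attribute dict.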