In Python, the dictionaries created for the instances of a class are tiny compared to ordinary dictionaries containing the same attributes:
import sys

class Foo(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b

f = Foo(20, 30)
When using Python 3.5.2, the following calls to getsizeof produce:
>>> sys.getsizeof(vars(f)) # vars gets obj.__dict__
96
>>> sys.getsizeof(dict(vars(f)))
288
288 - 96 = 192 bytes saved!
Using Python 2.7.12, on the other hand, the same calls return:
>>> sys.getsizeof(vars(f))
280
>>> sys.getsizeof(dict(vars(f)))
280
0 bytes saved.
In both cases, the dictionaries obviously have exactly the same contents:
>>> vars(f) == dict(vars(f))
True
so this isn't a factor. Also, this applies to Python 3 only.
So, what's going on here? Why is the size of the __dict__ of an instance so tiny in Python 3?
In short:
Instance __dict__s are implemented differently than the 'normal' dictionaries created with dict or {}. The dictionaries of an instance share the keys and hashes and keep a separate array for the parts that differ: the values. sys.getsizeof only counts those values when calculating the size for the instance dict.
A bit more:
Dictionaries in CPython are, as of Python 3.3, implemented in one of two forms:
Combined table: each entry stores its value alongside the key and hash (the me_value member of the PyDictKeyEntry struct). As far as I know, this form is used for dictionaries created with dict, {} and the module namespace.
Split table: the keys and hashes live in a shared table, while the values are stored separately in an array (ma_values of PyDictObject).
Instance dictionaries start out in a split-table form (a Key-Sharing Dictionary), which allows instances of a given class to share the keys (and hashes) for their __dict__ and only differ in the corresponding values.
This is all described in PEP 412 -- Key-Sharing Dictionary. The implementation for the split dictionary landed in Python 3.3, so previous versions of the 3 family as well as Python 2.x don't have this implementation.
The implementation of __sizeof__ for dictionaries takes this fact into account and only considers the size that corresponds to the values array when calculating the size for a split dictionary. It is, thankfully, self-explanatory: the values array is added to the result when the dictionary has one, and the keys table is added only when it is not shared with anything else.
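As a rough sanity check, the numbers from the question can be reproduced by hand. The sketch below is only a model of the calculation described above, not CPython's code. The struct sizes are my assumptions for a 64-bit build of CPython 3.5 (a 40-byte PyDictObject, a 24-byte GC header that getsizeof adds, a 24-byte PyDictKeyEntry, a 56-byte PyDictKeysObject that already includes one entry, 4 slots for a small split table and 8 for a small combined one), and the function name model_dict_size is made up for illustration.

```python
# Assumed sizes (in bytes) for a 64-bit CPython 3.5 build. These are
# illustrative constants, not values read from the running interpreter.
GC_HEADER = 24      # added by sys.getsizeof for GC-tracked objects
DICT_OBJECT = 40    # PyDictObject: object header, ma_used, ma_keys, ma_values
KEYS_OBJECT = 56    # PyDictKeysObject, including its first entry
KEY_ENTRY = 24      # PyDictKeyEntry: me_hash, me_key, me_value
POINTER = 8         # one slot in the ma_values array


def model_dict_size(slots, split):
    """Hypothetical model of the size reported for a small dict."""
    size = GC_HEADER + DICT_OBJECT
    if split:
        # Split table: only the per-instance values array is counted;
        # the shared keys and hashes are not.
        size += slots * POINTER
    else:
        # Combined table: the dict owns its keys table, so it is counted.
        size += KEYS_OBJECT + (slots - 1) * KEY_ENTRY
    return size


split_size = model_dict_size(slots=4, split=True)
combined_size = model_dict_size(slots=8, split=False)
print(split_size, combined_size, combined_size - split_size)
```

Under those assumptions the model gives 96 and 288, the same figures reported for Python 3.5.2 in the question.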
As far as I know, split-table dictionaries are created only for the namespace of instances; using dict() or {} (as also described in the PEP) always results in a combined dictionary that doesn't have these benefits.
As an aside, since it's fun, we can always break this optimization. There are two ways I've found so far: a silly one and a more plausible scenario.
Being silly:
Split tables only support string keys; adding a non-string key (which really makes zero sense) breaks this rule, and CPython turns the split table into a combined one, losing all memory gains.
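A quick way to try this, as a sketch: the class mirrors the Foo from the question, and the sizes are printed rather than stated because they depend on the interpreter version and build. On 3.5 I would expect the second number to jump to the combined size, but newer releases may well behave differently.

```python
import sys


class Foo(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b


f = Foo(20, 30)
before = sys.getsizeof(vars(f))

# Insert a non-string key directly into the instance dictionary.
vars(f)[1] = 1
after = sys.getsizeof(vars(f))

print("before:", before, "after:", after)
```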
A scenario that might happen:
Different keys being inserted in the instances of a class will eventually lead to the split table getting combined. This doesn't apply only to the instances already created; all subsequent instances created from the class will have a combined dictionary instead of a split one.
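Something like the following could trigger it. This is only a sketch: f1, f2 and f3 are illustrative names, the class mirrors the Foo from the question, and whether (and when) the table actually gets combined depends on the CPython version, so the sizes are printed rather than asserted.

```python
import sys


class Foo(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b


f1, f2 = Foo(20, 30), Foo(30, 40)
size_before = sys.getsizeof(vars(f1))

# Give the two instances different extra attributes so their key sets diverge.
f1.c = 50
f2.d = 60

# An instance created after the key sets diverged.
f3 = Foo(1, 2)

print("f1 before:", size_before)
print("f1 after: ", sys.getsizeof(vars(f1)))
print("f2 after: ", sys.getsizeof(vars(f2)))
print("f3 (new): ", sys.getsizeof(vars(f3)))
```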
Of course, there's no good reason, other than for fun, for doing this on purpose.
If anyone is wondering, Python 3.6's dictionary implementation doesn't change this fact. The two aforementioned forms of dictionaries, while still available, are just further compacted (the implementation of dict.__sizeof__ also changed, so some differences should come up in the values returned from getsizeof).
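Since the exact numbers depend on the interpreter, a small script along these lines can report them for whatever version you run. It is a sketch that only relies on sys.getsizeof and sys.version_info; the helper name report_sizes is made up here, and no particular output is implied.

```python
import sys


class Foo(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b


def report_sizes(obj):
    """Return (size of the instance dict, size of an ordinary dict copy)."""
    instance_dict = vars(obj)
    return sys.getsizeof(instance_dict), sys.getsizeof(dict(instance_dict))


instance_size, copy_size = report_sizes(Foo(20, 30))
print("Python %d.%d.%d" % sys.version_info[:3])
print("instance __dict__:", instance_size)
print("dict copy:        ", copy_size)
```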