I am trying to accurately determine the size difference between two classes in Python. Both are new-style classes; the only difference is that one does not define __slots__. I have run numerous tests to measure their size difference, but they always report identical memory usage.
So far I have tried sys.getsizeof(obj) and heapy's heap() function, with no positive results. Test code is below:
import sys
from guppy import hpy

class test3(object):
    def __init__(self):
        self.one = 1
        self.two = "two variable"

class test4(object):
    __slots__ = ('one', 'two')
    def __init__(self):
        self.one = 1
        self.two = "two variable"

test3_obj = test3()
print "Sizeof test3_obj", sys.getsizeof(test3_obj)
test4_obj = test4()
print "Sizeof test4_obj", sys.getsizeof(test4_obj)

arr_test3 = []
arr_test4 = []
for i in range(3000):
    arr_test3.append(test3())
    arr_test4.append(test4())

h = hpy()
print h.heap()
Output:
Sizeof test3_obj 32
Sizeof test4_obj 32
Partition of a set of 34717 objects. Total size = 2589028 bytes.
 Index  Count   %     Size   %  Cumulative   %  Kind (class / dict of class)
     0  11896  34   765040  30      765040  30  str
     1   3001   9   420140  16     1185180  46  dict of __main__.test3
     2   5573  16   225240   9     1410420  54  tuple
     3    348   1   167376   6     1577796  61  dict (no owner)
     4   1567   5   106556   4     1684352  65  types.CodeType
     5     68   0   105136   4     1789488  69  dict of module
     6    183   1    97428   4     1886916  73  dict of type
     7   3001   9    96032   4     1982948  77  __main__.test3
     8   3001   9    96032   4     2078980  80  __main__.test4
     9    203   1    90360   3     2169340  84  type
<99 more rows. Type e.g. '_.more' to view.>
This is all with Python 2.6.0. I also tried overriding the class's __sizeof__ method to determine the size by summing the sizes of the individual attributes, but that didn't yield different results either:
class test4(object):
    __slots__ = ('one', 'two')
    def __init__(self):
        self.one = 1
        self.two = "two variable"
    def __sizeof__(self):
        return super(test4, self).__sizeof__() + self.one.__sizeof__() + self.two.__sizeof__()
Results with the sizeof method overridden:
Sizeof test3_obj 80
Sizeof test4_obj 80
sys.getsizeof returns a number that is more specialized and less useful than people think. In fact, if you increase the number of attributes to six, your test3_obj remains at 32 bytes, but test4_obj jumps to 48 bytes. This is because getsizeof returns the size of the PyObject structure itself. For test3_obj, that structure doesn't include the dict holding the attributes. For test4_obj, the attributes aren't stored in a dict; they are stored in slots inside the structure, so they are accounted for in the size.
But an instance of a class defined with __slots__ does take less memory than an instance of a class without it, precisely because there is no dict to hold the attributes.
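To see this distinction in practice, here is a minimal sketch, assuming Python 3 (the class names WithDict and WithSlots are made up for illustration). It compares the shallow size reported by sys.getsizeof with the size of the separate attribute dict. Exact byte counts vary by Python version and platform, so none are shown.

```python
import sys

# Hypothetical classes for illustration; they mirror test3 and test4.
class WithDict:
    def __init__(self):
        self.one = 1
        self.two = "two variable"

class WithSlots:
    __slots__ = ("one", "two")

    def __init__(self):
        self.one = 1
        self.two = "two variable"

d = WithDict()
s = WithSlots()

# Shallow size of each instance: only the object structure itself.
shallow_dict_instance = sys.getsizeof(d)
shallow_slots_instance = sys.getsizeof(s)

# The attribute dict is a separate object, so it has to be measured separately.
dict_size = sys.getsizeof(d.__dict__)

print("WithDict instance (shallow):", shallow_dict_instance)
print("WithDict attribute dict:    ", dict_size)
print("WithSlots instance (shallow):", shallow_slots_instance)
print("WithSlots has a __dict__:   ", hasattr(s, "__dict__"))
```

Adding the dict size to the shallow instance size gives a fairer comparison against the slotted instance than either getsizeof number alone.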
Why override __sizeof__? What are you really trying to accomplish?
As others have stated, sys.getsizeof only returns the size of the object structure that represents your data. So if, for instance, you have a dynamic array that you keep adding elements to, sys.getsizeof(my_array) will only ever show the size of the base DynamicArray object, not the growing size of memory that its elements take up.
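As a rough illustration of that point, here is a sketch assuming Python 3. DynamicArray is a hypothetical wrapper class written for this example, not a real library type. The reported size of the wrapper stays the same while its backing list grows.

```python
import sys

class DynamicArray:
    """Hypothetical container that stores its elements in a backing list."""

    def __init__(self):
        self._items = []

    def append(self, item):
        self._items.append(item)

arr = DynamicArray()
size_empty = sys.getsizeof(arr)

for i in range(10_000):
    arr.append(i)

size_full = sys.getsizeof(arr)
# The backing list is a separate object and is not counted above.
backing_list_size = sys.getsizeof(arr._items)

print("DynamicArray when empty:       ", size_empty)
print("DynamicArray with 10000 items: ", size_full)
print("Backing list alone:            ", backing_list_size)
```

Even the backing list figure is shallow: it covers the list's pointer storage but not the integer objects it refers to.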
pympler.asizeof.asizeof() gives an approximate complete size of objects and may be more accurate for you.
from pympler import asizeof
asizeof.asizeof(my_object) # should give you the full object size
First, check the size of the Python process in your OS's memory manager without many objects.
Second, make many objects of one kind and check the size again.
Third, make many objects of the other kind and check the size.
Repeat this a few times, and if the sizes at each step stay about the same, you have got something comparable.
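A scripted variant of the same idea, as a rough sketch: instead of reading the process size from the OS, this uses the standard library's tracemalloc module (Python 3.4+), which is a substitution on my part and not what the steps above describe literally. The class names and the allocated_bytes helper are hypothetical. The numbers depend on the interpreter version, so treat them as relative, not absolute.

```python
import tracemalloc

# Hypothetical classes for illustration; they mirror test3 and test4.
class WithDict:
    def __init__(self):
        self.one = 1
        self.two = "two variable"

class WithSlots:
    __slots__ = ("one", "two")

    def __init__(self):
        self.one = 1
        self.two = "two variable"

def allocated_bytes(factory, count=3000):
    """Return the bytes traced while `count` objects from `factory` are alive."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    objects = [factory() for _ in range(count)]
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del objects  # kept alive until after the second measurement
    return after - before

dict_bytes = allocated_bytes(WithDict)
slots_bytes = allocated_bytes(WithSlots)

print("3000 instances with __dict__: ", dict_bytes, "bytes")
print("3000 instances with __slots__:", slots_bytes, "bytes")
```

Running it several times, as suggested above, helps confirm that the difference is stable.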
I ran into a similar problem and ended up writing my own helper to do the dirty work. Check it out here
You might want to use a different implementation for getting the size of your objects in memory:
>>> import sys, array
>>> sizeof = lambda obj: sum(map(sys.getsizeof, explore(obj, set())))
>>> def explore(obj, memo):
    loc = id(obj)
    if loc not in memo:
        memo.add(loc)
        yield obj
        if isinstance(obj, memoryview):
            yield from explore(obj.obj, memo)
        elif not isinstance(obj, (range, str, bytes, bytearray, array.array)):
            # Handle instances with slots.
            try:
                slots = obj.__slots__
            except AttributeError:
                pass
            else:
                for name in slots:
                    try:
                        attr = getattr(obj, name)
                    except AttributeError:
                        pass
                    else:
                        yield from explore(attr, memo)
            # Handle instances with dict.
            try:
                attrs = obj.__dict__
            except AttributeError:
                pass
            else:
                yield from explore(attrs, memo)
            # Handle dicts or iterables.
            for name in 'keys', 'values', '__iter__':
                try:
                    attr = getattr(obj, name)
                except AttributeError:
                    pass
                else:
                    for item in attr():
                        yield from explore(item, memo)

>>> class Test1:
    def __init__(self):
        self.one = 1
        self.two = 'two variable'

>>> class Test2:
    __slots__ = 'one', 'two'
    def __init__(self):
        self.one = 1
        self.two = 'two variable'

>>> print('sizeof(Test1()) ==', sizeof(Test1()))
sizeof(Test1()) == 361
>>> print('sizeof(Test2()) ==', sizeof(Test2()))
sizeof(Test2()) == 145
>>> array_test1, array_test2 = [], []
>>> for _ in range(3000):
    array_test1.append(Test1())
    array_test2.append(Test2())

>>> print('sizeof(array_test1) ==', sizeof(array_test1))
sizeof(array_test1) == 530929
>>> print('sizeof(array_test2) ==', sizeof(array_test2))
sizeof(array_test2) == 194825
Just make sure that you do not give any infinite iterators to this code if you want an answer back.