-->

Measure Object Size Accurately in Python - Sys.Get

Posted 2019-01-08 00:29

Question:

I am trying to definitively measure the size difference between two classes in Python. Both are new-style classes; the only difference is that one defines __slots__ and the other does not. I have tried numerous tests to determine their size difference, but they always come out identical in memory usage.

So far I have tried sys.getsizeof(obj) and heapy's heap() function, with no positive results. Test code is below:

import sys
from guppy import hpy

class test3(object):
    def __init__(self):
        self.one = 1
        self.two = "two variable"

class test4(object):
    __slots__ = ('one', 'two')
    def __init__(self):
        self.one = 1
        self.two = "two variable"

test3_obj = test3()
print "Sizeof test3_obj", sys.getsizeof(test3_obj)

test4_obj = test4()
print "Sizeof test4_obj", sys.getsizeof(test4_obj)

arr_test3 = []
arr_test4 = []

for i in range(3000):
    arr_test3.append(test3())
    arr_test4.append(test4())

h = hpy()
print h.heap()

Output:

Sizeof test3_obj 32
Sizeof test4_obj 32

Partition of a set of 34717 objects. Total size = 2589028 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  11896  34   765040  30    765040  30 str
     1   3001   9   420140  16   1185180  46 dict of __main__.test3
     2   5573  16   225240   9   1410420  54 tuple
     3    348   1   167376   6   1577796  61 dict (no owner)
     4   1567   5   106556   4   1684352  65 types.CodeType
     5     68   0   105136   4   1789488  69 dict of module
     6    183   1    97428   4   1886916  73 dict of type
     7   3001   9    96032   4   1982948  77 __main__.test3
     8   3001   9    96032   4   2078980  80 __main__.test4
     9    203   1    90360   3   2169340  84 type
<99 more rows. Type e.g. '_.more' to view.>

This is all with Python 2.6.0. I also tried overriding the class's __sizeof__ method to determine the size by summing the sizes of the individual attributes, but that did not yield different results:

class test4(object):
    __slots__ = ('one', 'two')
    def __init__(self):
        self.one = 1
        self.two = "two variable"
    def __sizeof__(self):
        return super(test4, self).__sizeof__() + self.one.__sizeof__() + self.two.__sizeof__()

Results with the __sizeof__ method overridden:

Sizeof test3_obj 80
Sizeof test4_obj 80

Answer 1:

sys.getsizeof returns a number that is more specialized and less useful than people think. In fact, if you increase the number of attributes to six, your test3_obj stays at 32 bytes, but test4_obj jumps to 48. This is because getsizeof returns the size of the PyObject structure implementing the type. For test3_obj, that structure does not include the dict holding the attributes. For test4_obj, the attributes are stored in slots rather than in a dict, so they are counted in the size.
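A minimal sketch of that effect, written for Python 3 on CPython rather than the 2.6 interpreter from the question. The class names TwoSlots and SixSlots are illustrative and not part of the original code. Each slot adds one pointer to the instance structure, so the reported size grows with the number of slots; the absolute byte counts depend on the interpreter version and platform:

```python
import struct
import sys

class TwoSlots:
    __slots__ = ("a", "b")

class SixSlots:
    __slots__ = ("a", "b", "c", "d", "e", "f")

pointer_size = struct.calcsize("P")  # size of one C pointer on this build

two = sys.getsizeof(TwoSlots())
six = sys.getsizeof(SixSlots())

print("two slots:", two)
print("six slots:", six)
print("extra pointers:", (six - two) // pointer_size)
```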

But an instance of a class defined with __slots__ takes less memory than an instance of a class without, precisely because there is no dict to hold the attributes.
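One rough way to see this, assuming Python 3 on CPython, is to add the size of the instance's __dict__ by hand. The names WithDict and WithSlots are illustrative. Neither total includes the attribute values themselves, and the size reported for the dict varies between interpreter versions, so treat the numbers as indicative only:

```python
import sys

class WithDict:
    def __init__(self):
        self.one = 1
        self.two = "two variable"

class WithSlots:
    __slots__ = ("one", "two")

    def __init__(self):
        self.one = 1
        self.two = "two variable"

plain = WithDict()
slotted = WithSlots()

# getsizeof(plain) leaves out the attribute dict, so add it explicitly.
plain_total = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
# A slotted instance has no __dict__; its slots are part of the instance.
slotted_total = sys.getsizeof(slotted)

print("plain instance + __dict__:", plain_total)
print("slotted instance:", slotted_total)
```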

Why override __sizeof__? What are you really trying to accomplish?



Answer 2:

As others have stated, sys.getsizeof only returns the size of the object structure that represents your data. So if, for instance, you have a dynamic array that you keep adding elements to, sys.getsizeof(my_array) will only ever show the size of the base DynamicArray object, not the growing size of memory that its elements take up.

pympler.asizeof.asizeof() gives an approximate complete size of objects and may be more accurate for you.

from pympler import asizeof
asizeof.asizeof(my_object)  # should give you the full object size
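The shallow behaviour of sys.getsizeof can be checked with the standard library alone. This is a small sketch assuming Python 3 on CPython; the variable names are illustrative. Two one-element lists report the same size even though one of them refers to a much larger string, because only the list structure itself is measured:

```python
import sys

small = ["x"]
large = ["x" * 100_000]

# The list sizes match: each list holds a single pointer to its element.
print("lists:   ", sys.getsizeof(small), sys.getsizeof(large))
# The elements differ widely, but the list sizes above do not reflect that.
print("elements:", sys.getsizeof(small[0]), sys.getsizeof(large[0]))
```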


Answer 3:

First, check the size of the Python process in your OS's memory manager before creating many objects.

Second, create many objects of one kind and check the size again.

Third, create many objects of the other kind and check the size once more.

Repeat this a few times. If the sizes at each step stay about the same, you have numbers you can compare.
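The same before-and-after idea can be sketched inside the interpreter with the standard tracemalloc module (Python 3.4 and later). Note that this swaps the OS-level reading described above for Python's own allocation tracing, so it is a related technique and not the exact procedure. The helper allocated_bytes and the class names are hypothetical:

```python
import tracemalloc

class Plain:
    def __init__(self):
        self.one = 1
        self.two = "two variable"

class Slotted:
    __slots__ = ("one", "two")

    def __init__(self):
        self.one = 1
        self.two = "two variable"

def allocated_bytes(factory, count=3000):
    """Return the bytes traced while building `count` objects with `factory`."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    objects = [factory() for _ in range(count)]  # kept alive while measuring
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before

plain_bytes = allocated_bytes(Plain)
slotted_bytes = allocated_bytes(Slotted)

print("3000 plain objects:  ", plain_bytes)
print("3000 slotted objects:", slotted_bytes)
```

Both measurements include the list that holds the objects, so the difference between the two figures is the more meaningful number.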



Answer 4:

I ran into a similar problem and ended up writing my own helper to do the dirty work. Check it out here



Answer 5:

You might want to use a different implementation for getting the size of your objects in memory:

>>> import sys, array
>>> sizeof = lambda obj: sum(map(sys.getsizeof, explore(obj, set())))
>>> def explore(obj, memo):
    loc = id(obj)
    if loc not in memo:
        memo.add(loc)
        yield obj
        if isinstance(obj, memoryview):
            yield from explore(obj.obj, memo)
        elif not isinstance(obj, (range, str, bytes, bytearray, array.array)):
            # Handle instances with slots.
            try:
                slots = obj.__slots__
            except AttributeError:
                pass
            else:
                for name in slots:
                    try:
                        attr = getattr(obj, name)
                    except AttributeError:
                        pass
                    else:
                        yield from explore(attr, memo)
            # Handle instances with dict.
            try:
                attrs = obj.__dict__
            except AttributeError:
                pass
            else:
                yield from explore(attrs, memo)
            # Handle dicts or iterables.
            for name in 'keys', 'values', '__iter__':
                try:
                    attr = getattr(obj, name)
                except AttributeError:
                    pass
                else:
                    for item in attr():
                        yield from explore(item, memo)


>>> class Test1:
    def __init__(self):
        self.one = 1
        self.two = 'two variable'


>>> class Test2:
    __slots__ = 'one', 'two'
    def __init__(self):
        self.one = 1
        self.two = 'two variable'


>>> print('sizeof(Test1()) ==', sizeof(Test1()))
sizeof(Test1()) == 361
>>> print('sizeof(Test2()) ==', sizeof(Test2()))
sizeof(Test2()) == 145
>>> array_test1, array_test2 = [], []
>>> for _ in range(3000):
    array_test1.append(Test1())
    array_test2.append(Test2())


>>> print('sizeof(array_test1) ==', sizeof(array_test1))
sizeof(array_test1) == 530929
>>> print('sizeof(array_test2) ==', sizeof(array_test2))
sizeof(array_test2) == 194825
>>> 

Just make sure that you do not give any infinite iterators to this code if you want an answer back.