How to know the size in bytes of a Python object such as an array

Posted 2019-02-02 20:19

I was looking for an easy way to find the size in bytes of array and dictionary objects, such as

[ [1,2,3], [4,5,6] ] or { 1:{2:2} }

Many threads suggest using pylab, for example:

from pylab import *

A = array( [ [1,2,3], [4,5,6] ] )
A.nbytes
24
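For reference, ndarray.nbytes is just the number of elements times the size of one element (A.size * A.itemsize). It covers the element buffer only, not the array object's own header. The 24 above corresponds to 4-byte integers. The sketch below pins the dtype explicitly (an illustrative assumption, not part of the original snippet) to show why the same array can report 24 or 48 depending on the default integer type of the platform and NumPy version.

```python
import numpy as np

data = [[1, 2, 3], [4, 5, 6]]

# nbytes counts only the element buffer: number of elements x bytes per element.
a32 = np.array(data, dtype=np.int32)  # 6 elements x 4 bytes
a64 = np.array(data, dtype=np.int64)  # 6 elements x 8 bytes
print(a32.nbytes, a64.nbytes)  # 24 48

# Without an explicit dtype, the default integer type depends on the platform
# and NumPy version, so A.nbytes may be 24 on one machine and 48 on another.
A = np.array(data)
print(A.dtype, A.nbytes == A.size * A.itemsize)
```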

But what about dictionaries? I saw lots of answers proposing pysize or heapy. An easy answer is given by Torsten Marek in this link: Which Python memory profiler is recommended?, but I can't interpret the output clearly because the number of bytes didn't match.

Pysize seems more complicated, and I don't yet have a clear idea of how to use it.

Given how simple the size calculation I want to perform is (no classes or complex structures), is there an easy way to get an approximate estimate of the memory usage of this kind of object?

Kind regards.

4 answers
放荡不羁爱自由
Reply #2 · 2019-02-02 20:35

There's:

>>> import sys
>>> sys.getsizeof([1,2, 3])
96
>>> a = []
>>> sys.getsizeof(a)
72
>>> a = [1]
>>> sys.getsizeof(a)
80

But I wouldn't say it's that reliable. Python adds overhead to every object, and containers such as lists and dicts hold only references to other objects. sys.getsizeof reports the size of the container itself (its header plus those references), not of the objects it refers to, so it's not quite the same as in C and other languages.

Have a read of the docs on sys.getsizeof and go from there, I guess.
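To see this shallow behaviour concretely, here is a small sketch (CPython assumed) comparing two outer lists that each hold two inner lists of very different sizes. Both outer lists report the same sys.getsizeof value, because only the pointers to the inner lists are counted. one_level is a hypothetical helper that adds one level of contents by hand.

```python
import sys

small = [[1, 2, 3], [4, 5, 6]]
big = [list(range(1000)), list(range(1000))]

# sys.getsizeof measures only the outer list: its header plus one pointer
# per element. The inner lists are not included, so both values match.
print(sys.getsizeof(small) == sys.getsizeof(big))  # True

# Hypothetical helper: add the sizes of the direct elements by hand
# (one level only; it does not descend into the ints).
def one_level(lst):
    return sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)

print(one_level(small) < one_level(big))  # True
```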

劳资没心,怎么记你
Reply #3 · 2019-02-02 20:35

A bit late to the party, but an easy way to get the size of a dict is to pickle it first.

Using sys.getsizeof on a Python object (including a dictionary) may not be exact, since it does not count referenced objects.

The way to handle this is to serialize the object into a string and use sys.getsizeof on that string. The result will be much closer to what you want.

import sys
import pickle  # cPickle on Python 2

mydict = {'key1': 'some long string', 'key2': [1, 2, 3], 'key3': 'whatever other data'}

# doing sys.getsizeof(mydict) is not exact, so pickle it first
mydict_as_string = pickle.dumps(mydict)

# now we can see how much space it takes
print(sys.getsizeof(mydict_as_string))
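One caveat worth checking: the pickled size measures the serialized form, not memory. len() gives the payload length, sys.getsizeof() on the bytes object adds that object's own header, and both can be far from the in-memory figure. In the sketch below (CPython assumed), a list of 1000 zeros pickles to fewer bytes than even the shallow size of the list object.

```python
import pickle
import sys

# 1000 references to the same small int object.
data = [0] * 1000
serialized = pickle.dumps(data)

# len() is the payload size; sys.getsizeof() adds the bytes object's header.
payload = len(serialized)
with_header = sys.getsizeof(serialized)

# In memory, the list alone holds 1000 pointers (8 bytes each on a 64-bit
# build), while the pickle encodes each small int in just a few bytes.
in_memory_list = sys.getsizeof(data)

print(payload, with_header, in_memory_list)
```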
戒情不戒烟
Reply #4 · 2019-02-02 20:47

Use this recipe, taken from here:

http://code.activestate.com/recipes/577504-compute-memory-footprint-of-an-object-and-its-cont/

from __future__ import print_function
from sys import getsizeof, stderr
from itertools import chain
from collections import deque
try:
    from reprlib import repr
except ImportError:
    pass

def total_size(o, handlers={}, verbose=False):
    """ Returns the approximate memory footprint an object and all of its contents.

    Automatically finds the contents of the following builtin containers and
    their subclasses:  tuple, list, deque, dict, set and frozenset.
    To search other containers, add handlers to iterate over their contents:

        handlers = {SomeContainerClass: iter,
                    OtherContainerClass: OtherContainerClass.get_elements}

    """
    dict_handler = lambda d: chain.from_iterable(d.items())
    all_handlers = {tuple: iter,
                    list: iter,
                    deque: iter,
                    dict: dict_handler,
                    set: iter,
                    frozenset: iter,
                   }
    all_handlers.update(handlers)     # user handlers take precedence
    seen = set()                      # track which object ids have already been seen
    default_size = getsizeof(0)       # estimate sizeof object without __sizeof__

    def sizeof(o):
        if id(o) in seen:       # do not double count the same object
            return 0
        seen.add(id(o))
        s = getsizeof(o, default_size)

        if verbose:
            print(s, type(o), repr(o), file=stderr)

        for typ, handler in all_handlers.items():
            if isinstance(o, typ):
                s += sum(map(sizeof, handler(o)))
                break
        return s

    return sizeof(o)


##### Example call #####

if __name__ == '__main__':
    d = dict(a=1, b=2, c=3, d=[4,5,6,7], e='a string of chars')
    print(total_size(d, verbose=True))
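The seen set in total_size is what keeps shared objects from being counted twice. A one-level sketch of the same idea (it does not recurse into the ints, unlike the recipe) shows the difference it makes when a list holds two references to the same inner list:

```python
import sys

inner = [1, 2, 3]
outer = [inner, inner]  # two references to the same list object

# Naive: add the size of every element, so `inner` is counted twice.
naive = sys.getsizeof(outer) + sum(sys.getsizeof(x) for x in outer)

# Deduplicated: remember the ids already counted, as total_size's `seen` does.
seen = {id(outer)}
deduped = sys.getsizeof(outer)
for x in outer:
    if id(x) not in seen:
        seen.add(id(x))
        deduped += sys.getsizeof(x)

print(naive - deduped == sys.getsizeof(inner))  # True
```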
爷、活的狠高调
Reply #5 · 2019-02-02 20:57

None of the answers here are truly generic. The following solution works with any type of object and walks its references iteratively, so there's no need for an expensive recursive implementation:

import gc
import sys

def get_obj_size(obj):
    marked = {id(obj)}
    obj_q = [obj]
    sz = 0

    while obj_q:
        sz += sum(map(sys.getsizeof, obj_q))

        # Look up all the objects referred to by the objects in obj_q.
        # See: https://docs.python.org/3.7/library/gc.html#gc.get_referents
        all_refr = ((id(o), o) for o in gc.get_referents(*obj_q))

        # Filter out objects that are already marked, and skip type objects.
        # Keying a dict by id prevents repeated objects.
        new_refr = {o_id: o for o_id, o in all_refr if o_id not in marked and not isinstance(o, type)}

        # The new obj_q will be the ones that were not marked,
        # and we will update marked with their ids so we will
        # not traverse them again.
        obj_q = new_refr.values()
        marked.update(new_refr.keys())

    return sz

For example:

>>> import numpy as np
>>> x = np.random.rand(1024).astype(np.float64)
>>> y = np.random.rand(1024).astype(np.float64)
>>> a = {'x': x, 'y': y}
>>> get_obj_size(a)
16816

See my repository for more information, or simply install my package (objsize):

$ pip install objsize

Then:

>>> from objsize import get_deep_size
>>> get_deep_size(a)
16816
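get_obj_size relies on gc.get_referents to find what each object points to. The sketch below shows what it returns for the question's nested list. Per the gc documentation, it reports only the objects visited by the arguments' C-level tp_traverse methods, so objects that don't support garbage collection (such as ints) yield no referents and the walk stops there. It also means the result may not include every object that is actually reachable.

```python
import gc

nested = [[1, 2, 3], [4, 5, 6]]

# The direct referents of the outer list are the two inner lists
# (sorted here because the traversal order is an implementation detail).
children = gc.get_referents(nested)
print(sorted(children))  # [[1, 2, 3], [4, 5, 6]]

# ints do not support garbage collection, so they have no referents.
print(gc.get_referents(1, 2, 3))  # []
```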