sys.getsizeof(list) returns less than the sum of its elements

Posted 2020-04-14 02:06

I'm curious - why does the sys.getsizeof call return a smaller number for a list than the sum of its elements?

import sys
lst = ["abcde", "fghij", "klmno", "pqrst", "uvwxy"]
print("Element sizes:", [sys.getsizeof(el) for el in lst])
print("Sum of sizes: ", sum([sys.getsizeof(el) for el in lst]))
print("Size of list: ", sys.getsizeof(lst))

The above prints

Element sizes: [42, 42, 42, 42, 42]
Sum of sizes:  210
Size of list:  112

How come?

Tags: python list
3 answers
叛逆
#2 · 2020-04-14 02:37

You are getting the size of the list object itself. A list stores only pointers to its elements, not the elements themselves, so its size depends on how many elements it holds rather than on how large they are. Here that is less than the sum of the element sizes.

By analogy, it’s like getting the size of an array of pointers in C.
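A minimal sketch of the same point, assuming CPython: two lists built the same way with the same number of elements report the same size, however large the elements are. The names short_items and long_items are illustrative only.

```python
import sys

# Two lists with the same length but very different element sizes.
short_items = ["a" * 5 for _ in range(5)]
long_items = ["a" * 5000 for _ in range(5)]

# The list object only stores references, so both report the same size.
short_list_size = sys.getsizeof(short_items)
long_list_size = sys.getsizeof(long_items)

# The elements are separate objects and are measured separately.
long_elements_size = sum(sys.getsizeof(el) for el in long_items)

print("List of short strings:", short_list_size)
print("List of long strings: ", long_list_size)
print("Sum of long elements: ", long_elements_size)
```

The exact byte counts vary between Python versions and platforms, so only the comparison between the numbers is meaningful.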

Explosion°爆炸
#3 · 2020-04-14 02:39

As per the documentation, sys.getsizeof does the following:

Return the size of an object in bytes. The object can be any type of object. All built-in objects will return correct results, but this does not have to hold true for third-party extensions as it is implementation specific.

Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to.

So the number matches the total memory footprint only for simple built-in objects that do not refer to other objects. For built-in container types (list, dictionary, etc.), you usually need some sort of recursive function to find the "total" size of the container and its contents. Keep in mind, though, that a Python list is really just a resizable array of pointers, so in that sense the number reported for the list is accurate.

However, you are looking for something like this:

https://code.activestate.com/recipes/577504/
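A recursive size function along those lines can be sketched as follows. This is not the linked recipe, only a minimal illustration: deep_getsizeof is a hypothetical name, and it handles just the common built-in containers.

```python
import sys

def deep_getsizeof(obj, seen=None):
    """Approximate size of obj plus the objects it contains (hypothetical helper)."""
    if seen is None:
        seen = set()
    # Count each object once, even if it is referenced several times.
    if id(obj) in seen:
        return 0
    seen.add(id(obj))

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_getsizeof(key, seen) + deep_getsizeof(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_getsizeof(item, seen)
    return size

lst = ["abcde", "fghij", "klmno", "pqrst", "uvwxy"]
shallow = sys.getsizeof(lst)
deep = deep_getsizeof(lst)
print("Shallow size:", shallow)
print("Deep size:   ", deep)
```

Tracking object ids in seen keeps shared elements from being counted twice. Attributes of arbitrary class instances are not followed in this sketch.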

Also, note that:

>>> sys.getsizeof(npArrayList[0])
96
>>> 

Every NumPy array, like any other Python object, has some overhead, and each array stored in the list is a separate object with its own overhead. The following therefore counts only the memory of the array contents, not the overhead of the whole object:

>>> npArrayList[0].nbytes
32
孤傲高冷的网名
#4 · 2020-04-14 02:43

The memory of a numpy array a can be obtained by a.nbytes.

sys.getsizeof shows "only the memory consumption directly attributed to the object [...], not the memory consumption of objects it refers to" (according to the documentation). In your case, the array does not own its data. This can be seen with a.flags, which outputs:

C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : False
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False

For the first array, it is instead:

C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False

OWNDATA being False means the array is a view onto a buffer owned by another object, so the buffer is not counted. That explains why sys.getsizeof outputs only 128 bytes.
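A small sketch of the OWNDATA effect, assuming NumPy is installed. The arrays owner and view are illustrative and are not the arrays from the question.

```python
import sys
import numpy as np

owner = np.zeros(1000)  # allocates and owns its data buffer
view = owner[:]         # a view that shares the owner's buffer

# For the owner, getsizeof includes the data buffer; for the view it does not.
print("owner:", owner.flags.owndata, sys.getsizeof(owner), owner.nbytes)
print("view: ", view.flags.owndata, sys.getsizeof(view), view.nbytes)
```

Both arrays report the same nbytes, because nbytes describes the data the array exposes regardless of which object owns the underlying memory.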
