In C, we can find the size of an `int`, `char`, etc. I want to know how to get the size of objects like a string, integer, etc. in Python.
Related question: How many bytes per element are there in a Python list (tuple)?
I am using an XML file which contains size fields that specify the size of a value. I must parse this XML and do my coding. When I want to change the value of a particular field, I will check the size field of that value. Here I want to compare whether the new value that I'm going to enter is of the same size as in the XML, so I need to check the size of the new value. In the case of a string I can say it's the length. But in the case of an int, float, etc. I am confused.
The answer, "Just use sys.getsizeof" is not a complete answer.
That answer works for builtin objects directly, but it does not account for what those objects may contain: container types such as tuples, lists, dicts, and sets can hold instances of each other, as well as numbers, strings, and other objects.
A More Complete Answer
Using 64-bit Python 3.6 from the Anaconda distribution, with sys.getsizeof, I have determined the minimum size of the following objects. Note that sets and dicts preallocate space, so empty ones don't grow again until after a set amount (which may vary by implementation of the language):
Python 3:
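You can measure these minimums yourself with a few lines; a sketch (the exact byte counts vary by Python version, build, and platform, with 64-bit builds reporting more):

```python
import sys

# Shallow minimum sizes of empty builtins. These are the per-object
# header plus any preallocated space; they do not include contents.
for obj in (0, 0.0, '', (), [], set(), {}, b'', bytearray()):
    print(type(obj).__name__, sys.getsizeof(obj))
```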
How do you interpret this? Well, say you have a set with 10 items in it. If each item is 100 bytes, how big is the whole data structure? The set itself is 736 because it has sized up once, to 736 bytes. Then you add the size of the items, so that's 1736 bytes in total.
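You can observe this preallocation-and-resize behavior directly; a quick sketch (the exact thresholds and byte counts depend on the CPython version):

```python
import sys

s = set()
print(sys.getsizeof(s))   # the preallocated empty size
s.update(range(10))
print(sys.getsizeof(s))   # jumps to the next allocation size
```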
Some caveats for function and class definitions:
Note each class definition has a proxy `__dict__` (48 bytes) structure for class attrs. Each slot has a descriptor (like a `property`) in the class definition. Slotted instances start out with 48 bytes on their first element, and increase by 8 bytes for each additional one. Only empty slotted objects have 16 bytes, and an instance with no data makes very little sense. Also, each function definition has code objects, maybe docstrings, and other possible attributes, even a `__dict__`.
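A quick way to check the slotted-instance sizes on your own interpreter (the exact numbers depend on your build; the relative growth per slot is what matters):

```python
import sys

class Empty:
    __slots__ = ()

class ThreeSlots:
    __slots__ = ('a', 'b', 'c')

# Each declared slot adds one pointer-sized entry (8 bytes on 64-bit)
# to every instance, whether or not the slot is filled.
print(sys.getsizeof(Empty()))
print(sys.getsizeof(ThreeSlots()))
```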
Python 2.7 analysis, confirmed with `guppy.hpy` and `sys.getsizeof`:
Note that dictionaries (but not sets) got a more compact representation in Python 3.6.
I think 8 bytes per additional item to reference makes a lot of sense on a 64 bit machine. Those 8 bytes point to the place in memory the contained item is at. The 4 bytes are fixed width for unicode in Python 2, if I recall correctly, but in Python 3, str becomes a unicode of width equal to the max width of the characters.
(And for more on slots, see this answer.)
Recursive Visitor for a More Complete Function
To cover most of these types, I wrote this recursive function to try to estimate the size of most Python objects, including most builtins, types in the collections module, and custom types (slotted and otherwise):
And I tested it rather casually (I should unittest it):
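A sketch of such a recursive estimator, in the spirit described but not the author's exact code: it sums `sys.getsizeof` over everything reachable, tracking `id`s so shared objects are counted only once:

```python
import sys
from numbers import Number
from collections import abc

def getsize(top_obj):
    """Recursively estimate the size of an object and everything it
    references, counting each distinct object only once."""
    seen = set()  # ids of objects already counted, to avoid double-counting

    def inner(obj):
        obj_id = id(obj)
        if obj_id in seen:
            return 0
        seen.add(obj_id)
        size = sys.getsizeof(obj)
        if isinstance(obj, (str, bytes, bytearray, Number, range)):
            pass  # leaf types: no references worth following
        elif isinstance(obj, (tuple, list, set, frozenset)):
            size += sum(inner(item) for item in obj)
        elif isinstance(obj, abc.Mapping):
            size += sum(inner(k) + inner(v) for k, v in obj.items())
        # custom objects: follow the instance __dict__ and any slots
        if hasattr(obj, '__dict__'):
            size += inner(vars(obj))
        for slot in getattr(obj, '__slots__', ()):
            if hasattr(obj, slot):
                size += inner(getattr(obj, slot))
        return size

    return inner(top_obj)
```

For example, `getsize(['a', 'b', 'c'])` returns more than `sys.getsizeof(['a', 'b', 'c'])`, because the shallow call counts only the list's reference array, not the strings it points to.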
It kind of breaks down on class definitions and function definitions because I don't go after all of their attributes, but since they should only exist once in memory for the process, their size really doesn't matter too much.
For numpy arrays, `getsizeof` doesn't work; for me it always returns 40 for some reason:
Then (in ipython):
Happily, though:
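The array's `nbytes` attribute reports the size of the underlying data buffer; a sketch (note that recent NumPy versions also implement `__sizeof__`, so `sys.getsizeof` on a data-owning array includes the buffer too):

```python
import numpy as np

arr = np.zeros(1000, dtype=np.float64)

# nbytes is the raw data buffer only:
# 1000 elements * 8 bytes per float64 = 8000 bytes
print(arr.nbytes)
```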
Having run into this problem many times myself, I wrote up a small function (inspired by @aaron-hall's answer) and tests that do what I would have expected `sys.getsizeof` to do:
https://github.com/bosswissam/pysize
If you're interested in the backstory, here it is
EDIT: Attaching the code below for easy reference. To see the most up-to-date code, please check the github link.
First: an answer.
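In a line, assuming CPython's `sys.getsizeof` (per the rest of this thread):

```python
import sys

# The implementation-reported shallow size of the object itself, in
# bytes; it does not include objects this one merely references.
print(sys.getsizeof(42))
print(sys.getsizeof('a string'))
print(sys.getsizeof([1, 2, 3]))
```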
Discussion:
In Python, you cannot ever access "direct" memory addresses. Why, then, would you need or want to know how many such addresses are occupied by a given object? It's a question that's entirely inappropriate at that level of abstraction. When you're painting your house, you don't ask what frequencies of light are absorbed or reflected by each of the constituent atoms in the paint; you just ask what color it is. The details of the physical characteristics that create that color are beside the point. Similarly, the number of bytes of memory that a given Python object occupies is beside the point.
So, why are you trying to use Python to write C code? :)
This can be more complicated than it looks, depending on how you want to count things. For instance, if you have a list of ints, do you want the size of the list containing the references to the ints (i.e. the list only, not what it contains)? Or do you want to include the actual data pointed to? In that case you need to deal with duplicate references, and with preventing double-counting when two objects hold references to the same object.
You may want to take a look at one of the Python memory profilers, such as pysizer, to see if they meet your needs.
Just use the `sys.getsizeof` function defined in the `sys` module.
Usage example, in Python 3.0:
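A usage sketch (the exact numbers printed depend on your Python version and platform):

```python
import sys

x = 2
print(sys.getsizeof(x))            # e.g. 28 on a 64-bit CPython 3
print(sys.getsizeof('this'))       # strings grow with their length...
print(sys.getsizeof('this also'))  # ...so this prints a larger number
```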
If you are in Python < 2.6 and don't have `sys.getsizeof`, you can use this extensive module instead. Never used it though.