First of all, these are my computer specs:
Memory - https://gist.github.com/vyscond/6425304
CPU - https://gist.github.com/vyscond/6425322
This morning I tested the following two code snippets:
Code A:
a = 'a' * 1000000000
and code B:
a = 'a' * 10000000000
Code A works fine, but code B gives me an error message:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
So I started researching ways to measure the size of data in Python.
The first thing I found was the classic built-in function len(). For code A, len() returned 1000000000, but for code B the same MemoryError was raised.
After this I decided to make the tests more precise, and I found a function in the sys module called getsizeof(). With this function I ran the same test on code A:
sys.getsizeof( 'a' * 1000000000 )
The result is 1000000037 (in bytes).
Question 1: does that mean 0.9313226090744 gigabytes?
So I checked the number of bytes used by a string with a single character 'a':
sys.getsizeof( 'a' )
The result is 38 (in bytes).
Question 2: does that mean a string composed of 1000000000 'a' characters takes 38 * 1000000000 = 38,000,000,000 bytes?
Question 3: does that mean we need 35.390257835388 gigabytes to hold a string like this?
I would like to know where the error in this reasoning is, because it doesn't make any sense to me '-'
Python objects have a minimum size: the overhead of keeping several pieces of bookkeeping data attached to the object.
A Python str object is no exception. Compare the sizes that sys.getsizeof() reports for strings with zero, one, two, and three characters: each additional character adds exactly one byte.
The Python str object overhead is 37 bytes on my machine, but each character in the string only takes one byte over the fixed overhead. So the error in your reasoning is multiplying the 38 bytes of a one-character string by the length: the 37 bytes of overhead are paid once per string, not once per character.
Thus, a str value with 1000 million characters requires 1000 million bytes + 37 bytes of overhead. That is indeed about 0.931 gigabytes (counting a gigabyte as 2^30 bytes).
Your sample code B created ten times more characters, so you needed nearly 10 gigabytes of memory just to hold that one string, not counting the rest of Python, the OS, and whatever else might be running on that machine.
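If you want to check this on your own machine, here is a minimal sketch, assuming CPython 3 and strings made only of ASCII characters. The helper names (str_overhead_and_per_char, estimated_str_size, bytes_to_gib) are made up for this illustration. The fixed overhead you see will probably differ from my 37 bytes, since it depends on the Python version and build, but the per-character growth should still be one byte.

```python
import sys


def str_overhead_and_per_char():
    """Measure the fixed str overhead and the cost of one extra ASCII character."""
    # Sizes of '', 'a', 'aa', 'aaa' as reported by sys.getsizeof().
    sizes = [sys.getsizeof('a' * n) for n in range(4)]
    overhead = sizes[0]              # size of the empty string
    per_char = sizes[1] - sizes[0]   # growth caused by one more character
    return overhead, per_char


def estimated_str_size(n_chars):
    """Estimate the size of an ASCII str without allocating it."""
    overhead, per_char = str_overhead_and_per_char()
    # The overhead is paid once per string, not once per character.
    return overhead + per_char * n_chars


def bytes_to_gib(n_bytes):
    """Convert bytes to gigabytes, counting one gigabyte as 2**30 bytes."""
    return n_bytes / 2**30


if __name__ == "__main__":
    overhead, per_char = str_overhead_and_per_char()
    print("overhead:", overhead, "bytes; per character:", per_char, "byte(s)")
    print("code A estimate:", bytes_to_gib(estimated_str_size(10**9)), "GiB")
    print("code B estimate:", bytes_to_gib(estimated_str_size(10**10)), "GiB")
```

Because the estimate is computed from the length alone, you can compare it with your free memory before trying to build a string as large as the one in code B.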