Consider the following script:
l = [i for i in range(int(1e8))]  # allocate a list of 10**8 distinct int objects
l = []                            # drop the only reference to that list
import gc
gc.collect()
# 0
gc.get_referrers(l)
# [{'__builtins__': <module '__builtin__' (built-in)>, 'l': [], '__package__': None, 'i': 99999999, 'gc': <module 'gc' (built-in)>, '__name__': '__main__', '__doc__': None}]
del l
gc.collect()
# 0
The point is, after all these steps the memory usage of this Python process is around 30 % on my machine (Python 2.6.5; more details on request). Here's an excerpt of the output of top:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5478 moooeeeep 20 0 2397m 2.3g 3428 S 0 29.8 0:09.15 ipython
and of ps aux:
moooeeeep 5478 1.0 29.7 2454720 2413516 pts/2 S+ 12:39 0:09 /usr/bin/python /usr/bin/ipython gctest.py
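The same number can also be read from inside the interpreter with a small Linux-only helper along these lines (it parses VmRSS from /proc/self/status; the name rss_mb is just illustrative):

def rss_mb():
    # Current resident set size in MiB (Linux-only: read /proc).
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith('VmRSS:'):
                # the line looks like "VmRSS:  2413516 kB"
                return int(line.split()[1]) / 1024.0
    return None

print(rss_mb())   # should roughly match the RES column from top/ps above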
According to the docs for gc.collect:
Not all items in some free lists may be freed due to the particular implementation, in particular int and float.
Does this mean that if I (temporarily) need a large number of different int or float values, I need to move this work to C/C++ because the Python GC fails to release the memory?
Update
Probably the interpreter is to blame, as this article suggests:
It’s that you’ve created 5 million integers simultaneously alive, and each int object consumes 12 bytes. “For speed”, Python maintains an internal free list for integer objects. Unfortunately, that free list is both immortal and unbounded in size. floats also use an immortal & unbounded free list.
That would also explain the numbers above: if this is a 64-bit build, each freed int object occupies 24 bytes on that immortal free list, so 10**8 of them alone account for roughly 2.4 GB, which is consistent with the RES value reported by top. The problem, however, remains: I cannot avoid this amount of data (timestamp/value pairs from an external source). Am I really forced to drop Python and go back to C/C++?
Update 2
Probably it is indeed the case that the Python implementation causes the problem. I found this answer, which conclusively explains the issue and a possible workaround.
Your answer may be here:
I've done a few tests, and this issue only occurs with CPython 2.x. The issue is gone in CPython 3.2.2 (it drops back to the memory usage of a fresh interpreter) and PyPy 1.8 (python 2.7.2) also drops back down to the same level as a new pypy process.
So no, you don't need to switch to another language. However, there's likely a solution which won't force you to switch to a different Python implementation.
I found this also to be answered by Alex Martelli in another thread.
Fortunately I was able to split the memory-intensive work into separate chunks, which enabled the interpreter to actually free the temporary memory after each iteration. I used a wrapper along these lines to run the memory-intensive function as a subprocess:
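A minimal sketch of such a wrapper, assuming a fork-based platform (Linux) and using multiprocessing with a Queue to hand the result back; the name run_in_subprocess and the example function are illustrative, not the exact code:

import multiprocessing

def run_in_subprocess(func, *args, **kwargs):
    # Run func in a child process so that everything it allocates --
    # including CPython's immortal int/float free lists -- is handed
    # back to the OS when the child exits.
    def worker(queue):
        queue.put(func(*args, **kwargs))   # closure requires fork (Unix)
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=worker, args=(queue,))
    proc.start()
    result = queue.get()   # fetch before join() so a large result cannot block the pipe
    proc.join()
    return result

def build_and_measure(n):
    data = [i for i in range(int(n))]   # the memory-hungry temporary lives only in the child
    return len(data)

print(run_in_subprocess(build_and_measure, 1e8))

Because the int free list is populated only in the child process, the parent's memory usage should stay flat after the call returns.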
Python tends to do garbage collection fairly intelligently, and in my experience it releases memory just fine. It does have a small overhead to take into account (about 15 MB on my machine), but beyond that the memory requirements are not that different from C. If you are dealing with so much data that memory is a serious problem, you're probably going to have the same problem in C, so it would be far better to try to change the way you work with your data, for example store it in a file on disk and work with manageable chunks one at a time.
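For the timestamp/value data mentioned above, that chunked approach could look roughly like this (the file name, the two-column text format, the chunk size and process_chunk are assumptions):

def iter_chunks(path, chunk_size=100000):
    # Yield lists of (timestamp, value) float pairs, at most chunk_size
    # rows at a time, so only one chunk is ever resident in memory.
    with open(path) as f:
        chunk = []
        for line in f:
            ts, val = line.split()
            chunk.append((float(ts), float(val)))
            if len(chunk) >= chunk_size:
                yield chunk
                chunk = []
        if chunk:
            yield chunk

for chunk in iter_chunks('timeseries.txt'):
    process_chunk(chunk)   # hypothetical per-chunk work

Because freed float objects are reused from the free list for the next chunk, the free list should stay at roughly one chunk's worth of objects instead of growing with the whole data set.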