What is a good way to find all of the references to an object in Python?
The reason I ask is that it looks like we have a "memory leak". We are uploading image files to the server from a web browser. Each time we do this, the memory usage on the server goes up in proportion to the size of the file that was just uploaded. This memory is never released by Python's garbage collection, so I suspect there are stray references to the image data that are not being deleted or going out of scope, even at the end of each request.
I figure it would be nice to be able to ask Python: "What references are still pointing to this memory?" so that I can work out what is keeping the garbage collector from freeing it.
Currently we are running Python and Django on a Heroku server.
Any suggestions and ideas are appreciated, thanks so much!
Python's standard library has the gc module, which exposes the garbage collector API. One of the functions you will probably want is gc.get_objects().
This function returns a list of all objects currently tracked by the garbage collector. The next step is to analyze that list.
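One possible way to analyze it, as a rough sketch, is to count the tracked objects by type and watch which counts keep growing between requests. The names below (count_tracked_types, Upload, uploads) are made up for illustration and are not part of gc or Django:

```python
import gc
from collections import Counter

def count_tracked_types(limit=10):
    """Return (type name, count) pairs for gc-tracked objects, most common first."""
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    return counts.most_common(limit)  # limit=None returns every type

class Upload(object):  # hypothetical stand-in for whatever holds the image data
    def __init__(self, data):
        self.data = data

# Simulate 50 uploads that are kept alive by a lingering list.
uploads = [Upload(b"x" * 1024) for _ in range(50)]

upload_count = dict(count_tracked_types(limit=None)).get("Upload", 0)
print(upload_count)
```

Running count_tracked_types() before and after a few uploads and comparing the two results should point at the types that are piling up.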
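If you know the object you want to track, you can use the sys module's getrefcount function. The count is one higher than you might expect, because it includes the temporary reference held by the argument to getrefcount itself: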
>>> import sys
>>> x = object()
>>> sys.getrefcount(x)
2
>>> y = x
>>> sys.getrefcount(x)
3
Python's gc module has several useful functions, but it sounds like gc.get_referrers() is what you're looking for. Here's an example:
import gc

def foo():
    a = [2, 4, 6]
    b = [1, 4, 7]
    l = [a, b]
    d = dict(a=a)
    return l, d

l, d = foo()
r1 = gc.get_referrers(l[0])  # everything that refers to a
r2 = gc.get_referrers(l[1])  # everything that refers to b
print(r1)
print(r2)
When I run that, I see the following output:
[[[2, 4, 6], [1, 4, 7]], {'a': [2, 4, 6]}]
[[[2, 4, 6], [1, 4, 7]]]
You can see that the first line is l and d, and the second line is just l.
In my brief experiments, I've found that the results are not always this clean. Interned strings and tuples, for example, have more referrers than you would expect.