Python: different behavior using gc module in inte

Posted 2019-01-28 09:53

Question:

I wanted to be able to get a tuple of references to any existing object instances of a class. What I came up with was:

import gc

def instances(theClass):
    instances = []
    gc.collect()
    for i in gc.get_referrers(theClass):
        if isinstance(i, theClass):
            instances.append(i)
    return tuple(instances)

If the above code is entered at the Python interpreter prompt, then you can do the following:

>>> class MyClass(object):
...     pass
...

>>> c = MyClass()
>>> instances(MyClass)
(<__main__.MyClass object at 0x100c616d0>,)

Hooray. But then it seems as though gc.collect() doesn't actually do anything inside the function:

>>> del c
>>> instances(MyClass)
(<__main__.MyClass object at 0x100c616d0>,)

But gc.collect() works when outside the function:

>>> del c
>>> gc.collect()
>>> instances(MyClass)
()

So, my question is: How do I make gc.collect() actually do a full collection when inside the function (and why doesn't it work as is)? With corollary question: Is there a better way to accomplish the same goal of returning a tuple with references to object instances for a specific class?

NB: This was all tried in Python 2.7.3. I haven't yet tried it in Python 3, but my goal would be to have a function that works in either (or at least which can be converted with 2to3).

Edited (per answer below) to clarify that the issue was really about interactive mode, not the gc.collect() function per se.

Answer 1:

When you work in interactive mode, there's a magic built-in variable _ that holds the result of the last expression statement you ran:

>>> 3 + 4
7
>>> _
7

When you delete the c variable, del c isn't an expression, so _ is unchanged:

>>> c = MyClass()
>>> instances(MyClass)
(<__main__.MyClass object at 0x00000000022E1748>,)
>>> del c
>>> _
(<__main__.MyClass object at 0x00000000022E1748>,)

_ is keeping a reference to the tuple returned by instances(), and that tuple still holds the MyClass instance. A call to gc.collect() at the prompt is an expression, so its return value replaces the old value of _, and the instance that c used to refer to can finally be freed. This has nothing to do with the garbage collector itself; any expression would do:

>>> 4
4
>>> instances(MyClass)
()
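
The same effect can be reproduced outside the interactive prompt. The sketch below is not part of the original answer: it uses an ordinary variable, last_result (a made-up name), to stand in for _, and it assumes CPython, where an object is freed as soon as its last reference goes away.

```python
import gc


def instances(the_class):
    """Return a tuple of the live instances of the_class that gc can find."""
    gc.collect()
    found = []
    for obj in gc.get_referrers(the_class):
        if isinstance(obj, the_class):
            found.append(obj)
    return tuple(found)


class MyClass(object):
    pass


c = MyClass()

# Stand-in for the interactive `_`: it keeps the returned tuple alive,
# and the tuple keeps the instance alive.
last_result = instances(MyClass)

del c
# The instance is still reachable through last_result.
count_while_held = len(instances(MyClass))

# Rebinding the name is what evaluating a new expression does to `_`.
last_result = None
count_after_rebind = len(instances(MyClass))

print(count_while_held, count_after_rebind)
```

In a script there is no _ holding on to earlier results, so the question's function behaves as expected as long as the caller does not keep the returned tuple around.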


Answer 2:

I think there's a simpler, more reliable way to get the information you want, without rummaging around in gc: you can make the class responsible for keeping track of its instances.

Here I'm using a metaclass to attach a separate list of instances to InstanceTracker and to each of its subclasses, and overriding __new__ to add each created instance to the list. (This is Python 3 code; it will need some adapting to work with Python 2.)

class InstanceTrackerMeta(type):
    def __new__(meta, name, bases, dct):
        cls = super().__new__(meta, name, bases, dct)
        cls.instances = []
        return cls

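class InstanceTracker(metaclass=InstanceTrackerMeta):
    def __new__(cls, *args, **kwargs):
        # object.__new__ rejects extra arguments when __new__ is overridden,
        # so pass only cls; __init__ still receives *args and **kwargs.
        instance = super().__new__(cls)
        cls.instances.append(instance)
        return instance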


# subclass InstanceTracker to get a class which remembers its instances
class MyClass(InstanceTracker):
    pass

c = MyClass()
print(MyClass.instances)
# [<__main__.MyClass object at 0x107b9d9b0>]

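Notes: This code might need tweaking, depending on whether you want to keep track of instances of subclasses and so on. Be aware that the instances list holds strong references, so a tracked instance is never garbage-collected while it is in the list. Overriding __del__ in InstanceTracker does not help here, because __del__ only runs once nothing references the object any more. If instances should drop out of the list when they are no longer used elsewhere, store weak references instead (for example in a weakref.WeakSet). You may also be able to simplify it to get rid of the metaclass, if you only need to track instances of one of the classes in your system.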