This question is an extension of a question I asked earlier: Python Delegate Pattern - How to avoid circular reference? After reading the replies, I decided to clarify my question, but was requested to post it separately.
Here goes:
- A passage in the Python docs (reproduced below) states that garbaged collection is not guaranteed for circular
referenced objects. A post I found here suggests the same thing. But the replies to my earlier question disagrees. So, have I misunderstood the passage or are there further
details that I've missed?
- I suppose using a weak reference, as stated in Alex Martelli's reply to the question Should I worry about circular references in Python? would avoid the overhead for garbage collecting circular referenced objects mentioned in his reply entirely? If so how does it work?
The relevant Python docs that suggests conflicting following passage from Python's doc:
CPython implementation detail: CPython currently uses a
reference-counting scheme with (optional) delayed detection of
cyclically linked garbage, which collects most objects as soon as they
become unreachable, but is not guaranteed to collect garbage
containing circular references. See the documentation of the gc module
for information on controlling the collection of cyclic garbage. Other
implementations act differently and CPython may change. Do not depend
on immediate finalization of objects when they become unreachable (ex:
always close files).
The passage in its original form can be found here: http://docs.python.org/reference/datamodel.html The bold setting is mine.
Thanks in advance for any replies.
I believe that the most important reason that objects with circular references aren't guaranteed to be collected is that, by design, Python never collects objects with circular references if they define a __del__
method. There is a pretty straightforward reason:
Python doesn’t collect such cycles automatically because, in general, it isn’t possible for Python to guess a safe order in which to run the __del__()
methods.
I am loath to say that this is the only reason unreachable objects with circular references might not be detected. There are probably a few unusual situations that could foil the GC's cycle detection mechanisms. But unless you define __del__
for one of your objects, you're probably ok. Just don't worry about it, and use the GC's extensive debugging options if you see performance issues.
When it says it's not guaranteed to collect circular references, that's exactly what it means. Whenever you're data structures contain circular references, the reference count will always be non-zero, meaning that reference-counting alone is not enough to decide when to delete them. On the other hand, finding all circular references after reaching the end of each scope would be time consuming--to say the least. It would involve analyzing the relationships of all objects with a non-zero reference count.
That said, I don't expect that you'll have a problem with it, in general. For a light script, you can just ignore it. For others, you still have to do some clean up work at the end of scope (closing a file, or even removing the circular reference), as you would in C, for instance, but it's still not as heady as C.
If it becomes a problem, just remove the circular reference before you finish with each data object.