I was analysing the following code, which compiles and runs correctly, but generates a memory leak.
The cfiboheap
is a C implementation of a Fibonacci Heap and the following code is a Cython wrapper (a part of it) for cfiboheap
.
My doubts starts on the insert function. The object data
has been created somewhere and passed to the function insert()
. Since the function wants to add this object to the fiboheap it increases its reference count. But afterwards? To whom the ownership goes? In my understanding, the C function fh_insertkey()
just borrows the ownership. Then it returns a proprietary pointer that needs to be incapsulated, and then returned by insert()
. Cool. But my object data
and its ref count? By creating the capsule I'm creating a new object, but I'm not decreasing the ref count of data
. This produces the memory leak.
(Note that commenting out Py_INCREF
or adding Py_DECREF
before the return of insert()
results in a segmentation fault.)
My questions are:
1) Why is it necessary to increment the ref count of data
during the insert()
?
2) Why is it not necessary to use a Py_DECREF
during the extract()
?
3) More generally, how can I exactly keep track of the reference ownership when jumping between C and Python?
4) How to properly deallocate an object like this FiboHeap? Should I use preventively a Py_XDECREF
in __dealloc__()
and, if yes, how?
Thanks!
cimport cfiboheap
from cpython.pycapsule cimport PyCapsule_New, PyCapsule_GetPointer
from python_ref cimport Py_INCREF, Py_DECREF
cdef inline object convert_fibheap_el_to_pycapsule(cfiboheap.fibheap_el* element):
return PyCapsule_New(element, NULL, NULL)
cdef class FiboHeap:
def __cinit__(FiboHeap self):
self.treeptr = cfiboheap.fh_makekeyheap()
if self.treeptr is NULL:
raise MemoryError()
def __dealloc__(FiboHeap self):
if self.treeptr is not NULL:
cfiboheap.fh_deleteheap(self.treeptr)
cpdef object insert(FiboHeap self, double key, object data=None):
Py_INCREF(data)
cdef cfiboheap.fibheap_el* retValue = cfiboheap.fh_insertkey(self.treeptr, key, <void*>data)
if retValue is NULL:
raise MemoryError()
return convert_fibheap_el_to_pycapsule(retValue)
cpdef object extract(FiboHeap self):
cdef void* ret = cfiboheap.fh_extractmin(self.treeptr)
if ret is NULL:
raise IndexError("FiboHeap is empty")
return <object> ret
cpdef object decrease_key(FiboHeap self, object element, double newKey):
cdef void* ret = cfiboheap.fh_replacekey(self.treeptr, convert_pycapsule_to_fibheap_el(element), newKey)
if ret is NULL:
raise IndexError("New Key is Bigger")
return <object> ret
Note that this hasn't been written by me, but I'm using this example to better understand obj referencing and to stop the leak (since I am actually using the code).
The main code that makes use of FiboHeap
(and where the leak happens) looks like this:
cdef dijkstra(Graph G, int start_idx, int end_idx):
cdef np.ndarray[object, ndim=1] fiboheap_nodes = np.empty([G.num_nodes], dtype=object) # holds all of our FiboHeap Nodes Pointers
Q = FiboHeap()
fiboheap_nodes[start_idx] = Q.insert(0, start_idx)
# Then occasionally:
Q.insert(...)
Q.decrease_key(...)
Q.extract()
return
extract
is not a peek, but a proper pop, so it is deleting the C element in the C fiboheap.
In conclusion: it seems clear that the ref count of data
causes a memory leak, but why? And how to stop it?
1) It is necessary to increase the reference count in
insert
because its reference count will be automatically decreased at the end of insert. Cython does not know you are storing the object for later. (You can inspect the generated C code to see theDECREF
at the end of the function). Ifinsert
is called with an object of reference count 1 (i.e..insert(SomeObject())
, then the object would be destroyed at the end of insert without theINCREF
2) If the object is removed from the
cfiboheap
duringextract
then you should do aDECREF
to acknowledge the fact that you no longer hold it. Cast it to object first (so you still hold a reference to it)3) Honestly you try not to use
PyObject*
if you can avoid it! It's much better to let Cython do the work. If you can't avoid it then you just need to ensureINCREF
is called once when you store the object, andDECREF
is called once when you stop storing it.4) You do need to decref the remaining objects on the heap in
__dealloc__
. A very easy way to do that might be to allextract
until thecfiboheap
is empty:A comment about the use of capsules: who owns the
fibheap_el
that they point to (and when does this get destructed)? If it gets destructed when thecfiboheap
gets destructed then you have the issue of a capsule with an invalid pointer still being alive. Using this capsule somewhere might end up causing problems. If it doesn't get destructed by thecfiboheap
then you potentially have another memory leak.