I was analysing the following code, which compiles and runs correctly but produces a memory leak. cfiboheap is a C implementation of a Fibonacci heap, and the code below is (part of) a Cython wrapper for cfiboheap.
My doubts start with the insert function. The object data has been created somewhere and passed to insert(). Since the function wants to add this object to the fiboheap, it increases its reference count. But afterwards? Who takes ownership? In my understanding, the C function fh_insertkey() just borrows the object. It then returns a proprietary pointer that needs to be encapsulated and returned by insert(). Cool. But what about my object data and its reference count? By creating the capsule I'm creating a new object, but I'm not decreasing the reference count of data. This produces the memory leak.
(Note that commenting out Py_INCREF, or adding a Py_DECREF before the return of insert(), results in a segmentation fault.)
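For intuition, the effect of an unbalanced Py_INCREF can be reproduced in plain Python with ctypes. This is a hypothetical demonstration, not part of the wrapper: once the count is raised and never lowered, the object can never be reclaimed, which is exactly the leak mechanism.

```python
import ctypes
import sys

data = ["payload"]
before = sys.getrefcount(data)

# Equivalent of the Py_INCREF(data) in insert(): the count goes up by one.
ctypes.pythonapi.Py_IncRef(ctypes.py_object(data))
assert sys.getrefcount(data) == before + 1

# Without a matching Py_DecRef somewhere, the count never returns to its
# old value, so the object outlives all Python references to it: a leak.
ctypes.pythonapi.Py_DecRef(ctypes.py_object(data))
assert sys.getrefcount(data) == before
```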
My questions are:
1) Why is it necessary to increment the reference count of data during insert()?
2) Why is it not necessary to use a Py_DECREF during extract()?
3) More generally, how can I keep exact track of reference ownership when jumping between C and Python?
4) How do I properly deallocate an object like this FiboHeap? Should I preemptively use Py_XDECREF in __dealloc__(), and if so, how?
Thanks!
cimport cfiboheap
from cpython.pycapsule cimport PyCapsule_New, PyCapsule_GetPointer
from cpython.ref cimport Py_INCREF, Py_DECREF

cdef inline object convert_fibheap_el_to_pycapsule(cfiboheap.fibheap_el* element):
    return PyCapsule_New(element, NULL, NULL)

cdef inline cfiboheap.fibheap_el* convert_pycapsule_to_fibheap_el(object capsule):
    # Inverse of the helper above (not shown in the original excerpt).
    return <cfiboheap.fibheap_el*> PyCapsule_GetPointer(capsule, NULL)

cdef class FiboHeap:
    # self.treeptr is assumed to be declared as a cdef attribute
    # in the accompanying .pxd file.

    def __cinit__(FiboHeap self):
        self.treeptr = cfiboheap.fh_makekeyheap()
        if self.treeptr is NULL:
            raise MemoryError()

    def __dealloc__(FiboHeap self):
        if self.treeptr is not NULL:
            cfiboheap.fh_deleteheap(self.treeptr)

    cpdef object insert(FiboHeap self, double key, object data=None):
        Py_INCREF(data)
        cdef cfiboheap.fibheap_el* retValue = cfiboheap.fh_insertkey(self.treeptr, key, <void*>data)
        if retValue is NULL:
            raise MemoryError()
        return convert_fibheap_el_to_pycapsule(retValue)

    cpdef object extract(FiboHeap self):
        cdef void* ret = cfiboheap.fh_extractmin(self.treeptr)
        if ret is NULL:
            raise IndexError("FiboHeap is empty")
        return <object> ret

    cpdef object decrease_key(FiboHeap self, object element, double newKey):
        cdef void* ret = cfiboheap.fh_replacekey(self.treeptr, convert_pycapsule_to_fibheap_el(element), newKey)
        if ret is NULL:
            raise IndexError("New Key is Bigger")
        return <object> ret
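As an aside, a PyCapsule only wraps a raw pointer; it does not know about, or hold a reference to, any Python object that happens to live behind that pointer. That can be seen from plain Python via ctypes (a hypothetical demo wrapping an object's address, not the original wrapper code):

```python
import ctypes
import sys

# Declare the two capsule functions from the CPython C API via ctypes.
PyCapsule_New = ctypes.pythonapi.PyCapsule_New
PyCapsule_New.restype = ctypes.py_object
PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
PyCapsule_GetPointer = ctypes.pythonapi.PyCapsule_GetPointer
PyCapsule_GetPointer.restype = ctypes.c_void_p
PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]

obj = ["payload"]
before = sys.getrefcount(obj)

# Wrap the object's address in a capsule, as insert() does with fibheap_el*.
cap = PyCapsule_New(id(obj), None, None)

# The capsule stores the raw address but takes no reference to obj.
assert sys.getrefcount(obj) == before
assert PyCapsule_GetPointer(cap, None) == id(obj)
```

So creating the capsule in insert() has no effect on the reference count of data; only the explicit Py_INCREF does.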
Note that this code wasn't written by me, but I'm using it as an example to better understand object referencing and to stop the leak (since I'm actually using the code). The main code that uses FiboHeap (and where the leak happens) looks like this:
cdef dijkstra(Graph G, int start_idx, int end_idx):
    cdef np.ndarray[object, ndim=1] fiboheap_nodes = np.empty([G.num_nodes], dtype=object) # holds all of our FiboHeap Nodes Pointers
    Q = FiboHeap()
    fiboheap_nodes[start_idx] = Q.insert(0, start_idx)
    # Then occasionally:
    Q.insert(...)
    Q.decrease_key(...)
    Q.extract()
    return
extract is not a peek but a proper pop, so it deletes the C element from the C fiboheap.
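Since extract() is a true pop, one would expect the heap's hidden reference to be released when the element leaves the heap. The balanced ownership pattern can be sketched in plain Python with ctypes; PtrHeap below is a toy stand-in I'm inventing for illustration, not the real fiboheap. The container holds a raw address plus one manually added reference, and the pop both recovers the object and drops that reference.

```python
import ctypes
import sys

class PtrHeap:
    """Toy stand-in for the C fiboheap: it stores raw addresses, as void* would."""
    def __init__(self):
        self._addrs = []

    def insert(self, obj):
        # Like Py_INCREF(data) in insert(): the container now co-owns obj.
        ctypes.pythonapi.Py_IncRef(ctypes.py_object(obj))
        self._addrs.append(id(obj))

    def extract(self):
        # Like <object> fh_extractmin(...): turn the raw address back into an object.
        addr = self._addrs.pop()
        obj = ctypes.cast(addr, ctypes.py_object).value
        # The balancing decref: the container gives up its ownership on pop.
        ctypes.pythonapi.Py_DecRef(ctypes.py_object(obj))
        return obj

h = PtrHeap()
item = {"node": 42}
base = sys.getrefcount(item)
h.insert(item)
assert sys.getrefcount(item) == base + 1  # the heap co-owns item
popped = h.extract()
assert popped is item
del popped
assert sys.getrefcount(item) == base      # ownership fully returned
```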
In conclusion: it seems clear that the reference count of data causes the memory leak, but why? And how do I stop it?