Confusing reference ownership: how to properly deallocate

Posted 2019-07-25 12:05

Question:

I was analysing the following code, which compiles and runs correctly, but generates a memory leak.

The cfiboheap is a C implementation of a Fibonacci Heap and the following code is a Cython wrapper (a part of it) for cfiboheap.

My doubts start with the insert function. The object data has been created somewhere and passed to insert(). Since the function wants to add this object to the fiboheap, it increases its reference count. But what happens afterwards? To whom does the ownership go? In my understanding, the C function fh_insertkey() just borrows the ownership. It then returns a proprietary pointer that needs to be encapsulated and returned by insert(). Cool. But what about my object data and its ref count? By creating the capsule I'm creating a new object, but I'm not decreasing the ref count of data. This produces the memory leak.

(Note that commenting out Py_INCREF or adding Py_DECREF before the return of insert() results in a segmentation fault.)

My questions are:

1) Why is it necessary to increment the ref count of data during the insert()?

2) Why is it not necessary to use a Py_DECREF during the extract()?

3) More generally, how can I exactly keep track of the reference ownership when jumping between C and Python?

4) How to properly deallocate an object like this FiboHeap? Should I use preventively a Py_XDECREF in __dealloc__() and, if yes, how?

Thanks!

cimport cfiboheap
from cpython.pycapsule cimport PyCapsule_New, PyCapsule_GetPointer
from cpython.ref cimport Py_INCREF, Py_DECREF

cdef inline object convert_fibheap_el_to_pycapsule(cfiboheap.fibheap_el* element):
    return PyCapsule_New(element, NULL, NULL)

cdef inline cfiboheap.fibheap_el* convert_pycapsule_to_fibheap_el(object element):
    # inverse helper used by decrease_key below; recovers the raw pointer
    return <cfiboheap.fibheap_el*>PyCapsule_GetPointer(element, NULL)

cdef class FiboHeap:

    def __cinit__(FiboHeap self):
        self.treeptr = cfiboheap.fh_makekeyheap()
        if self.treeptr is NULL:
            raise MemoryError()

    def __dealloc__(FiboHeap self):
        if self.treeptr is not NULL:
            cfiboheap.fh_deleteheap(self.treeptr)

    cpdef object insert(FiboHeap self, double key, object data=None):
        Py_INCREF(data)
        cdef cfiboheap.fibheap_el* retValue = cfiboheap.fh_insertkey(self.treeptr, key, <void*>data)
        if retValue is NULL:
            raise MemoryError()

        return convert_fibheap_el_to_pycapsule(retValue)

    cpdef object extract(FiboHeap self):
        cdef void* ret = cfiboheap.fh_extractmin(self.treeptr)
        if ret is NULL:
            raise IndexError("FiboHeap is empty")

        return <object> ret

    cpdef object decrease_key(FiboHeap self,  object element, double newKey):
        cdef void* ret = cfiboheap.fh_replacekey(self.treeptr, convert_pycapsule_to_fibheap_el(element), newKey)
        if ret is NULL:
            raise IndexError("New Key is Bigger")

        return <object> ret 

Note that this wasn't written by me; I'm using this example to better understand object referencing and to stop the leak (since I am actually using the code).

The main code that makes use of FiboHeap (and where the leak happens) looks like this:

cdef dijkstra(Graph G, int start_idx, int end_idx):

    cdef np.ndarray[object, ndim=1] fiboheap_nodes = np.empty([G.num_nodes], dtype=object) # holds all of our FiboHeap Nodes Pointers
    Q = FiboHeap()
    fiboheap_nodes[start_idx] = Q.insert(0, start_idx)
    # Then occasionally:
    Q.insert(...)
    Q.decrease_key(...)
    Q.extract()

    return

extract is not a peek but a proper pop: it deletes the C element from the C fiboheap.

In conclusion: it seems clear that the ref count of data causes a memory leak, but why? And how to stop it?

Answer 1:

1) It is necessary to increase the reference count in insert because the reference count of data will be automatically decreased at the end of insert. Cython does not know you are storing the object for later. (You can inspect the generated C code to see the DECREF at the end of the function.) If insert is called with an object whose reference count is 1 (i.e. .insert(SomeObject())), then without the INCREF the object would be destroyed at the end of insert.

2) If the object is removed from the cfiboheap during extract, then you should do a DECREF to acknowledge the fact that you no longer hold it. Cast it to object first (so you still hold a reference to it):

    cdef void* ret = cfiboheap.fh_extractmin(self.treeptr) # refcount is 1 here (from the INCREF when it was stored)
    if ret is NULL:
        raise IndexError("FiboHeap is empty")

    ret_obj = <object>ret
    # refcount should be 2 here - one for being on the heap and one for
    # ret_obj; casting to object increases the refcount in Cython
    Py_DECREF(ret_obj) # 1 here
    return ret_obj

3) Honestly, try not to use PyObject* (or void* casts of Python objects) if you can avoid it! It's much better to let Cython do the work. If you can't avoid it, then you just need to ensure INCREF is called exactly once when you store the object, and DECREF is called exactly once when you stop storing it.
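One way to "let Cython do the work" can be sketched in plain Python (an assumption-laden sketch, not the original code: RefKeeper is a hypothetical helper): mirror every object handed to the C heap in a Python-side container, so the wrapper itself owns the references and no manual Py_INCREF/Py_DECREF is needed.

```python
class RefKeeper:
    """Hypothetical helper: owns one reference per object stored in the C heap."""
    def __init__(self):
        self._live = {}          # id(obj) -> obj holds a strong reference

    def hold(self, obj):
        # analogue of Py_INCREF before fh_insertkey
        self._live[id(obj)] = obj

    def release(self, obj):
        # analogue of Py_DECREF in extract
        del self._live[id(obj)]

keeper = RefKeeper()
data = [1, 2, 3]
keeper.hold(data)
payload_id = id(data)
del data                         # caller's reference gone; object still alive
assert keeper._live[payload_id] == [1, 2, 3]
keeper.release(keeper._live[payload_id])
assert not keeper._live          # nothing left pinned; object is collectible
```

Note that a plain dict keeps at most one entry per object, so inserting the same object twice would need a counter per id; but the invariant is the point: one hold per store, one release per removal.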

4) You do need to decref the remaining objects on the heap in __dealloc__. A very easy way to do that is to call extract until the cfiboheap is empty:

try:
    while True:
        self.extract()
except IndexError:
    pass # ignore the error - we're done emptying the heap

A comment about the use of capsules: who owns the fibheap_el that they point to (and when does it get destroyed)? If it is destroyed when the cfiboheap is destroyed, then you have the problem of a capsule with an invalid pointer still being alive; using that capsule somewhere might end up causing problems. If it is not destroyed by the cfiboheap, then you potentially have another memory leak.
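The dangling-capsule scenario can be illustrated with a toy Python model (hypothetical names: ToyHeap and resolve are not part of the real wrapper). Here a handle pairs the element id with a weak reference to the heap, so staleness is detected instead of dereferencing freed memory:

```python
import weakref

class ToyHeap:
    """Toy stand-in for the C fiboheap (not the real cfiboheap)."""
    def __init__(self):
        self._elements = {}
        self._counter = 0

    def insert(self, data):
        handle = self._counter
        self._counter += 1
        self._elements[handle] = data
        # The "capsule": an element handle plus a weak reference to the
        # heap, so liveness can be checked before the handle is used.
        return (handle, weakref.ref(self))

def resolve(capsule):
    handle, heap_ref = capsule
    heap = heap_ref()
    if heap is None or handle not in heap._elements:
        raise ValueError("dangling capsule")
    return heap._elements[handle]

heap = ToyHeap()
cap = heap.insert("payload")
assert resolve(cap) == "payload"

del heap                  # heap destroyed while the capsule is still alive
try:
    resolve(cap)
    assert False, "expected a dangling-capsule error"
except ValueError:
    pass                  # staleness detected rather than a segfault
```

A real PyCapsule carries only a raw pointer and cannot perform such a check, which is exactly why the ownership question above matters: the lifetime of the capsule and the lifetime of the fibheap_el it points to must be reasoned about explicitly.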