Problem

Void pointer to a cdef class is pointing to the same memory address without forcing the the python reference counter.

Description

I have a simple class that I want to store in a cpp vector by casting it to a void pointer. However, after printing the memory addresses the pointer is pointing to, it repeats after the second iteration, unless I force the reference counter to be increased by adding the new object to a list. Can somebody why the memory loops back without the reference counter enforcement?

# distutils: language = c++
# distutils: extra_compile_args = -std=c++11
from libcpp.vector cimport vector
from libc.stdio cimport printf

cdef class Temp:
    cdef int a
    def __init__(self, a):
        self.a = a


def f():
    cdef vector[void *] vec
    cdef int i, n = 3
    cdef Temp tmp
    cdef list ids = []
    # cdef list classes  = [] # force reference counter?
    for i in range(n):
        tmp = Temp(1)
        # classes.append(tmp)
        vec.push_back(<void *> tmp)
        printf('%p ', <void *> tmp)
        ids.append(id(tmp))
    print(ids)
f()

Which outputs:

[140137023037824, 140137023037848, 140137023037824]

However if I force the reference counter by adding it to the classes list:

[140663518040448, 140663518040472, 140663518040496]

回答1:

This answer became quite long, so there is a quick overview of the content:

Explanation of the observed behavior
Naive approach to avoid the problem
A more systematic and c++-typical solution
Illustrating problem for multi-threaded code in "nogil"-mode
Extending c++-typical solution for nogil-mode

Explanation of the observed behavior

The deal with Cython: as long as your variables are of type object or inherit from it (in your case cdef Temp) cython manages the reference counting for you. As soon as you cast it to PyObject * or any other pointer - the reference counting is your responsibility.

Obviously, the only reference to the created object is the variable tmp, as soon as you rebind it to the newly created Temp-object, the reference-counter of the old object becomes 0 and it is destroyed - the pointers in the vector becomes dangling. However, the same memory can be reused (it is quite probably) and thus you see always the same reused address.

Naive solution

How could you do the reference counting? For example (I use rather PyObject * than void *):

...
from cpython cimport PyObject,Py_XINCREF, Py_XDECREF    
...
def f():
    cdef vector[PyObject *] vec
    cdef int i, n = 3
    cdef Temp tmp
    cdef PyObject *tmp_ptr
    cdef list ids = []
    for i in range(n):
        tmp = Temp(1)
        tmp_ptr = <PyObject *> tmp
        Py_XINCREF(tmp_ptr)   # ensure it is not destroyed
        vec.push_back(tmp_ptr)
        printf('%p ', tmp_ptr)
        ids.append(id(tmp))

    #free memory:
    for i in range(n):
        Py_XDECREF(vec.at(i))
    print(ids)

Now all objects stay alive and "die" only after Py_XDECREF is called explicitly.

C++-typical solution

The above is not a very typical c++-way of doing things, I would rather introduce a wrapper which manages the reference counting automatically (not unlike std::shared_ptr):

...
cdef extern from *:
    """
    #include <Python.h>
    class PyObjectHolder{
    public:
        PyObject *ptr;
        PyObjectHolder():ptr(nullptr){}
        PyObjectHolder(PyObject *o):ptr(o){
           Py_XINCREF(ptr);
        }
        //rule of 3
        ~PyObjectHolder(){
            Py_XDECREF(ptr);
        }
        PyObjectHolder(const PyObjectHolder &h):
            PyObjectHolder(h.ptr){}
        PyObjectHolder& operator=(const PyObjectHolder &other){
            Py_XDECREF(ptr);
            ptr=other.ptr;
            Py_XINCREF(ptr);
            return *this;
        }
    };
    """
    cdef cppclass PyObjectHolder:
        PyObjectHolder(PyObject *o)

...
def f():
    cdef vector[PyObjectHolder] vec
    cdef int i, n = 3
    cdef Temp tmp
    cdef PyObject *tmp_ptr
    cdef list ids = []
    for i in range(n):
        tmp = Temp(1)
        vec.push_back(PyObjectHolder(<PyObject *> tmp)) # vector::emplace_back is missing in Cython-wrappers
        printf('%p ', <PyObject *> tmp)
        ids.append(id(tmp))
   print(ids) 
   # PyObjectHolder automatically decreases ref-counter as soon 
   # vec is out of scope, no need to take additional care

Noteworthy things:

PyObjectHolder increases ref-counter as soon as it take possession of a PyObject-pointer and decreases it as soon as it releases the pointer.
Rule of three means we also have to take care in copy-constructor and assignment operator
I've omitted move-stuff for c++11, but you need to take care of it as well.

Problems with nogil-mode

There is however one very important thing: You shouldn't release GIL with the above implementation (i.e. import it as PyObjectHolder(PyObject *o) nogil but there are also problems when C++ copies the vectors and similar) - because otherwise Py_XINCREF and Py_XDECREF might not work correctly.

To illustrate that let's take a look at the following code, which releases gil and does some stupid calculations in parallel (the whole magic cell is in listings at the end of the answer):

%%cython --cplus -c=/openmp 
...
# importing as nogil - A BAD THING
cdef cppclass PyObjectHolder:
    PyObjectHolder(PyObject *o) nogil

# some functionality using a lot of incref/decref  
cdef int create_vectors(PyObject *o) nogil:
    cdef vector[PyObjectHolder] vec
    cdef int i
    for i in range(100):
        vec.push_back(PyObjectHolder(o))
    return vec.size()

# using PyObjectHolder without gil - A BAD THING
def run(object o):
    cdef PyObject *ptr=<PyObject*>o;
    cdef int i
    for i in prange(10, nogil=True):
        create_vectors(ptr)

And now:

import sys
a=[1000]*1000
print("Starts with", sys.getrefcount(a[0]))
# prints: Starts with 1002
run(a[0])
print("Ends with", sys.getrefcount(a[0]))
#prints: Ends with 1177

We got lucky, the program didn't crash (but could!). However due to race conditions, we ended up with memory leak - a[0] has reference count of 1177 but there are only 1000 references(+2 inside of sys.getrefcount) references alive, so this object will never be destroyed.

Making PyObjectHolder thread-safe

So what to do? The simplest solution is to use a mutex to protect the accesses to ref-counter(i.e. every time Py_XINCREF or Py_XDECREF is called). The downside of this approach is that it might slowdown the single core code considerable (see for example this old article about an older try to replace GIL by mutex-similar approach).

Here is a prototype:

%%cython --cplus -c=/openmp 
...
cdef extern from *:
    """
    #include <Python.h>
    #include <mutex>

    std::mutex ref_mutex;

    class PyObjectHolder{
    public:
        PyObject *ptr;
        PyObjectHolder():ptr(nullptr){}
        PyObjectHolder(PyObject *o):ptr(o){
            std::lock_guard<std::mutex> guard(ref_mutex);
            Py_XINCREF(ptr);
        }
        //rule of 3
        ~PyObjectHolder(){
            std::lock_guard<std::mutex> guard(ref_mutex);
            Py_XDECREF(ptr);
        }
        PyObjectHolder(const PyObjectHolder &h):
            PyObjectHolder(h.ptr){}
        PyObjectHolder& operator=(const PyObjectHolder &other){
            {
                std::lock_guard<std::mutex> guard(ref_mutex);
                Py_XDECREF(ptr);
                ptr=other.ptr;
                Py_XINCREF(ptr);
            }
            return *this;
        }
    };
    """
    cdef cppclass PyObjectHolder:
        PyObjectHolder(PyObject *o) nogil
    ...

And now, running the code snipped from above yields the expected/right behavior:

import sys
a=[1000]*1000
print("Starts with", sys.getrefcount(a[0]))
# prints: Starts with 1002
run(a[0])
print("Ends with", sys.getrefcount(a[0]))
#prints: Ends with 1002

However, as @DavidW has pointed out, using std::mutex works only for openmp-threads, but not threads created by the Python-interpreter.

Here is an example for which the mutex-solution will fail.

First, wrapping nogil-function as def-function:

%%cython --cplus -c=/openmp 
...
def single_create_vectors(object o):
    cdef PyObject *ptr=<PyObject *>o
    with nogil:
         create_vectors(ptr)

And now using threading-module to create

import sys
a=[1000]*10000  # some safety, so chances are high python will not crash 
print(sys.getrefcount(a[0]))  
#output: 10002  

from threading import Thread
threads = []
for i in range(100):
    t = Thread(target=single_create_vectors, args=(a[0],))
    threads.append(t)
    t.start()
for t in threads:
    t.join()

print(sys.getrefcount(a[0]))
#output: 10015   but should be 10002!

An alternative to using std::mutex would be to use the Python-machinery, i.e. PyGILState_STATE, which would lead to code similar to

...
PyObjectHolderPy(PyObject *o):ptr(o){
    PyGILState_STATE gstate;
    gstate = PyGILState_Ensure();
    Py_XINCREF(ptr);
    PyGILState_Release(gstate);
}
...

This would also work for the threading-example above. However, PyGILState_Ensure has just too much overhead - for the example above it would be about 100 times slower than the mutex-solution. One more lightweight solution with Python-machinery would mean also much more hassle.

Listing complete thread-unsafe version:

%%cython --cplus -c=/openmp 

from libcpp.vector cimport vector
from libc.stdio cimport printf
from cpython cimport PyObject  
from cython.parallel import prange

import sys

cdef extern from *:
    """
    #include <Python.h>

    class PyObjectHolder{
    public:
        PyObject *ptr;
        PyObjectHolder():ptr(nullptr){}
        PyObjectHolder(PyObject *o):ptr(o){
            Py_XINCREF(ptr);
        }
        //rule of 3
        ~PyObjectHolder(){
            Py_XDECREF(ptr);
        }
        PyObjectHolder(const PyObjectHolder &h):
            PyObjectHolder(h.ptr){}
        PyObjectHolder& operator=(const PyObjectHolder &other){
            {
                Py_XDECREF(ptr);
                ptr=other.ptr;
                Py_XINCREF(ptr);
            }
            return *this;
        }
    };
    """
    cdef cppclass PyObjectHolder:
        PyObjectHolder(PyObject *o) nogil


cdef int create_vectors(PyObject *o) nogil:
    cdef vector[PyObjectHolder] vec
    cdef int i
    for i in range(100):
        vec.push_back(PyObjectHolder(o))
    return vec.size()

def run(object o):
    cdef PyObject *ptr=<PyObject*>o;
    cdef int i
    for i in prange(10, nogil=True):
        create_vectors(ptr)

回答2:

The fact that your objects end up at the same address is coincidence. Your issue is that your python objects get destroyed when the last python reference to them goes away. If you want to keep python objects alive, you will need to hold a reference to them somewhere.

In your case, since tmp is the only reference to the Temp object you create within your loop, every time you re-assign tmp, the object it was previously referencing gets destroyed. That leaves blank space in memory that's conveniently exactly the right size to hold the Temp object that gets created in the next iteration of the loop, leading to the alternating pattern you see in your pointers.