Problem
A void pointer to a cdef class keeps pointing to the same memory address unless I force the Python reference count to increase.
Description
I have a simple class that I want to store in a C++ vector by casting it to a void pointer. However, when I print the memory addresses the pointers point to, the addresses repeat after the second iteration, unless I force the reference count to increase by adding each new object to a list. Can somebody explain why the memory address loops back when the reference count is not forced up?
# distutils: language = c++
# distutils: extra_compile_args = -std=c++11
from libcpp.vector cimport vector
from libc.stdio cimport printf

cdef class Temp:
    cdef int a
    def __init__(self, a):
        self.a = a

def f():
    cdef vector[void *] vec
    cdef int i, n = 3
    cdef Temp tmp
    cdef list ids = []
    # cdef list classes = []  # force reference counter?
    for i in range(n):
        tmp = Temp(1)
        # classes.append(tmp)
        vec.push_back(<void *> tmp)
        printf('%p ', <void *> tmp)
        ids.append(id(tmp))
    print(ids)

f()
Which outputs:
[140137023037824, 140137023037848, 140137023037824]
However, if I force the reference count up by adding each object to the classes list, I get:
[140663518040448, 140663518040472, 140663518040496]
This answer became quite long, so there is a quick overview of the content:
Explanation of the observed behavior
The deal with Cython: as long as your variables are of type object or inherit from it (in your case cdef Temp), Cython manages the reference counting for you. As soon as you cast such a variable to PyObject * or any other pointer, the reference counting is your responsibility.
Obviously, the only reference to the created object is the variable tmp. As soon as you rebind it to the newly created Temp object, the reference count of the old object drops to 0 and it is destroyed, so the pointers in the vector become dangling. However, the same memory can be reused (this is quite probable), which is why you keep seeing the same reused address.
Naive solution
How could you do the reference counting yourself? For example (I use PyObject * rather than void *):
Now all objects stay alive and "die" only after Py_XDECREF is called explicitly.
C++-typical solution
The above is not a very typical C++ way of doing things. I would rather introduce a wrapper that manages the reference counting automatically (not unlike std::shared_ptr):
Noteworthy things:
PyObjectHolder increases the reference count as soon as it takes possession of a PyObject pointer and decreases it as soon as it releases the pointer.
Problems with nogil-mode
There is, however, one very important thing: you shouldn't release the GIL with the above implementation (i.e. import it as PyObjectHolder(PyObject *o) nogil, but there are also problems when C++ copies the vectors and similar), because otherwise Py_XINCREF and Py_XDECREF might not work correctly.
To illustrate that, let's take a look at the following code, which releases the GIL and does some stupid calculations in parallel (the whole magic cell is in the listings at the end of the answer):
And now:
We got lucky, the program didn't crash (but it could have!). However, due to race conditions, we ended up with a memory leak: a[0] has a reference count of 1177, but only 1000 references (+2 inside of sys.getrefcount) are alive, so this object will never be destroyed.
Making PyObjectHolder thread-safe
So what to do? The simplest solution is to use a mutex to protect the accesses to the reference counter (i.e. every time Py_XINCREF or Py_XDECREF is called). The downside of this approach is that it might slow down single-core code considerably (see for example this old article about an older attempt to replace the GIL with a mutex-like approach).
Here is a prototype:
And now, running the code snippet from above yields the expected/right behavior:
However, as @DavidW has pointed out, using std::mutex works only for OpenMP threads, but not for threads created by the Python interpreter.
Here is an example for which the mutex solution will fail.
First, wrapping the nogil function as a def function:
And now using the threading module to create
An alternative to using std::mutex would be to use the Python machinery, i.e. PyGILState_STATE, which would lead to code similar to
This would also work for the threading example above. However, PyGILState_Ensure has just too much overhead: for the example above it would be about 100 times slower than the mutex solution. A more lightweight solution with the Python machinery would also mean much more hassle.
Listing complete thread-unsafe version:
The fact that your objects end up at the same address is a coincidence. Your issue is that your Python objects get destroyed when the last Python reference to them goes away. If you want to keep Python objects alive, you will need to hold a reference to them somewhere.
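As a rough illustration of "holding a reference somewhere", here is a plain-Python sketch rather than the Cython code from the question. Temp is a pure-Python stand-in for the cdef class and collect_ids is a hypothetical helper. It assumes CPython, where id() returns the object's address.

```python
class Temp:
    """Plain-Python stand-in for the cdef class in the question."""
    def __init__(self, a):
        self.a = a


def collect_ids(n, keep_alive):
    """Create n Temp objects and record id() of each one.

    With keep_alive=True every object is also appended to a list,
    which plays the role of the `classes` list in the question.
    """
    classes = []
    ids = []
    tmp = None
    for _ in range(n):
        tmp = Temp(1)
        if keep_alive:
            classes.append(tmp)  # extra reference: the object survives rebinding
        ids.append(id(tmp))
    return ids


kept = collect_ids(3, keep_alive=True)
# All three objects were alive at the same time, so their addresses must differ.
assert len(set(kept)) == 3

# Without the list, addresses may or may not repeat; that depends on the allocator.
print(collect_ids(3, keep_alive=False))
```

Only the keep_alive=True case gives a guaranteed result. Whether addresses repeat in the other case is up to the memory allocator, which is why the sketch prints that result instead of asserting on it.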
In your case, since tmp is the only reference to the Temp object you create within your loop, every time you reassign tmp, the object it was previously referencing gets destroyed. That leaves free space in memory that is conveniently exactly the right size to hold the Temp object created in the next iteration of the loop. Because the new object is constructed before the old one is released, it cannot take the slot of the object it replaces, only the slot freed one iteration earlier, which leads to the alternating pattern you see in your pointers.
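The destruction on reassignment described above can be observed from plain Python with weak references, which watch an object without keeping it alive. This is only a sketch under assumptions: Temp is a pure-Python stand-in for the cdef class, rebind_and_watch is a hypothetical helper, and the result relies on CPython's immediate reference counting (other interpreters may free objects later).

```python
import weakref


class Temp:
    """Plain-Python stand-in for the cdef class in the question."""
    def __init__(self, a):
        self.a = a


def rebind_and_watch(n):
    """Rebind `tmp` n times and report which of the created objects survive."""
    watchers = []  # weak references do not keep their targets alive
    tmp = None
    for _ in range(n):
        tmp = Temp(1)  # the previous object loses its only reference here
        watchers.append(weakref.ref(tmp))
    # A dead weak reference returns None when called.
    return [w() is not None for w in watchers]


print(rebind_and_watch(3))  # on CPython: [False, False, True]
```

Only the last object is still referenced by tmp when the check runs. Every earlier object was destroyed the moment tmp was rebound, which is exactly the state the raw pointers in the vector are left pointing at.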