Returning a dictionary of ndarray causes memory leaks

Posted 2019-08-06 12:15

Question:

I am writing a C++ module for Python. It takes an image, does some processing, and returns a dictionary of images. It leaks memory and I can't figure out why.

I use opencv-ndarray-conversion to convert between cv::Mat and numpy.ndarray

I use Boost.Python to expose the C++ code as a Python module.

I use the following Python code to test the C++ module while running htop to watch the memory usage.

import cv2
import this_cpp_module

for i in xrange(100000):
    img = cv2.imread('a_640x480x3_image.png')
    ret = this_cpp_module.func(img)
    # 'func' maps to one of the following C++ functions via Boost.Python:
    #     func1, func2 or func3.

1, Converting the image does not cause memory leaks

using namespace boost::python;
PyObject * func1(PyObject *image)
{
    NDArrayConverter cvt;
    cv::Mat mat;
    mat = cvt.toMat(image);
    PyObject* ret = cvt.toNDArray(mat);
    return ret;
}

2, Constructing a dictionary and putting the image into it does not cause memory leaks

using namespace boost::python;
dict func2(PyObject *image)
{
    dict pyDict;    
    object objImage(handle<>(borrowed(image)));
    pyDict[std::string("key")] = objImage;    
    return pyDict;
}

3, But combining them causes a memory leak (around 1 MB per loop iteration)

dict func3(PyObject *image)
{
    return func2(func1(image));
}

I cannot figure it out. Everything seems to be correct to me but combining them together just causes this problem.

Answer 1:

The leak is a result of func3() never properly disposing of the temporary owned reference returned by func1(). To resolve this, func3() needs to do one of the following:

  • Explicitly invoke Py_DECREF() on the owned reference returned from func1() before returning from func3().
  • Manage the value returned by func1() with a boost::python::handle, as it will decrement the object's reference count when the handle is destroyed.

For example, func3() could be written as:

boost::python::dict func3(PyObject* image)
{
  // func1() returns an owned reference, so create a handle to keep the
  // object alive for at least as long as the handle remains alive.  The
  // handle will properly dispose of the reference.
  boost::python::handle<> handle(func1(image));
  return func2(handle.get());
}

To see why the original code leaks: when func1() returns, the returned object has a reference count of 1, and that reference is owned by the caller. func2() treats the pointer as borrowed, so storing it in the dict adds a second reference; upon returning from func2() and func3(), the object has a reference count of 2. When the dict returned from func3() is destroyed, the count is decremented by 1, leaving the object with a reference count of 1 that nothing will ever release, so it is leaked.
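
The same accounting can be mimicked from pure Python, which may help when experimenting without rebuilding the extension. The sketch below is only an analogy, not the Boost.Python code itself: it assumes CPython, and the helper names leaky_wrap and balanced_wrap are hypothetical. It uses ctypes.pythonapi to call Py_IncRef and Py_DecRef (the function forms of Py_INCREF and Py_DECREF), so an extra owned reference can be created and then either released or forgotten.

```python
import ctypes
import sys

# Function forms of the Py_INCREF / Py_DECREF macros, exposed by CPython.
_incref = ctypes.pythonapi.Py_IncRef
_incref.argtypes = [ctypes.py_object]
_incref.restype = None
_decref = ctypes.pythonapi.Py_DecRef
_decref.argtypes = [ctypes.py_object]
_decref.restype = None


def leaky_wrap(obj):
    """Hypothetical stand-in for func3(): the owned reference is never released."""
    _incref(obj)         # plays the role of the owned reference from func1()
    return {"key": obj}  # the dict takes its own, separate reference


def balanced_wrap(obj):
    """Hypothetical stand-in for the handle-based fix."""
    _incref(obj)         # owned reference, as above
    d = {"key": obj}     # the dict takes its own reference
    _decref(obj)         # what the handle<> destructor would do
    return d


x = []
base = sys.getrefcount(x)

d = leaky_wrap(x)
leaky_delta_alive = sys.getrefcount(x) - base  # dict reference + leaked reference
del d
leaky_delta = sys.getrefcount(x) - base        # only the leaked reference remains

_decref(x)  # undo the deliberate leak so the demo cleans up after itself

d = balanced_wrap(x)
del d
balanced_delta = sys.getrefcount(x) - base     # nothing left over
```

On a standard CPython build, leaky_delta should end up as 1 and balanced_delta as 0, mirroring the difference between the leaking func3() and the handle-based version.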


Here is a complete minimal example based on the original code:

#include <string>

#include <boost/python.hpp>

PyObject* func1(PyObject*)
{
  return PyList_New(0);
}

boost::python::dict func2(PyObject* obj)
{
  namespace python = boost::python;
  python::dict dict;
  python::handle<> handle(python::borrowed(obj));
  dict[std::string("key")] = python::object(handle);
  return dict;
}

boost::python::dict func3(PyObject* obj)
{
  // Fails to properly dispose of the owned reference returned by func1(),
  // resulting in a leak.
  return func2(func1(obj));
}

boost::python::dict func4(PyObject* obj)
{
  // func1() returns an owned reference, so create a handle to keep the
  // object alive for at least as long as the handle remains alive.  The
  // handle will properly dispose of the reference.
  boost::python::handle<> handle(func1(obj));
  return func2(handle.get());
}

BOOST_PYTHON_MODULE(example)
{
  namespace python = boost::python;
  python::def("func1", &func1);
  python::def("func2", &func2);
  python::def("func3", &func3);
  python::def("func4", &func4);
}

Interactive usage:

>>> from sys import getrefcount
>>> import example
>>> x = example.func1(None)
>>> assert(2 == getrefcount(x)) # refs: x and getrefcount
>>> d = example.func2(x)
>>> assert(3 == getrefcount(x)) # refs: x, d["key"], and getrefcount
>>> d = None
>>> assert(2 == getrefcount(x)) # refs: x and getrefcount
>>> d = example.func3(None)
>>> x = d["key"]
>>> assert(4 == getrefcount(x)) # refs: x, d["key"], getrefcount, and one leak
>>> d = None
>>> assert(3 == getrefcount(x)) # refs: x, getrefcount, and one leak
>>> d = example.func4(None)
>>> x = d["key"]
>>> assert(3 == getrefcount(x)) # refs: x, d["key"], and getrefcount
>>> d = None
>>> assert(2 == getrefcount(x)) # refs: x and getrefcount
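
The question's test loop relied on watching htop, which is hard to repeat reliably. A more repeatable check might be sketched with the standard tracemalloc module, assuming Python 3.4 or later. The helper traced_growth and the two stand-in callables below are hypothetical; in practice the callable would wrap something like example.func3 or example.func4. Note that tracemalloc only sees memory requested through Python's allocators, so buffers allocated directly by a C++ library may not show up.

```python
import tracemalloc


def traced_growth(func, iterations=1000):
    """Return the net number of traced bytes still held after repeated calls."""
    tracemalloc.start()
    try:
        func()  # warm-up call so one-time allocations are not counted
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            func()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


_kept = []  # hypothetical stand-in for a leak: results are never released


def leaky():
    _kept.append(bytearray(1024))


def clean():
    bytearray(1024)  # freed as soon as the call returns


leaky_growth = traced_growth(leaky)
clean_growth = traced_growth(clean)
```

A leaking callable should show growth roughly proportional to the iteration count, while a clean one should stay close to zero.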