Can I force a numpy ndarray to take ownership of its memory?

Posted 2019-02-04 17:54

Question:

I have a C function that mallocs() and populates a 2D array of floats. It "returns" that address and the size of the array. The signature is

int get_array_c(float** addr, int* nrows, int* ncols);

I want to call it from Python, so I use ctypes.

import ctypes
mylib = ctypes.cdll.LoadLibrary('mylib.so')
get_array_c = mylib.get_array_c

I never figured out how to specify argument types with ctypes. I tend to just write a Python wrapper for each C function I'm using, and make sure I get the types right in the wrapper. The array of floats is a matrix in column-major order, and I'd like to get it as a numpy.ndarray. But it's pretty big, so I want to use the memory allocated by the C function, not copy it. (I just found this PyBuffer_FromMemory stuff in this StackOverflow answer: https://stackoverflow.com/a/4355701/3691)

# Note: PyBuffer_FromMemory exists only in the Python 2 C API
buffer_from_memory = ctypes.pythonapi.PyBuffer_FromMemory
buffer_from_memory.restype = ctypes.py_object

import numpy
def get_array_py():
    nrows = ctypes.c_int()
    ncols = ctypes.c_int()
    addr_ptr = ctypes.POINTER(ctypes.c_float)()
    get_array_c(ctypes.byref(addr_ptr), ctypes.byref(nrows), ctypes.byref(ncols))
    # ctypes integers must be unwrapped with .value before doing arithmetic
    buf = buffer_from_memory(addr_ptr, 4 * nrows.value * ncols.value)
    return numpy.ndarray((nrows.value, ncols.value), dtype=numpy.float32,
                         order='F', buffer=buf)

This seems to give me an array with the right values. But I'm pretty sure it's a memory leak.

>>> a = get_array_py()
>>> a.flags.owndata
False

The array doesn't own the memory. Fair enough; by default, when the array is created from a buffer, it shouldn't. But in this case it should. When the numpy array is deleted, I'd really like python to free the buffer memory for me. It seems like if I could force owndata to True, that should do it, but owndata isn't settable.

Unsatisfactory solutions:

  1. Make the caller of get_array_py() responsible for freeing the memory. That's super annoying; the caller should be able to treat this numpy array just like any other numpy array.

  2. Copy the original array into a new numpy array (with its own, separate memory) in get_array_py, delete the first array, and free the memory inside get_array_py(). Return the copy instead of the original array. This is annoying because it's an ought-to-be unnecessary memory copy.
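For concreteness, option 2 might look roughly like the sketch below. It is written against Python 3 and a Unix-like libc, and it assumes the library allocates with the same malloc/free that libc exposes. fake_get_array_c and copy_and_free are hypothetical names; the first only stands in for the real get_array_c so the example runs on its own.

```python
import ctypes
import ctypes.util
import numpy

# Assumption: the C library uses the same malloc/free that libc exposes.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.malloc.argtypes = [ctypes.c_size_t]
libc.malloc.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def fake_get_array_c(nrows, ncols):
    """Hypothetical stand-in for get_array_c: malloc a buffer, fill it with 0, 1, 2, ..."""
    n = nrows * ncols
    addr = libc.malloc(n * ctypes.sizeof(ctypes.c_float))
    if not addr:
        raise MemoryError("malloc failed")
    raw = (ctypes.c_float * n).from_address(addr)
    for i in range(n):
        raw[i] = float(i)
    return addr


def copy_and_free(addr, nrows, ncols):
    """Copy the C buffer into an array that owns its data, then free the C buffer."""
    n = nrows * ncols
    raw = (ctypes.c_float * n).from_address(addr)
    # Zero-copy, column-major view onto the malloc'ed memory
    view = numpy.frombuffer(raw, dtype=numpy.float32).reshape((nrows, ncols), order='F')
    owned = view.copy(order='F')  # this is the extra copy the question wants to avoid
    del view, raw                 # drop every reference to the C memory first
    libc.free(addr)
    return owned


a = copy_and_free(fake_get_array_c(2, 3), 2, 3)
print(a.flags.owndata)
```

The returned array behaves like any other numpy array, at the cost of briefly holding two copies of the data.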

Is there a way to do what I want? I can't modify the C function itself, although I could add another C function to the library if that's helpful.

Answer 1:

I just stumbled upon this question, which is still an issue in August 2013. NumPy is really picky about the OWNDATA flag: it cannot be modified at the Python level, so ctypes will most likely not be able to do this. At the NumPy C-API level (and now we are talking about a completely different way of making Python extension modules) one has to set the flag explicitly with:

PyArray_ENABLEFLAGS(arr, NPY_ARRAY_OWNDATA);

On numpy < 1.7, one had to be even more explicit:

((PyArrayObject*)arr)->flags |= NPY_OWNDATA;

If one has any control over the underlying C function/library, the best solution is to pass it an empty numpy array of the appropriate size from Python to store the result in. The basic principle is that memory allocation should always be done on the highest level possible, in this case on the level of the Python interpreter.


As kynan commented below, if you use Cython, you have to expose the function PyArray_ENABLEFLAGS manually; see the post "Force NumPy ndarray to take ownership of its memory in Cython".

The relevant documentation is here and here.



Answer 2:

I would tend to have two functions exported from my C library:

int get_array_c_nomalloc(float* addr, int nrows, int ncols); /* Pass addr as argument */
int get_array_c(float **addr, int nrows, int ncols); /* Calls function above */

I would then write my Python wrapper[1] of get_array_c to allocate the array, then call get_array_c_nomalloc. Then Python does own the memory. You could integrate this wrapper into your library so your user never has to be aware of get_array_c_nomalloc's existence.

[1] This isn't really a wrapper anymore, but instead is an adapter.
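
The adapter described in this answer could be sketched as follows. This is a minimal illustration rather than the library's actual code: fake_nomalloc is a hypothetical Python callback standing in for the compiled get_array_c_nomalloc so that the example runs without mylib.so, and a return value of 0 is assumed to mean success.

```python
import ctypes
import numpy

# Prototype matching: int get_array_c_nomalloc(float* addr, int nrows, int ncols)
NOMALLOC_PROTO = ctypes.CFUNCTYPE(ctypes.c_int,
                                  ctypes.POINTER(ctypes.c_float),
                                  ctypes.c_int, ctypes.c_int)


def _fill_in_python(addr, nrows, ncols):
    # Hypothetical stand-in for the C code: write 0, 1, 2, ... into the buffer
    for i in range(nrows * ncols):
        addr[i] = float(i)
    return 0  # assumed convention: 0 means success


fake_nomalloc = NOMALLOC_PROTO(_fill_in_python)


def get_array_py(nomalloc, nrows, ncols):
    """Allocate the matrix in Python and let the C function fill it in place."""
    # numpy allocates (and therefore owns) the column-major buffer
    arr = numpy.empty((nrows, ncols), dtype=numpy.float32, order='F')
    status = nomalloc(arr.ctypes.data_as(ctypes.POINTER(ctypes.c_float)),
                      nrows, ncols)
    if status != 0:
        raise RuntimeError("get_array_c_nomalloc failed with status %d" % status)
    return arr


a = get_array_py(fake_nomalloc, 2, 3)
print(a.flags.owndata)
```

With the real library, the function loaded from mylib would be passed in place of fake_nomalloc, after setting its argtypes and restype to match the prototype. Note that the caller must know nrows and ncols before the call, which is what the changed signatures in this answer imply.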