return 2D array created from a C function into Pyt

Posted 2019-07-28 18:44

Question:

I want to use a 2D array created by a C function in Python. I asked how to do this earlier today, and one approach, suggested by @Abhijit Pritam, was to use structs. I implemented it and it works.

C code:

typedef struct {
  int arr[3][5];
} Array;

Array make_array_struct() {
  Array my_array;
  int count = 0;
  for (int i = 0; i < 3; i++)
    for (int j = 0; j  < 5; j++)
      my_array.arr[i][j] = ++count;
  return my_array;
}

In Cython I have this:

cdef extern from "numpy_fun.h":
    ctypedef struct Array:
        int[3][5] arr
    cdef Array make_array_struct()

def make_array():
    cdef Array arr = make_array_struct()
    return arr

my_arr = make_array()
my_arr['arr']
[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]

However, it was suggested that this is not the best approach, because it is possible to give Python control over the data. I'm trying to implement that, but I haven't been able to so far. This is what I have.

C code:

int **make_array_ptr() {
  int **my_array = (int **)malloc(3 * sizeof(int *));
  my_array[0] = calloc(3 * 5, sizeof(int));
  for (int i = 1; i < 3; i++)
    my_array[i] = my_array[0] + i * 5;
  int count = 0;
  for (int i = 0; i < 3; i++)
    for (int j = 0; j < 5; j++)
      my_array[i][j] = ++count;
  return my_array;
}

Cython:

import numpy as np
cimport numpy as np

np.import_array()

ctypedef np.int32_t DTYPE_t

cdef extern from "numpy/arrayobject.h":
    void PyArray_ENABLEFLAGS(np.ndarray arr, int flags)

cdef extern from "numpy_fun.h":
    cdef int **make_array_ptr()

def make_array():
    cdef int[::1] dims = np.array([3, 5], dtype=np.int32)
    cdef DTYPE_t **data = <DTYPE_t **>make_array_ptr()
    cdef np.ndarray[DTYPE_t, ndim=2] my_array = np.PyArray_SimpleNewFromData(2, &dims[0], np.NPY_INT32, data)
    PyArray_ENABLEFLAGS(my_array, np.NPY_OWNDATA)
    return my_array

I was following "Force NumPy ndarray to take ownership of its memory in Cython", which seems to be what I need. My case is different because I need a 2D array, so I will probably have to do things a bit differently: for example, the function expects data to be a pointer to int, and I gave it a pointer to pointer to int. What do I have to do to use this approach?

Answer 1:

My issues with the struct approach are:

  1. It breaks as soon as you want anything other than a fixed-size array, with no real way of fixing it.

  2. It relies on Cython's implicit conversion from structs to dicts. Cython copies the data to a Python list, which isn't terribly efficient. This isn't an issue with the small arrays you have here, but it's silly for larger arrays.


I also don't really recommend 2D arrays as pointers-to-pointers. The way numpy (and most other sensible array libraries) implement 2D arrays is to store a 1D array and the shape of the 2D array, and just use the shape to work out what index to access. This tends to be more efficient (faster lookups, faster allocation) and also easier to use (less allocation/deallocation to keep track of).
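
As a rough illustration of that index arithmetic, here is a minimal pure-Python sketch. The names `flat_index`, `flat` and `get` are made up for this example and are not part of NumPy; the point is only that one flat buffer plus the shape is enough to address element (i, j), using the same `j + i*5` style of offset.

```python
def flat_index(i, j, ncols):
    """Row-major offset of element (i, j) in a flat buffer with ncols columns."""
    return i * ncols + j


nrows, ncols = 3, 5

# One flat buffer holding 1..15, stored row by row.
flat = list(range(1, nrows * ncols + 1))


def get(i, j):
    """Look up element (i, j) using only the flat buffer and the shape."""
    return flat[flat_index(i, j, ncols)]


# Rebuild the nested view purely from the flat data and the shape.
rows = [[get(i, j) for j in range(ncols)] for i in range(nrows)]
print(rows)
```

A single allocation also means a single deallocation, which is what makes handing ownership to another library practical.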

To do this, change the C code to:

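#include <stdint.h>  /* int32_t */
#include <stdlib.h>  /* calloc */

int32_t *make_array_ptr(void) {
  /* One contiguous block of 3 * 5 elements; the caller takes ownership. */
  int32_t *my_array = calloc(3 * 5, sizeof(int32_t));
  if (my_array == NULL)
    return NULL;  /* allocation failed; the caller must check */
  int count = 0;
  for (int i = 0; i < 3; i++)
    for (int j = 0; j < 5; j++)
      my_array[j + i * 5] = ++count;  /* row-major: row i, column j */
  return my_array;
}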

I've deleted the first loop that you immediately overwrite. I've also changed the type to int32_t, since you seem to rely on this in your Cython code later.

The Cython code is then very close to what you were using:

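def make_array():
    # Assumes the extern declaration now reads: np.int32_t *make_array_ptr()
    cdef np.intp_t dims[2]
    cdef np.ndarray[np.int32_t, ndim=2] my_array
    cdef np.int32_t *data = make_array_ptr()
    if data == NULL:
        raise MemoryError()
    # The shape is passed as a C array of npy_intp.
    dims[0] = 3; dims[1] = 5
    my_array = np.PyArray_SimpleNewFromData(2, &dims[0], np.NPY_INT32, data)
    # Hand ownership to NumPy so the buffer is freed together with the array.
    PyArray_ENABLEFLAGS(my_array, np.NPY_OWNDATA)
    return my_array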

The main changes are that I've removed some casts and allocated dims as a fixed-size C array, which seemed simpler than a memoryview.
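
The idea of wrapping an existing flat buffer as a 2D array without copying can be tried without a compiled extension. The following is only an analogy using the standard library: `array` stands in for the C buffer and `memoryview` for the NumPy array, and the names `flat` and `view` are made up for this sketch. It says nothing about ownership, which is what the OWNDATA flag handles in the Cython version.

```python
from array import array

nrows, ncols = 3, 5

# Flat buffer of C ints holding 1..15, playing the role of the calloc'd block.
flat = array('i', range(1, nrows * ncols + 1))

# memoryview.cast requires a byte format on one side of each cast,
# so go through 'B' before attaching the 2D shape.
view = memoryview(flat).cast('B').cast('i', shape=[nrows, ncols])

print(view.tolist())  # nested lists, 3 rows of 5
print(view[1, 2])     # 8: row 1, column 2 is flat index 1 * 5 + 2 = 7

# The view shares memory with the buffer: no copy was made.
flat[7] = 99
print(view[1, 2])     # 99
```

Because the view is just a shape laid over the same memory, a write through the flat buffer is visible through the 2D view immediately.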


I don't think it's particularly easy to allow NumPy to handle a pointer-to-pointer array. It might be possible by implementing the Python buffer interface, but that seems like a lot of work and may not be easy.



Tags: python c cython