The Cython documentation on typed memory views lists three ways of assigning to a typed memory view:
- from a raw C pointer,
- from a np.ndarray, and
- from a cython.view.array.
Assume that I don't have data passed in to my Cython function from outside but instead want to allocate memory and return it as a np.ndarray. Which of those options do I choose? Also assume that the size of that buffer is not a compile-time constant, i.e. I can't allocate on the stack but would need to malloc for option 1.
The three options would therefore look something like this:
    from libc.stdlib cimport malloc, free
    import numpy as np    # runtime module, needed for np.empty
    cimport numpy as np   # compile-time declarations
    from cython cimport view

    np.import_array()

    def memview_malloc(int N):
        cdef int * m = <int *>malloc(N * sizeof(int))
        cdef int[::1] b = <int[:N]>m
        free(<void *>m)

    def memview_ndarray(int N):
        cdef int[::1] b = np.empty(N, dtype=np.int32)

    def memview_cyarray(int N):
        cdef int[::1] b = view.array(shape=(N,), itemsize=sizeof(int), format="i")
What is surprising to me is that in all three cases, Cython generates quite a lot of code for the memory allocation, in particular a call to __Pyx_PyObject_to_MemoryviewSlice_dc_int. This suggests (and I might be wrong here, my insight into the inner workings of Cython is very limited) that it first creates a Python object and then "casts" it into a memory view, which seems like unnecessary overhead.
A simple benchmark doesn't reveal much difference between the three methods, with option 2 being the fastest by a thin margin.
Which of the three methods is recommended? Or is there a different, better option?
Follow-up question: I want to finally return the result as a np.ndarray after having worked with that memory view in the function. Is a typed memory view the best choice, or should I rather just use the old buffer interface as below to create an ndarray in the first place?
cdef np.ndarray[DTYPE_t, ndim=1] b = np.empty(N, dtype=np.int32)
Look here for an answer.
The basic idea is that you want cpython.array.array and cpython.array.clone (not cython.array.*).

EDIT
It turns out that the benchmarks in that thread were rubbish. Here's my set, with my timings:
Output:
(The reason for the "iterations" benchmark is that some methods have surprisingly different characteristics in this respect.)
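To make the distinction between initialisation cost and iteration cost concrete, here is a minimal sketch of how the two can be timed separately. It is not the benchmark script referred to above: it is plain Python using only the standard library, with array.array standing in for the allocation under test, and the names make_buffer, allocate_only, allocate_and_iterate and seconds_per_call are made up for illustration. Absolute numbers from it say nothing about compiled Cython code; only the structure of the measurement carries over.

```python
import timeit
from array import array

ITEMSIZE = array("i").itemsize  # size of a C int on this platform


def make_buffer(n):
    # Stand-in allocation (assumption): a zero-filled array of n C ints.
    return array("i", bytes(n * ITEMSIZE))


def allocate_only(n):
    # Measures setup cost alone.
    make_buffer(n)


def allocate_and_iterate(n, iterations):
    # Measures setup plus repeated indexing through a memoryview.
    view = memoryview(make_buffer(n))
    total = 0
    for _ in range(iterations):
        for i in range(n):
            total += view[i]
    return total


def seconds_per_call(func, *args, number=1000):
    # Average wall-clock time of one call to func(*args).
    return timeit.timeit(lambda: func(*args), number=number) / number
```

Comparing seconds_per_call(allocate_only, n) with seconds_per_call(allocate_and_iterate, n, iterations) for a few values of iterations shows whether a method is dominated by its setup or by its per-element access.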
In order of initialisation speed:

- malloc: This is a harsh world, but it's fast. If you need to allocate a lot of things and have unhindered iteration and indexing performance, this has to be it. But normally a good bet is...

- cpython.array raw C type: Well damn, it's fast. And it's safe. Unfortunately it goes through Python to access its data fields. You can avoid that by using a wonderful trick, which brings it up to the standard speed while removing safety! This makes it a wonderful replacement for malloc, being basically a pretty reference-counted version!

- cpython.array buffer: Coming in at only three to four times the setup time of malloc, this looks like a wonderful bet. Unfortunately it has significant overhead (albeit small compared to the boundscheck and wraparound directives). That means it only really competes against full-safety variants, but it is the fastest of those to initialise. Your choice.

- cpython.array memoryview: This is now an order of magnitude slower than malloc to initialise. That's a shame, but it iterates just as fast. This is the standard solution that I would suggest unless boundscheck or wraparound are on (in which case cpython.array buffer might be a more compelling tradeoff).

- The rest. The only one worth anything is numpy's, due to the many fun methods attached to the objects. That's it, though.

As a follow-up to Veedrac's answer: be aware that using the
memoryview support of cpython.array with Python 2.7 currently appears to lead to memory leaks. This seems to be a long-standing issue, as it is mentioned on the cython-users mailing list here in a post from November 2012. Running Veedrac's benchmark script with Cython version 0.22 under both Python 2.7.6 and Python 2.7.9 leads to a large memory leak when initialising a cpython.array using either a buffer or memoryview interface. No memory leaks occur when running the script with Python 3.4. I've filed a bug report on this to the Cython developers mailing list.
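One rough way to check for this kind of leak yourself is to call the allocating function many times, discard each result, and watch whether the process's peak memory keeps climbing. The sketch below is an assumption-laden illustration, not the script used for the report above: it relies on the Unix-only resource module, the helper names peak_rss, peak_rss_growth and make_array are invented here, and make_array is only a pure-Python stand-in for the compiled Cython function you would actually test.

```python
import resource
from array import array


def peak_rss():
    # Peak resident set size of this process
    # (reported in KiB on Linux, in bytes on macOS).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def peak_rss_growth(allocate, repeats=10000):
    # Call allocate() repeatedly, dropping each result straight away,
    # and return how much the peak RSS grew over the whole run.
    before = peak_rss()
    for _ in range(repeats):
        allocate()  # nothing keeps a reference, so this should be freed
    return peak_rss() - before


def make_array(n=1000):
    # Hypothetical stand-in for the Cython function under test.
    return array("i", [0]) * n
```

If peak_rss_growth(f, repeats) stays roughly flat as repeats increases, memory is being released; growth roughly proportional to repeats is a strong hint of a leak.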