Since I found memory-views handy and fast, I try to avoid creating NumPy arrays in Cython and work with the views of the given arrays. However, sometimes it cannot be avoided to create a new array rather than altering an existing one. In higher-level functions this is not noticeable, but in frequently called subroutines it is. Consider the following function
import numpy as np
cimport cython

np_float = np.float64    # module-level dtype alias used below (assumed)

#@cython.profile(False)
@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
cdef double [:] vec_eq(double [:] v1, int [:] v2, int cond):
    ''' Function output corresponds to v1[v2 == cond] '''
    cdef unsigned int n = v1.shape[0]
    cdef unsigned int n_ = 0
    # Size of array to create
    cdef size_t i
    for i in range(n):
        if v2[i] == cond:
            n_ += 1
    # Create array for selection
    cdef double [:] s = np.empty(n_, dtype=np_float)    # Slow line
    # Copy selection to new array
    n_ = 0
    for i in range(n):
        if v2[i] == cond:
            s[n_] = v1[i]
            n_ += 1
    return s
Profiling tells me there is some speed to gain here:
http://uppix.com/f-sil252cdaca8001511f6.png
What I could do is adapt the function, because sometimes, for instance, the mean of this vector is calculated, sometimes the sum. So I could rewrite it to do the summing or averaging itself. But isn't there a way to create a memory-view with very little overhead directly, defining the size dynamically? Something like first creating a C buffer using malloc etc. and at the end of the function converting the buffer to a view, passing the pointer and strides or so.
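Roughly what I have in mind, as an untested sketch (the <double[:n_]> pointer-to-memoryview cast and the name vec_eq_ptr are just my guesses at how this could look):

from libc.stdlib cimport malloc

cdef double [:] vec_eq_ptr(double [:] v1, int [:] v2, int cond):
    cdef unsigned int n = v1.shape[0]
    cdef unsigned int n_ = 0
    cdef size_t i
    # Count matches to know the buffer size
    for i in range(n):
        if v2[i] == cond:
            n_ += 1
    # Allocate a plain C buffer instead of a NumPy array
    cdef double *buf = <double *> malloc(n_ * sizeof(double))
    n_ = 0
    for i in range(n):
        if v2[i] == cond:
            buf[n_] = v1[i]
            n_ += 1
    # Coerce the pointer to a 1D memory-view; the view does not own the
    # memory, so nothing ever frees buf here (it leaks)
    cdef double [:] s = <double[:n_]> buf
    return s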
Edit 1: Maybe for simple cases, adapting the function, e.g. like this, is an acceptable approach. I only added an option argument and the summing/averaging. This way I don't have to create an array and can use an easy-to-handle malloc inside the function. This won't get any faster, will it?
from cython.parallel import prange
from libc.stdlib cimport malloc, free, abort

# ...
cdef double vec_eq(double [:] v1, int [:] v2, int cond, opt=0):
    # additional option argument
    ''' Function output corresponds to v1[v2 == cond].sum() / .mean()'''
    cdef unsigned int n = v1.shape[0]
    cdef int n_ = 0
    # Size of array to create
    cdef Py_ssize_t i
    for i in prange(n, nogil=True):
        if v2[i] == cond:
            n_ += 1
    # Create array for selection
    cdef double s = 0
    cdef double * v3 = <double *> malloc(sizeof(double) * n_)
    if v3 == NULL:
        abort()
    # Copy selection to new array
    n_ = 0
    for i in range(n):
        if v2[i] == cond:
            v3[n_] = v1[i]
            n_ += 1
    # Do further computation here, according to option
    # Option 0 for the sum
    if opt == 0:
        for i in prange(n_, nogil=True):
            s += v3[i]
        free(v3)
        return s
    # Option 1 for the mean
    else:
        for i in prange(n_, nogil=True):
            s += v3[i]
        free(v3)
        return s / n_
    # Since in the end there is always only a single double value,
    # the memory can be freed right here
From http://docs.cython.org/src/userguide/memoryviews.html it follows that memory for cython memory views can be allocated via:
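for example, via cython.view.array (shape, itemsize and format here are just placeholder values):

from cython cimport view

def make_view():
    # view.array owns its buffer and frees it automatically
    cdef view.array cyarr = view.array(shape=(10,), itemsize=sizeof(double),
                                       format="d")
    cdef double [:] dview = cyarr
    return dview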
or by
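coercing a malloc'ed C pointer to a typed memoryview (again a sketch with placeholder sizes):

from libc.stdlib cimport malloc

def make_view_from_pointer():
    cdef double *data = <double *> malloc(10 * sizeof(double))
    # The resulting view does not own the memory -- who calls free()?
    cdef double [:] dview = <double[:10]> data
    return dview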
Both cases work, but in the first I have an issue if I introduce my own type (ctypedef some mytype), because there is no suitable format string for it. In the second case there is a problem with the deallocation of the memory.
According to the manual it should work as follows:
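(reconstructed from the docs page linked above; my_array and the sizes are placeholders)

from cython cimport view
from libc.stdlib cimport malloc, free

cdef view.array my_array = view.array(shape=(10,), itemsize=sizeof(double),
                                      format="d", allocate_buffer=False)
my_array.data = <char *> malloc(10 * sizeof(double))
my_array.callback_free_data = free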
which binds the function that frees the memory to the memoryview; however, this code does not compile.
The following thread on the Cython mailing list would probably be of interest to you:
https://groups.google.com/forum/#!topic/cython-users/CwtU_jYADgM
It looks like there are some decent options presented if you are fine with returning a memoryview from your function that gets coerced at some different level where performance isn't as much of an issue.
Didn't know how to deal with cpython arrays, so I finally solved this with a self-made 'memory view', as proposed by fabrizioM. Wouldn't have thought that this would work. Creating a new np.array in a tight loop is pretty expensive, so this gave me a significant speed-up. Since I only need a 1-dimensional array, I didn't even have to bother with strides. But even for higher-dimensional arrays, I think this could work well. Roughly, it looks like the sketch below.
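A stripped-down sketch of the idea (the class name DoubleVector is mine, not the exact code):

from libc.stdlib cimport malloc, free

cdef class DoubleVector:
    ''' Minimal self-made 1D "memory view": a malloc'ed buffer plus its length '''
    cdef double *data
    cdef Py_ssize_t n

    def __cinit__(self, Py_ssize_t n):
        self.n = n
        self.data = <double *> malloc(n * sizeof(double))
        if self.data == NULL:
            raise MemoryError()

    def __dealloc__(self):
        # Memory is released automatically when the object goes away
        if self.data != NULL:
            free(self.data)

cdef DoubleVector vec_eq(double [:] v1, int [:] v2, int cond):
    ''' Same selection as before, but filling the self-made vector '''
    cdef Py_ssize_t i
    cdef Py_ssize_t n_ = 0
    cdef Py_ssize_t n = v1.shape[0]
    for i in range(n):
        if v2[i] == cond:
            n_ += 1
    cdef DoubleVector out = DoubleVector(n_)
    n_ = 0
    for i in range(n):
        if v2[i] == cond:
            out.data[n_] = v1[i]
            n_ += 1
    return out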