Since I found memory-views handy and fast, I try to avoid creating NumPy arrays in cython and work with the views of the given arrays. However, sometimes it cannot be avoided, not to alter an existing array but create a new one. In upper functions this is not noticeable, but in often called subroutines it is. Consider the following function
#@cython.profile(False)
@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
cdef double [:] vec_eq(double [:] v1, int [:] v2, int cond):
''' Function output corresponds to v1[v2 == cond]'''
cdef unsigned int n = v1.shape[0]
cdef unsigned int n_ = 0
# Size of array to create
cdef size_t i
for i in range(n):
if v2[i] == cond:
n_ += 1
# Create array for selection
cdef double [:] s = np.empty(n_, dtype=np_float) # Slow line
# Copy selection to new array
n_ = 0
for i in range(n):
if v2[i] == cond:
s[n_] = v1[i]
n_ += 1
return s
Profiling tells me, there is some speed to gain here
http://uppix.com/f-sil252cdaca8001511f6.png
What I could do is adapting the function, cause sometimes, for instance the mean of this vector is calculated, sometimes the sum. So I could rewrite it, for summing or taking average. But isn't there a way to create memory-view with very little overhead directly, defining size dynamically. Something like first creating a c buffer using malloc
etc and at the end of the function convert the buffer to a view, passing the pointer and strides or so..
Edit 1: Maybe for simple cases, adapting the function e. g. like this is an acceptable approach. I only added an argument and summing/taking average. This way I dont have to create an array and can take an easy to handle inside function malloc. This won't get any faster, will it?
# ...
cdef double vec_eq(double [:] v1, int [:] v2, int cond, opt=0):
# additional option argument
''' Function output corresponds to v1[v2 == cond].sum() / .mean()'''
cdef unsigned int n = v1.shape[0]
cdef int n_ = 0
# Size of array to create
cdef Py_ssize_t i
for i in prange(n, nogil=True):
if v2[i] == cond:
n_ += 1
# Create array for selection
cdef double s = 0
cdef double * v3 = <double *> malloc(sizeof(double) * n_)
if v3 == NULL:
abort()
# Copy selection to new array
n_ = 0
for i in range(n):
if v2[i] == cond:
v3[n_] = v1[i]
n_ += 1
# Do further computation here, according to option
# Option 0 for the sum
if opt == 0:
for i in prange(n_, nogil=True):
s += v3[i]
free(v3)
return s
# Option 1 for the mean
else:
for i in prange(n_, nogil=True):
s += v3[i]
free(v3)
return s / n_
# Since in the end there is always only a single double value,
# the memory can be freed right here