I'm a relatively experienced Python programmer, but haven't written any C in a very long time and am attempting to understand Cython. I'm trying to write a Cython function that will operate on a column of a NumPy recarray.
The code I have so far is below.
recarray_func.pyx:
import numpy as np
cimport numpy as np

cdef packed struct rec_cell0:
    np.float32_t f0
    np.int64_t i0, i1, i2

def sum(np.ndarray[rec_cell0, ndim=1] recarray):
    cdef Py_ssize_t i
    cdef rec_cell0 *cell
    cdef np.float32_t running_sum = 0
    for i in range(recarray.shape[0]):
        cell = &recarray[i]
        running_sum += cell.f0
    return running_sum
At the interpreter prompt:
array = np.recarray((100, ), names=['f0', 'i0', 'i1', 'i2'],
                    formats=['f4', 'i8', 'i8', 'i8'])
recarray_func.sum(array)
This simply sums the f0 column of the recarray. It compiles and runs without a problem.
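One thing to watch when testing: np.recarray called this way does not initialize its memory, so the sum of a freshly created array is arbitrary. A minimal sketch for getting a known reference value in plain NumPy follows; the fill values and the name expected are just for illustration and are not part of the original code.

```python
import numpy as np

# Same layout as the recarray above, but with f0 filled in explicitly
array = np.recarray((100, ), names=['f0', 'i0', 'i1', 'i2'],
                    formats=['f4', 'i8', 'i8', 'i8'])
array['f0'] = np.arange(100, dtype=np.float32)

# Reference value, accumulated in float32 like the Cython running_sum
expected = array['f0'].sum(dtype=np.float32)
print(expected)
```

The result of recarray_func.sum(array) can then be compared against expected.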
My question is, how would I modify this so that it can operate on any column? In the example above, the column to sum is hard coded and accessed through dot notation. Is it possible to change the function so the column to sum is passed in as a parameter?
I believe this should be possible using Cython's memoryviews. Something along these lines should work (code not tested):
import numpy as np
cimport numpy as np

cdef packed struct rec_cell0:
    np.float32_t f0
    np.int64_t i0, i1, i2

def sum(rec_cell0[:] recview):
    cdef Py_ssize_t i
    cdef np.float32_t running_sum = 0
    for i in range(recview.shape[0]):
        running_sum += recview[i].f0
    return running_sum
Speed can probably be increased by ensuring that the record array you pass to Cython is contiguous. On the Python (calling) side, you can use np.require, while the function signature should change to rec_cell0[::1] recview to indicate that the array can be assumed to be contiguous. And as always, once the code has been tested, turning off the boundscheck, wraparound and nonecheck compiler directives in Cython will likely further improve speed.
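As a rough sketch of the calling side, this is how np.require could be used to get a C-contiguous record array before handing it to a function declared with rec_cell0[::1]. The array names (full, strided, contiguous) are made up for the example.

```python
import numpy as np

dtype = np.dtype([('f0', 'f4'), ('i0', 'i8'), ('i1', 'i8'), ('i2', 'i8')])
full = np.zeros(10, dtype=dtype).view(np.recarray)
full['i0'] = np.arange(10)

# Taking every other record gives a non-contiguous view of the same memory
strided = full[::2]
print(strided.flags['C_CONTIGUOUS'])

# np.require copies only if the input does not already meet the requirement
contiguous = np.require(strided, requirements='C')
print(contiguous.flags['C_CONTIGUOUS'])
```

A non-contiguous array passed to a [::1] signature is rejected at call time, so doing the np.require step in the caller keeps the Cython function simple.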
What you want requires weak typing, which C doesn't have. If all the columns you want to sum have the same type, you might be able to pull off something like the following (disclaimer: I don't have Cython on this machine, so I am coding blind).
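import numpy as np
cimport numpy as np

cdef packed struct rec_cell0:
    np.float32_t f0
    np.int64_t i0, i1, i2

def sum(np.ndarray[rec_cell0, ndim=1] recarray, colname):
    cdef Py_ssize_t i
    cdef rec_cell0 *cell
    cdef np.int64_t running_sum = 0
    # Byte offset of the requested field within each record
    cdef Py_ssize_t loc = recarray.dtype.fields[colname][1]
    for i in range(recarray.shape[0]):
        cell = &recarray[i]
        # Step loc bytes into the record and read the value as int64.
        # Only valid for the int64 columns (i0, i1, i2), not for f0.
        running_sum += (<np.int64_t *>(<char *>cell + loc))[0]
    return running_sum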
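To see what the offset lookup is doing without compiling anything, here is a pure-Python sketch of the same idea: find the byte offset of the field through dtype.fields, then read an int64 at that offset in every record. The helper name sum_int64_column is hypothetical, and the explicit little-endian dtype is an assumption made so that the struct format string matches.

```python
import struct
import numpy as np

def sum_int64_column(records, colname):
    """Sum an int64 field by reading it at its byte offset in each record."""
    field_dtype, offset = records.dtype.fields[colname][:2]
    if field_dtype != np.dtype('<i8'):
        raise TypeError("this sketch only handles little-endian int64 columns")
    raw = records.tobytes()            # C-ordered copy of the raw record bytes
    itemsize = records.dtype.itemsize  # size of one packed record in bytes
    total = 0
    for i in range(records.shape[0]):
        # Same arithmetic as the pointer version: record start + field offset
        (value,) = struct.unpack_from('<q', raw, i * itemsize + offset)
        total += value
    return total

records = np.zeros(4, dtype=[('f0', '<f4'), ('i0', '<i8'),
                             ('i1', '<i8'), ('i2', '<i8')])
records['i1'] = [10, 20, 30, 40]
total_i1 = sum_int64_column(records, 'i1')
print(total_i1, records['i1'].sum())
```

This is far slower than the Cython loop and is only meant to show the offset arithmetic. The type check also shows why the approach needs every summed column to share one type.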