I am trying to do calculations in Cython that rely heavily on numpy/scipy mathematical functions such as numpy.log. I noticed that calling numpy/scipy functions repeatedly in a loop in Cython incurs a huge overhead, e.g.:
    import numpy as np
    cimport numpy as np
    np.import_array()
    cimport cython

    def myloop(int num_elts):
        cdef double value = 0
        cdef int n  # typed loop index so Cython can emit a plain C loop
        for n in range(num_elts):
            # call numpy function
            value = np.log(2)
This is very expensive, presumably because np.log goes through Python rather than calling the numpy C function directly. If I replace that line with:
    from libc.math cimport log
    ...
    # calling libc function 'log'
    value = log(2)
then it's much faster. However, when I try to pass a numpy array to libc.math.log:
    cdef np.ndarray[long, ndim=1] foo = np.array([1, 2, 3])
    log(foo)
it gives this error:
    TypeError: only length-1 arrays can be converted to Python scalars
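To make the failure concrete without compiling anything, here is a minimal plain-Python sketch of the same situation. It is an analogy, not the Cython code above: math.log stands in for libc.math.log, a list stands in for the numpy array, and the helper name log_each is made up for illustration. Both functions take a single scalar, so passing a whole container raises a TypeError, and the fallback I can think of is an explicit element-wise loop.

```python
import math

def log_each(values):
    """Apply the scalar math.log to each element (hypothetical helper)."""
    out = []
    for v in values:
        out.append(math.log(v))
    return out

foo = [1, 2, 3]

# A scalar function rejects a whole container.
try:
    math.log(foo)
    scalar_only = False
except TypeError:
    scalar_only = True

# The explicit loop works, at the cost of one call per element.
result = log_each(foo)
```

In Cython, the equivalent would presumably be a typed loop over foo that calls log on each element.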
My questions are:
- Is it possible to call the C function and pass it a numpy array? Or can it only be used on scalar values, which would require me to write a loop (e.g. if I wanted to apply it to the foo array above)?
- Is there an analogous way to call scipy functions from C directly, without the Python overhead? How can I import scipy's C function library?
Concrete example: say you want to call many of scipy's or numpy's useful statistics functions (e.g. scipy.stats.*) on scalar values inside a for loop in Cython. It's crazy to reimplement all those functions in Cython, so their C versions have to be called. For example, all the functions related to pdf/cdf and sampling from various statistical distributions (e.g. see http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.pdf.html#scipy.stats.rv_continuous.pdf and http://www.johndcook.com/distributions_scipy.html). If you call these functions with Python overhead in a loop, it'll be prohibitively slow.
Thanks.