How to call numpy/scipy C functions from Cython di

2020-07-02 09:15发布

问题:

I am trying to make calculations in Cython that rely heavily on some numpy/scipy mathematical functions like numpy.log. I noticed that if I call numpy/scipy functions repeatedly in a loop in Cython, there are huge overhead costs, e.g.:

import numpy as np
cimport numpy as np
np.import_array()
cimport cython

def myloop(int num_elts):
   cdef double value = 0
   for n in xrange(num_elts):
     # call numpy function
     value = np.log(2)

This is very expensive, presumably because np.log goes through Python rather than calling the numpy C function directly. If I replace that line with:

from libc.math cimport log
...
# calling libc function 'log'
value = log(2)

then it's much faster. However, when I try to pass a numpy array to libc.math.log:

cdef np.ndarray[long, ndim=1] foo = np.array([1, 2, 3])
log(foo)

it gives this error:

TypeError: only length-1 arrays can be converted to Python scalars

My questions are:

  1. Is it possible to call the C function and pass it a numpy array? Or can it only be used on scalar values, which would require me to write a loop (eg if I wanted to apply it to the foo array above.)
  2. Is there an analogous way to call scipy functions from C directly without a Python overhead? Which how can I import scipy's C function library?

Concrete example: say you want to call many of scipy's or numpy's useful statistics functions (e.g. scipy.stats.*) on scalar values inside a for loop in Cython? It's crazy to reimplement all those functions in Cython, so their C versions have to be called. For example, all the functions related to pdf/cdf and sampling from various statistical distributions (e.g. see http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.pdf.html#scipy.stats.rv_continuous.pdf and http://www.johndcook.com/distributions_scipy.html) If you call these functions with Python overhead in a loop, it'll be prohibitively slow.

thanks.

回答1:

You cannot apply C functions such as log on numpy arrays, and numpy does not have a C function library that you can call from cython.

Numpy functions are already optimized to be called on numpy arrays. Unless you have a very unique use case, you're not going to see much benefit from reimplementing numpy functions as C functions. (It's possible that some functions in numpy are not implemented well, in chich case consider submitting your importations as patches.) However you do bring up a good point.

# A
from libc.math cimport log
for i in range(N):
    r[i] = log(foo[i])

# B
r = np.log(foo)

# C
for i in range(n):
    r[i] = np.log(foo[i])

In general, A and B should have similar run times, but C should be avoided and will be much slower.

Update

Here is the code for scipy.stats.norm.pdf, as you can see it's written in python with numpy and scipy calls. There is no C version of this code, you have to call it "through python". If this is what is holding you back, you'll need to re-implant it in C/Cython, but first I would spend some time very carefully profiling the code to see if there are any lower hanging fruit to go after first.

def pdf(self,x,*args,**kwds):
    loc,scale=map(kwds.get,['loc','scale'])
    args, loc, scale = self._fix_loc_scale(args, loc, scale)
    x,loc,scale = map(asarray,(x,loc,scale))
    args = tuple(map(asarray,args))
    x = asarray((x-loc)*1.0/scale)
    cond0 = self._argcheck(*args) & (scale > 0)
    cond1 = (scale > 0) & (x >= self.a) & (x <= self.b)
    cond = cond0 & cond1
    output = zeros(shape(cond),'d')
    putmask(output,(1-cond0)+np.isnan(x),self.badvalue)
    if any(cond):
        goodargs = argsreduce(cond, *((x,)+args+(scale,)))
        scale, goodargs = goodargs[-1], goodargs[:-1]
        place(output,cond,self._pdf(*goodargs) / scale)
    if output.ndim == 0:
        return output[()]
    return output