I've mangled Cython badly, it's performing

2019-06-24 17:17发布

I'm rather new to Python and absolutely ignorant of C (unfortunately) so I am struggling to properly understand some aspects of working with Cython.

After profiling a Python program and discovering that it was just a couple of loops that were hogging most of the time, I decided to look into dumping them into Cython. Initially, I just let Cython interpret the Python as it was, and the result was a (remarkable!) ~2x speed boost. Cool!

From the Python main, I pass the function two 2-D arrays ("a" and "b") and a float, "d", and it returns a list, "newlist". As examples:

a =[[12.7, 13.5, 1.0],[23.4, 43.1, 1.0],...]
b =[[0.46,0.95,0],[4.56,0.92,0],...]
d = 0.1

Here is the original code, with just the cdefs added for Cython:

def loop(a, b, d):

    cdef int i, j
    cdef double x, y

    newlist = []

    for i in range(len(a)):
        if b[i][2] != 1:
            for j in range(i+1,len(a)):
                if a[i] == a[j] and b[j][2] != 1:
                    x = b[i][0]+b[j][0]
                    y = b[i][1]+b[j][1]
                    b[i][2],b[j][2] = 1,1

                    if abs(y)/abs(x) > d:
                        if y > 0: newlist.append([a[i][0],a[i][1],y])

    return newlist

In "pure Python", this ran (with several ten-thousand loops) in ~12.5s. In Cython it ran in ~6.3s. Great progress with near-zero work done!

However, with a little reading, it was clear that much, much more could be done, so I set about trying to apply some type changes to get things going even faster, following the Cython docs, here (also referenced in the comments).

Here are the collected modifications, meant to mimic the Cython docs:

import numpy as np
cimport numpy as np

DtypeA = np.float
DtypeB = np.int

ctypedef np.float_t DtypeA_t
ctypedef np.int_t DtypeB_t

def loop(np.ndarray[DtypeA_t, ndim=2] A,
         np.ndarray[DtypeA_t, ndim=2] B,
         np.ndarray[DtypeB_t, ndim=1] C,
         float D):

    cdef Py_ssize_t i, j
    cdef float x, y

    cdef np.ndarray[DtypeA_t, ndim=2] NEW_ARRAY = np.zeros((len(C),3), dtype=DtypeA)

    for i in range(len(C)):
        if C[i] != 1:
            for j in range(i+1,len(C)):
                if A[i][0]==A[j][0] and A[i][1]==A[j][1] and C[j]!= 1:
                    x = B[i][0]+B[j][0]
                    y = B[i][1]+B[j][1]
                    C[i],C[j] = 1,1

                    if abs(y)/abs(x) > D:
                        if y > 0: NEW_ARRAY[i]=([A[i][0],A[i][1],y])

    return NEW_ARRAY

Among other things, I split the previous array "b" into two different input arrays "B" and "C", because each row of "b" contained 2 float elements and an integer that just acted as a flag. So I removed the flag integers and put them in a separate 1-D array, "C". So, the inputs now looked liked this:

A =[[12.7, 13.5, 1.0],[23.4, 43.1, 1.0],...]
B =[[0.46,0.95],[4.56,0.92],...]
C =[0,0,...]
D = 0.1

Ideally, this should go much faster with all the variables now being typed(?)...but obviously I'm doing something very wrong, because the function now comes in at a time of 35.3s...way WORSE than the "pure Python"!!

What am I botching so badly? Thanks for reading!

3条回答
家丑人穷心不美
2楼-- · 2019-06-24 17:32

Have you compiled manually with cython -a and inspected the line-by-line annotation? (If so, if you could post an image of it, or write-up something about what it tells you, that will help us). Lines highlighted with yellow indicate lines where the source to source transpiler output results in heavy use of the CPython API.

For example, you do cdef Py_ssize_t i, j but there's no reason why these can't be C integers. It takes overhead to treat them as Py_ssize_t and if they are merely used as an index in a simple loop with bounds you can easily guarantee, there's no need. I don't bring up Py_ssize_t to try to say don't use it generally. If your case involves needing to support 64-bit architectures or higher precision integers for the index, then of course use it. I'm just mentioning it because little things like this can sometimes have an unexpected and big impact on when/why a bunch of CPython API stuff gets roped into some Cython code that you thought would be free of the CPython API. Perhaps a better example in your code is the use of the construction of Python bool values and and inside the loop, instead of vectorized or C-level versions of these.

Such locations in Cython generally refer to locations where you won't get a speedup and often get a slowdown, especially if they are mixed in with NumPy code that, due to its more tightly optimized use of Cython / extension wrappers, wouldn't have otherwise had to deal with that CPython API overhead. You can let the cython -a output guide you in terms of adding C-level type declarations to replace Python types, or to use C-level functions like from the C math library, instead of Python operations that may require treating arguments, even when typed, as potentially being any sort of Python object, with all the many attribute lookups and dispatching calls that this involves.

查看更多
ら.Afraid
3楼-- · 2019-06-24 17:34

I believe the use of the indexing notation b[j][0] may be throwing Cython off, making it impossible for it to use fast indexing operations behind the scenes. Incidentally, even in pure Python code this style is not idiomatic and may lead to slower code.

Try instead to use the notation b[j,0] throughout and see if it improves your performance.

查看更多
该账号已被封号
4楼-- · 2019-06-24 17:40

Showing annotations with cython -a is indeed very usefull to optimize Cython code. Here is a version that should be much faster and that uses a cleaner syntax with memoryviews,

# cython: boundscheck=False
# cython: cdivision=True
# cython: wraparound=False

import numpy as np
cimport numpy as np


def loop(double [:,::1] A, double [:,::1] B, long [::1] C, float D):

    cdef Py_ssize_t i, j, N
    cdef float x, y

    N = len(C)

    cdef double[:,::1] NEW_ARRAY = np.zeros((N,3), dtype='float64')

    for i in range(N):
        if C[i] != 1:
            for j in range(i+1, N):
                if (A[i,0]==A[j,0]) & (A[i,1]==A[j,1]) & (C[j] != 1):
                    x = B[i,0] + B[j,0]
                    y = B[i,1] + B[j,1]
                    C[i] = 1
                    C[j] = 1

                    if abs(y/x) > D and y >0:
                        NEW_ARRAY[i,0] = A[i,0]
                        NEW_ARRAY[i,1] = A[i,1]
                        NEW_ARRAY[i,2] = y

    return NEW_ARRAY.base
查看更多
登录 后发表回答