加快与用Cython Python代码(Speeding up python code with c

2019-08-04 08:30发布

我有一个函数,基本上只是让很多调用一个简单的定义的hash函数和测试,看看当它找到一个副本。 我需要做大量的用它模拟的所以想它是尽可能快。 我试图用用Cython做到这一点。 目前用Cython代码被调用整数其值在给平方公尺范围为0正常Python列表。

import math, random
cdef int a,b,c,d,m,pos,value, cyclelimit, nohashcalls   
def h3(int a,int b,int c,int d, int m,int x):
    return (a*x**2 + b*x+c) %m    
def floyd(inputx):
    dupefound, nohashcalls = (0,0)
    m = len(inputx)
    loops = int(m*math.log(m))
    for loopno in xrange(loops):
        if (dupefound == 1):
            break
        a = random.randrange(m)
        b = random.randrange(m)
        c = random.randrange(m)
        d = random.randrange(m)
        pos = random.randrange(m)
        value = inputx[pos]
        listofpos = [0] * m
        listofpos[pos] = 1
        setofvalues = set([value])
        cyclelimit = int(math.sqrt(m))
        for j in xrange(cyclelimit):
            pos = h3(a,b, c,d, m, inputx[pos])
            nohashcalls += 1    
            if (inputx[pos] in setofvalues):
                if (listofpos[pos]==1):
                    dupefound = 0
                else:
                    dupefound = 1
                    print "Duplicate found at position", pos, " and value", inputx[pos]
                break
            listofpos[pos] = 1
            setofvalues.add(inputx[pos])
    return dupefound, nohashcalls 

我怎么能转换inputx和listofpos用C型数组,并在C速度访问数组? 是否有任何其他速度提升可以使用吗? setofvalues可以加快?

因此,有有m个东西来比较,50元话费弗洛伊德()= 5000目前发生在我的电脑在30秒左右。

更新:示例代码片段显示弗洛伊德是如何被调用。

m = 5000
inputx = random.sample(xrange(m**2), m)
(dupefound, nohashcalls) = edcython.floyd(inputx)

Answer 1:

首先,似乎你必须输入函数的变量。 它的一个很好的例子是在这里。

其次, cython -a ,为“注解”,给你一个真正优秀的突破由用Cython编译器以及如何肮脏的彩色编码指示生成的代码下来:它是(阅读Python API中重型)。 试图优化什么时候这个输出是真的必不可少。

第三,现在著名的页面与numpy的工作解释了如何才能到numpy的阵列数据的快速,C风格的访问。 Unforunately它的冗长和烦人。 我们在不过幸运,因为最近用Cython提供了类型化的内存访问量 ,这两者都是易于使用和真棒 。 你尝试做任何事情之前阅读整个页面。

十多分钟左右,我来到了这一点:

# cython: infer_types=True

# Use the C math library to avoid Python overhead.
from libc cimport math
# For boundscheck below.
import cython
# We're lazy so we'll let Numpy handle our array memory management.
import numpy as np
# You would normally also import the Numpy pxd to get faster access to the Numpy
# API, but it requires some fancier compilation options so I'll leave it out for
# this demo.
# cimport numpy as np

import random

# This is a small function that doesn't need to be exposed to Python at all. Use
# `cdef` instead of `def` and inline it.
cdef inline int h3(int a,int b,int c,int d, int m,int x):
    return (a*x**2 + b*x+c) % m

# If we want to live fast and dangerously, we tell cython not to check our array
# indices for IndexErrors. This means we CAN overrun our array and crash the
# program or screw up our stack. Use with caution. Profiling suggests that we
# aren't gaining anything in this case so I leave it on for safety.
# @cython.boundscheck(False)
# `cpdef` so that calling this function from another Cython (or C) function can
# skip the Python function call overhead, while still allowing us to use it from
# Python.
cpdef floyd(int[:] inputx):
    # Type the variables in the scope of the function.
    cdef int a,b,c,d, value, cyclelimit
    cdef unsigned int dupefound = 0
    cdef unsigned int nohashcalls = 0
    cdef unsigned int loopno, pos, j

    # `m` has type int because inputx is already a Cython memory view and
    # `infer-types` is on.
    m = inputx.shape[0]

    cdef unsigned int loops = int(m*math.log(m))

    # Again using the memory view, but letting Numpy allocate an array of zeros.
    cdef int[:] listofpos = np.zeros(m, dtype=np.int32)

    # Keep this random sampling out of the loop
    cdef int[:, :] randoms = np.random.randint(0, m, (loops, 5)).astype(np.int32)

    for loopno in range(loops):
        if (dupefound == 1):
            break

        # From our precomputed array
        a = randoms[loopno, 0]
        b = randoms[loopno, 1]
        c = randoms[loopno, 2]
        d = randoms[loopno, 3]
        pos = randoms[loopno, 4]

        value = inputx[pos]

        # Unforunately, Memory View does not support "vectorized" operations
        # like standard Numpy arrays. Otherwise we'd use listofpos *= 0 here.
        for j in range(m):
            listofpos[j] = 0

        listofpos[pos] = 1
        setofvalues = set((value,))
        cyclelimit = int(math.sqrt(m))
        for j in range(cyclelimit):
            pos = h3(a, b, c, d, m, inputx[pos])
            nohashcalls += 1
            if (inputx[pos] in setofvalues):
                if (listofpos[pos]==1):
                    dupefound = 0
                else:
                    dupefound = 1
                    print "Duplicate found at position", pos, " and value", inputx[pos]
                break
            listofpos[pos] = 1
            setofvalues.add(inputx[pos])
    return dupefound, nohashcalls

这里没有技巧,没有进行解释docs.cython.org ,这是我学会了他们自己,但有助于看到这一切走到一起。

你原来的代码的最重要的变化是在的意见,但他们都量给用Cython提示有关如何生成不使用Python API代码。

顺便说一句:我真的不知道为什么infer_types是不是默认。 它可以让编译器隐式而不是使用Python类型在可能的C型,这意味着更少的工作适合你。

如果您运行cython -a这一点,你会看到那个叫成Python唯一的线就是您来电random.sample和建筑或增加一个Python集()。

在我的机器,你的原码2.1秒运行一次。 我的版本0.6秒运行一次。

下一步是让random.sample的是循环的,但我会留给你。

我已经编辑我的答案来说明如何预先计算兰特样本。 这带来的时间缩短到0.4秒



Answer 2:

你需要使用这个特定的哈希算法? 为什么不使用内置的类型的字典哈希算法? 例如:

from collections import Counter
cnt = Counter(inputx)
dupes = [k for k, v in cnt.iteritems() if v > 1]


文章来源: Speeding up python code with cython