I am doing some performance test on a variant of the prime numbers generator from http://docs.cython.org/src/tutorial/numpy.html. The below performance measures are with kmax=1000
Pure Python implementation, running in CPython: 0.15s
Pure Python implementation, running in Cython: 0.07s
def primes(kmax):
p = []
k = 0
n = 2
while k < kmax:
i = 0
while i < k and n % p[i] != 0:
i = i + 1
if i == k:
p.append(n)
k = k + 1
n = n + 1
return p
Pure Python+Numpy implementation, running in CPython: 1.25s
import numpy
def primes(kmax):
p = numpy.empty(kmax, dtype=int)
k = 0
n = 2
while k < kmax:
i = 0
while i < k and n % p[i] != 0:
i = i + 1
if i == k:
p[k] = n
k = k + 1
n = n + 1
return p
Cython implementation using int*: 0.003s
from libc.stdlib cimport malloc, free
def primes(int kmax):
cdef int n, k, i
cdef int *p = <int *>malloc(kmax * sizeof(int))
result = []
k = 0
n = 2
while k < kmax:
i = 0
while i < k and n % p[i] != 0:
i = i + 1
if i == k:
p[k] = n
k = k + 1
result.append(n)
n = n + 1
free(p)
return result
The above performs great but looks horrible, as it holds two copies of the data... so I tried reimplementing it:
Cython + Numpy: 1.01s
import numpy as np
cimport numpy as np
cimport cython
DTYPE = np.int
ctypedef np.int_t DTYPE_t
@cython.boundscheck(False)
def primes(DTYPE_t kmax):
cdef DTYPE_t n, k, i
cdef np.ndarray p = np.empty(kmax, dtype=DTYPE)
k = 0
n = 2
while k < kmax:
i = 0
while i < k and n % p[i] != 0:
i = i + 1
if i == k:
p[k] = n
k = k + 1
n = n + 1
return p
Questions:
- why is the numpy array so incredibly slower than a python list, when running on CPython?
- what did I do wrong in the Cython+Numpy implementation? cython is obviously NOT treating the numpy array as an int[] as it should.
how do I cast a numpy array to a int*? The below doesn't work
cdef numpy.nparray a = numpy.zeros(100, dtype=int) cdef int * p = <int *>a.data
Best syntax I found so far:
Note where I used numpy.int32 instead of int. Anything on the left side of a cdef is a C type (thus int = int32 and float = float32), while anything on the RIGHT side of it (or outside of a cdef) is a python type (int = int64 and float = float64)
Because you didn't fully type it. Use
By using
np.intc
as the dtype, notnp.int
(which is a Clong
). That's(But really, use a memoryview, they're much cleaner and the Cython folks want to get rid of the NumPy array syntax in the long run.)
Using this instead of p in the calculations. reduced the runtime from 580 ms down to 2.8 ms for me. About the exact same runtime as the implementation using *int. And that's about the max you can expect from this.