I am currently trying to optimize code that I originally wrote in pure Python. The code uses NumPy heavily, since I work with NumPy arrays throughout. Below is the simplest of my classes that I have converted to Cython; it only performs an element-wise multiplication of two NumPy arrays:
bendingForces = self.matrixPrefactor * membraneHeight
My question is whether, and how, I can optimize this. When I look at the annotated C code that "cython -a" generates, it contains a lot of NumPy calls, which does not look very efficient.
import numpy as np
cimport numpy as np

ctypedef np.float64_t dtype_t
ctypedef np.complex128_t cplxtype_t
ctypedef Py_ssize_t index_t

cdef class bendingForcesClass( object ):
    cdef dtype_t bendingRigidity
    cdef np.ndarray matrixPrefactor
    cdef np.ndarray bendingForces

    def __init__( self, dtype_t bendingRigidity, np.ndarray[dtype_t, ndim=2] waveNumbersNorm ):
        self.bendingRigidity = bendingRigidity
        self.matrixPrefactor = -self.bendingRigidity * waveNumbersNorm**2

    cpdef np.ndarray calculate( self, np.ndarray membraneHeight ) :
        cdef np.ndarray bendingForces
        bendingForces = self.matrixPrefactor * membraneHeight
        return bendingForces
The idea I had was to use two for loops and iterate over the entries of the arrays. Perhaps the compiler could then optimize this with SIMD operations? I tried this, and it compiled, but it gave strange results and took forever. Here is the code of the substitute function:
    cpdef np.ndarray calculate( self, np.ndarray membraneHeight ) :
        cdef index_t index1, index2 # corresponds to: cdef Py_ssize_t index1, index2
        for index1 in range( self.matrixSize ):
            for index2 in range( self.matrixSize ):
                self.bendingForces[ index1, index2 ] = self.matrixPrefactor.data[ index1, index2 ] * membraneHeight.data[ index1, index2 ]
        return self.bendingForces
As I said, however, this code is really slow and does not work as expected. What am I doing wrong? What would be the best way to optimize this and get rid of the NumPy calls?
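For checking a loop-based version on small inputs, the result I expect from the two loops can be written as a plain-Python reference. This is only a sketch: the function name elementwise_product and the sample values are made up for illustration, and it uses nested lists instead of NumPy arrays so that it runs without any dependencies.

```python
def elementwise_product(prefactor, height):
    """Reference element-wise product of two equally shaped 2-D sequences.

    Hypothetical helper for illustration only; it mirrors what
    `matrixPrefactor * membraneHeight` is meant to compute.
    """
    n_rows = len(prefactor)
    n_cols = len(prefactor[0])
    # Both inputs must have the same shape, as in the NumPy expression.
    if len(height) != n_rows or any(len(row) != n_cols for row in height):
        raise ValueError("prefactor and height must have the same shape")

    result = [[0j] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            result[i][j] = prefactor[i][j] * height[i][j]
    return result


# Sample values (assumed, not taken from my real data).
bending_rigidity = 2.0
wave_numbers_norm = [[0.0, 1.0], [1.0, 2.0]]

# Same prefactor as in __init__: -bendingRigidity * waveNumbersNorm**2
prefactor = [[-bending_rigidity * k**2 for k in row] for row in wave_numbers_norm]

# Complex membrane heights, matching the complex128 typedef above.
height = [[1 + 1j, 2 + 0j], [0.5j, 1 - 1j]]

forces = elementwise_product(prefactor, height)
```

A correct loop version in Cython should give the same values entry by entry, so a small case like this is a quick way to detect the "strange results" I am seeing.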