I'm trying to speed up some matrix inversion in a model I'm building, specifically by implementing some linear algebra routines in Cython. I have the code working, but am trying to optimize. Currently my Cython looks like this:
import cython
import numpy as np
cimport numpy as cnp
cimport scipy.linalg.cython_lapack as cython_lapack
cdef int ZERO = 0
ctypedef cnp.float64_t REAL_t
def inverse(mat, identity, pivots, k):
    cdef int K = k
    cdef REAL_t* mat_pointer = <REAL_t *>cnp.PyArray_DATA(mat)
    cdef REAL_t* iden_pointer = <REAL_t *>cnp.PyArray_DATA(identity)
    cdef int* piv_pointer = <int *>cnp.PyArray_DATA(pivots)
    cython_lapack.dgesv(&K, &K, mat_pointer, &K, piv_pointer, iden_pointer, &K, &ZERO)
Once I compile this (as linalg.pyx), I can run it just fine, but because dgesv modifies some of its input arguments, I have to use a wrapper function and do some copying:
import linalg
import numpy as np

identity = np.eye(K)  # K is the dimensionality of the square matrix

def lapack_inverse(a):
    b = identity.copy()
    linalg.inverse(a, b, np.zeros(K, dtype=np.intc), K)
    return b
For small K, we see pretty solid speed improvements (2-3x):
In [29]: K=10
In [30]: x = np.random.random((K,K))
In [31]: identity = np.eye(K)
In [32]: print np.allclose(np.linalg.inv(x),lapack_inverse(x))
True
In [33]: %timeit np.linalg.inv(x)
10000 loops, best of 3: 28 µs per loop
In [34]: %timeit lapack_inverse(x)
100000 loops, best of 3: 10.3 µs per loop
But for larger K, returns clearly diminish (e.g. for K=200):
In [8]: %timeit np.linalg.inv(x)
100 loops, best of 3: 10.1 ms per loop
In [9]: %timeit lapack_inverse(x)
100 loops, best of 3: 8.1 ms per loop
I'm thinking I might be able to squeeze better performance out of this by defining the identity and pivot arrays within C, rather than doing the copying and passing that I'm currently doing. In other words, I'd like to modify the Cython function so that it only requires that I pass the array to be inverted.
My questions are: (a) will this lead to speed improvements (even small ones)? And, assuming it will, (b) how do I go about defining the pivot and identity arrays in Cython directly?
There are lots of ways to allocate memory in Cython - essentially you can create any Python object or use the standard C allocation techniques. This previous question lists lots of them (with timings given in the answers) and is almost certainly more generally useful than my answer, although it deals with a slightly different use-case.
In your case you have to allocate the identity matrix as a numpy array, since the answer gets written into it and you want to return it from the function (you could obviously allocate it some other way and then copy, but this seems somewhat pointless). I suspect it's best to call eye each time rather than copying a pre-built identity.
You can allocate pivot however you like. In the example below I've used the C allocation mechanisms as the most direct (and thus hopefully fastest) way of allocating a large chunk of memory. The downside of this is that you do have to deallocate it yourself with free.
cimport numpy as cnp
import numpy as np
cimport scipy.linalg.cython_lapack as cython_lapack
from libc.stdlib cimport malloc, free

ctypedef cnp.float64_t REAL_t

def inverse(mat):
    # you should probably check that mat.shape[0]==mat.shape[1]
    # and that mat is actually a float64 array
    cdef int k = mat.shape[0]
    cdef int info = 0
    cdef REAL_t* mat_pointer = <REAL_t*>cnp.PyArray_DATA(mat)
    # I suspect you should be doing a check that mat_pointer has been assigned
    cdef REAL_t* iden_pointer
    cdef int* piv_pointer = <int*>malloc(sizeof(int)*k)
    # this is uninitialised (the contents are arbitrary)
    # but that's OK because it's used as an output
    try:
        identity = np.eye(k)
        iden_pointer = <REAL_t*>cnp.PyArray_DATA(identity)
        cython_lapack.dgesv(&k, &k, mat_pointer, &k,
                            piv_pointer, iden_pointer, &k, &info)
        # you should check info to ensure it's worked
        return identity
    finally:
        free(piv_pointer)  # the "try ... finally" ensures that this is freed
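To see what the info check mentioned in the comment looks like, note that SciPy also exposes the same routine at the Python level as scipy.linalg.lapack.dgesv - here's a quick sketch using that wrapper (the error message wording is just my own choice):

```python
import numpy as np
from scipy.linalg import lapack

K = 5
a = np.random.random((K, K))
b = np.eye(K)

# dgesv solves a @ x = b; with b set to the identity, x is the inverse of a.
# It returns the LU factors, the pivot indices, the solution and the info code.
lu, piv, x, info = lapack.dgesv(a, b)

# info == 0 means success; info < 0 flags an illegal argument,
# info > 0 means the factorisation found a singular matrix
if info != 0:
    raise np.linalg.LinAlgError("dgesv failed with info=%d" % info)

print(np.allclose(x, np.linalg.inv(a)))
```

Your Cython version could do the same thing by inspecting info after the call and raising before returning identity.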
There are a few places where I've noted you should add checks to ensure everything is valid. This will slow things down a little but be safer. You could probably also add some Cython type declarations to get a bit of extra speed, but I haven't really investigated that for this example.
Finally - think about why you're using the inverse. If you're just multiplying another matrix by the inverse then it's better (faster and more accurate) to use dgesv directly with that matrix as the right-hand side. There's rarely a good reason to calculate the inverse explicitly.
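To illustrate that last point at the numpy level (no Cython needed), solving the system directly gives the same result as forming the inverse and multiplying, without ever materialising the inverse - np.linalg.solve uses the same gesv-style factorisation internally:

```python
import numpy as np

K = 200
a = np.random.random((K, K))
b = np.random.random(K)

# solve a @ x = b directly (one LU factorisation, one triangular solve)
x_solve = np.linalg.solve(a, b)

# versus forming the inverse first and then multiplying
x_inv = np.dot(np.linalg.inv(a), b)

print(np.allclose(x_solve, x_inv))
```

The direct solve does strictly less work and avoids the extra rounding error introduced by computing and then applying the inverse.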