I have a large number of small linear equation systems that I'd like to solve efficiently using numpy. Basically, given A[:,:,:] and b[:,:], I wish to find x[:,:] given by A[i,:,:].dot(x[i,:]) = b[i,:]. So if I didn't care about speed, I could solve this as

    for i in range(n):
        x[i,:] = np.linalg.solve(A[i,:,:], b[i,:])
But since this involves explicit looping in Python, and since A typically has a shape like (1000000,3,3), such a solution would be quite slow. If numpy isn't up to this, I could do the loop in Fortran (i.e. using f2py), but I'd prefer to stay in Python if possible.
I guess answering yourself is a bit of a faux pas, but this is the Fortran solution I have at the moment, i.e. what the other solutions are effectively competing against, both in speed and brevity.
This would be compiled as:
And called from python as
(The .Ts are needed due to a poor design choice in f2py, which causes unnecessary copying, inefficient memory access patterns, and unnatural-looking Fortran indexing if the .Ts are left out.) This also avoids a setup.py etc. I have no bone to pick with Fortran (as long as strings aren't involved), but I was hoping that numpy might have something short and elegant which could do the same thing.
I think you can do it in one go, with a (3x100000,3x100000) matrix composed of 3x3 blocks around the diagonal.
Not tested:
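The answer's code isn't shown above; here is a sketch of what this block-diagonal approach might look like with scipy.sparse (a much smaller n than in the question, and all names are illustrative):

```python
import numpy as np
from scipy.sparse import block_diag
from scipy.sparse.linalg import spsolve

n = 1000  # the question uses ~1000000; kept small here
rng = np.random.default_rng(0)
A = rng.standard_normal((n, 3, 3))
b = rng.standard_normal((n, 3))

# Assemble one big (3n x 3n) sparse matrix with the 3x3 blocks on the
# diagonal, then solve all n systems in a single sparse solve.
big_A = block_diag(list(A), format="csc")
x = spsolve(big_A, b.ravel()).reshape(n, 3)
```

Whether this beats the plain loop depends on the assembly cost of the sparse matrix, which is itself a Python-level pass over the blocks.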
For those coming back to read this question now, I thought I'd save others time and mention that numpy handles this using broadcasting now.
So, in numpy 1.8.0 and higher, the following can be used to solve N linear equations.
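The code from this answer isn't included above; with stacked arrays, a sketch of the broadcast call looks like this (sizes illustrative):

```python
import numpy as np

n = 1000
rng = np.random.default_rng(0)
A = rng.standard_normal((n, 3, 3))   # stack of n 3x3 matrices
b = rng.standard_normal((n, 3))      # stack of n right-hand sides

# numpy >= 1.8 broadcasts linalg.solve over the leading dimensions.
# Keeping an explicit trailing axis on b, i.e. shape (n, 3, 1), avoids
# the vector-vs-matrix ambiguity whose rules changed in numpy 2.0.
x = np.linalg.solve(A, b[:, :, np.newaxis])[:, :, 0]
```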
I think you're wrong about the explicit looping being a problem. Usually it's only the innermost loop that's worth optimizing, and I think that holds true here. For example, we can measure the cost of the overhead vs the cost of the actual computation:
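The benchmark code itself is missing above; this is a minimal sketch of the idea (sizes and names are my own, not the answer's):

```python
import time
import numpy as np

n = 10000
rng = np.random.default_rng(0)
A = rng.standard_normal((n, 3, 3))
b = rng.standard_normal((n, 3))
x = np.empty_like(b)

# Total time of the looped solve.
t0 = time.perf_counter()
for i in range(n):
    x[i, :] = np.linalg.solve(A[i, :, :], b[i, :])
t_loop = time.perf_counter() - t0

# The same loop with the solve stubbed out, to isolate the
# pure-Python loop-and-indexing overhead.
t0 = time.perf_counter()
for i in range(n):
    _ = A[i, :, :], b[i, :]
t_overhead = time.perf_counter() - t0

print(f"total {t_loop:.4f}s, loop overhead {t_overhead:.4f}s")
```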
which gives me
IOW, it's only spending 10% of its time doing anything other than actually solving your problem. Now, I could totally believe that np.linalg.solve itself is too slow for you relative to what you could get out of Fortran, and so you want to do something else. That's probably especially true on small problems, come to think of it: IIRC I once found it faster to unroll certain small solutions by hand, although that was a while back. But by itself, it's not true that using an explicit loop on the first index will make the overall solution quite slow. If np.linalg.solve is fast enough, the loop won't change it much here.
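As an illustration of hand-unrolling small solves (my sketch, not code from the answer): for 3x3 systems, Cramer's rule can be vectorized over the whole stack, removing the per-system Python call entirely.

```python
import numpy as np

def solve3x3_batch(A, b):
    """Solve n independent 3x3 systems A[i] @ x[i] = b[i] via Cramer's
    rule, fully vectorized over the first axis (no loop per system)."""
    # Columns of each 3x3 matrix.
    a0, a1, a2 = A[:, :, 0], A[:, :, 1], A[:, :, 2]
    # det(A) = a0 . (a1 x a2), computed for every system at once.
    c12 = np.cross(a1, a2)
    det = np.einsum('ij,ij->i', a0, c12)
    # Cramer: x_k = det(A with column k replaced by b) / det(A).
    x0 = np.einsum('ij,ij->i', b, c12)
    x1 = np.einsum('ij,ij->i', a0, np.cross(b, a2))
    x2 = np.einsum('ij,ij->i', a0, np.cross(a1, b))
    return np.stack([x0, x1, x2], axis=1) / det[:, None]
```

This trades LAPACK's pivoting for raw speed, so it is only appropriate when the matrices are well-conditioned.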