Solve large number of small equation systems in numpy

Posted 2019-06-26 07:56

I have a large number of small linear equation systems that I'd like to solve efficiently using numpy. Basically, given A[:,:,:] and b[:,:], I wish to find x[:,:] given by A[i,:,:].dot(x[i,:]) = b[i,:]. So if I didn't care about speed, I could solve this as

for i in range(n):
    x[i,:] = np.linalg.solve(A[i,:,:],b[i,:])

But since this involves explicit looping in Python, and since A typically has a shape like (1000000,3,3), such a solution would be quite slow. If numpy isn't up to this, I could do the loop in Fortran (i.e. using f2py), but I'd prefer to stay in Python if possible.

4 Answers
爷、活的狠高调
#2 · 2019-06-26 08:11

I guess answering yourself is a bit of a faux pas, but this is the Fortran solution I have at the moment, i.e. what the other solutions are effectively competing against, both in speed and brevity.

function pixsolve(A, b) result(x)
    implicit none
    real*8    :: A(:,:,:), b(:,:), x(size(b,1),size(b,2))
    integer*4 :: i, n, m, piv(size(b,1)), err
    n = size(A,3); m = size(A,1)
    x = b   ! dgesv solves in place, so start from a copy of b
    do i = 1, n
        ! LAPACK dgesv LU-factorizes A(:,:,i) and overwrites x(:,i) with the solution
        call dgesv(m, 1, A(:,:,i), m, piv, x(:,i), m, err)
    end do
end function

This would be compiled as:

f2py -c -m foo{,.f90} -llapack -lblas

And called from Python as:

x = foo.pixsolve(A.T, b.T).T

(The .Ts are needed due to a poor design choice in f2py, which causes unnecessary copying, inefficient memory access patterns, and unnatural-looking Fortran indexing if the .Ts are left out.)

This also avoids needing a setup.py etc. I have no bone to pick with Fortran (as long as strings aren't involved), but I was hoping that numpy might have something short and elegant that could do the same thing.

放荡不羁爱自由
#3 · 2019-06-26 08:11

I think you can do it in one go, with one large (3n x 3n) matrix composed of the 3x3 blocks along the diagonal.

Not tested:

n = A.shape[0]
b_new = np.concatenate([b[i,:] for i in range(n)])   # stacked right-hand sides, shape (3*n,)

A_new = np.zeros((3*n, 3*n))                          # block-diagonal matrix holding the A[i]
for i in range(n):
    A_new[3*i:3*(i+1), 3*i:3*(i+1)] = A[i,:,:]

x_new = np.linalg.solve(A_new, b_new)
x = x_new.reshape(n, 3)
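
For n on the order of 10^6 a dense 3n x 3n array is far too large to allocate, so here is a sketch of the same block-diagonal idea using scipy.sparse (scipy is assumed here; the answer above uses only numpy):

from scipy.sparse import block_diag
from scipy.sparse.linalg import spsolve

n = A.shape[0]
# Only the 9*n nonzero entries are stored, instead of a dense (3n, 3n) array.
A_sparse = block_diag([A[i] for i in range(n)], format="csc")
x = spsolve(A_sparse, b.reshape(3*n)).reshape(n, 3)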
再贱就再见
#4 · 2019-06-26 08:22

For those coming back to read this question now, I thought I'd save others some time and mention that numpy now handles this via broadcasting.

So, in numpy 1.8.0 and higher, the following single call solves all N systems of linear equations at once.

x = np.linalg.solve(A,b)
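
A minimal sketch of that call with the shapes from the question (in the NumPy versions this answer refers to, a b with one fewer dimension than A is treated as a stack of right-hand-side vectors):

import numpy as np

n = 1000000
A = np.random.random((n, 3, 3))   # stack of n coefficient matrices
b = np.random.random((n, 3))      # matching right-hand sides

x = np.linalg.solve(A, b)         # all n systems solved in one vectorized call

# spot-check one of the systems
assert np.allclose(A[0].dot(x[0]), b[0])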
时光不老,我们不散
#5 · 2019-06-26 08:30

I think you're wrong about the explicit looping being a problem. Usually it's only the innermost loop that's worth optimizing, and I think that holds true here. For example, we can measure the loop overhead against the cost of the actual computation:

import numpy as np

n = 10**6
A = np.random.random(size=(n, 3, 3))
b = np.random.random(size=(n, 3))
x = b*0

def f():
    # real work: solve each 3x3 system inside an explicit Python loop
    for i in range(n):
        x[i,:] = np.linalg.solve(A[i,:,:], b[i,:])

# dummy "solve" that just returns b, so g() measures only the loop and indexing overhead
np.linalg.pseudosolve = lambda a, b: b

def g():
    for i in range(n):
        x[i,:] = np.linalg.pseudosolve(A[i,:,:], b[i,:])

which gives me

In [66]: time f()
CPU times: user 54.83 s, sys: 0.12 s, total: 54.94 s
Wall time: 55.62 s

In [67]: time g()
CPU times: user 5.37 s, sys: 0.01 s, total: 5.38 s
Wall time: 5.40 s

IOW, it's only spending 10% of its time doing anything other than actually solving your problem. Now, I could totally believe that np.linalg.solve itself is too slow for you relative to what you could get out of Fortran, and so you want to do something else. That's probably especially true on small problems, come to think of it: IIRC I once found it faster to unroll certain small solutions by hand, although that was a while back.
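
As a sketch of that hand-unrolling idea, vectorized over the whole stack, one option is Cramer's rule (assuming the 3x3 blocks are well-conditioned; this is an illustration, not the approach benchmarked above):

def solve3x3_cramer(A, b):
    # Solve A[i] @ x[i] = b[i] for every i via Cramer's rule; A is (n,3,3), b is (n,3).
    det = np.linalg.det(A)             # determinant of each 3x3 block, vectorized
    x = np.empty_like(b)
    for j in range(3):
        Aj = A.copy()
        Aj[:, :, j] = b                # replace column j with the right-hand sides
        x[:, j] = np.linalg.det(Aj) / det
    return x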

But by itself, it's not true that an explicit loop over the first index makes the overall solution slow. If np.linalg.solve is fast enough for your needs, the loop won't change that much here.
