numpy vectorization of double python for loop

2019-04-10 08:10发布

问题:

V is (n,p) numpy array typically dimensions are n~10, p~20000

The code I have now looks like

A = np.zeros(p)
for i in xrange(n):
    for j in xrange(i+1):
        A += F[i,j] * V[i,:] * V[j,:]

How would I go about rewriting this to avoid the double python for loop?

回答1:

While Isaac's answer seems promising, as it removes those two nested for loops, you are having to create an intermediate array M which is n times the size of your original V array. Python for loops are not cheap, but memory access ain't free either:

n = 10
p = 20000
V = np.random.rand(n, p)
F = np.random.rand(n, n)

def op_code(V, F):
    n, p = V.shape
    A = np.zeros(p)
    for i in xrange(n):
        for j in xrange(i+1):
            A += F[i,j] * V[i,:] * V[j,:]
    return A

def isaac_code(V, F):
    n, p = V.shape
    F = F.copy()
    F[np.triu_indices(n, 1)] = 0
    M = (V.reshape(n, 1, p) * V.reshape(1, n, p)) * F.reshape(n, n, 1)
    return M.sum((0, 1))

If you now take both for a test ride:

In [20]: np.allclose(isaac_code(V, F), op_code(V, F))
Out[20]: True

In [21]: %timeit op_code(V, F)
100 loops, best of 3: 3.18 ms per loop

In [22]: %timeit isaac_code(V, F)
10 loops, best of 3: 24.3 ms per loop

So removing the for loops is costing you an 8x slowdown. Not a very good thing... At this point you may even want to consider whether a function taking about 3ms to evaluate requires any further optimization. IN case you do, there's a small improvement which can be had by using np.einsum:

def einsum_code(V, F):
    n, p = V.shape
    F = F.copy()
    F[np.triu_indices(n, 1)] = 0
    return np.einsum('ij,ik,jk->k', F, V, V)

And now:

In [23]: np.allclose(einsum_code(V, F), op_code(V, F))
Out[23]: True

In [24]: %timeit einsum_code(V, F)
100 loops, best of 3: 2.53 ms per loop

So that's roughly a 20% speed up that introduces code that may very well not be as readable as your for loops. I would say not worth it...



回答2:

The difficult part about this is that you only want to take the sum of the elements with j <= i. If not for that then you could do the following:

M = (V.reshape(n, 1, p) * V.reshape(1, n, p)) * F.reshape(n, n, 1)
A = M.sum(0).sum(0)

If F is symmetric (if F[i,j] == F[j,i]) then you can exploit the symmetry of M above as follows:

D = M[range(n), range(n)].sum(0)
A = (M.sum(0).sum(0) - D) / 2.0 + D

That said, this is really not a great candidate for vectorization, as you have n << p and so your for-loops are not going to have much effect on the speed of this computation.

Edit: As Bill said below, you can just make sure that the elements of F that you don't want to use are set to zero first, and then the M.sum(0).sum(0) result will be what you want.



回答3:

The expression can be written as

and thus you can sum it like this using the np.newaxis-construct:

na = np.newaxis
X = (np.tri(n)*F)[:,:,na]*V[:,na,:]*V[na,:,:]
X.sum(axis=1).sum(axis=0)

Here a 3D-array X[i,j,p] is constructed, and then the 2 first axes are summed, which results in a 1D array A[p]. Additionaly F was multiplied with a triangular matrix to restrict the summation according to the problem.