numpy vectorization of double python for loop

V is (n,p) numpy array typically dimensions are n~10, p~20000

The code I have now looks like

A = np.zeros(p)
for i in xrange(n):
    for j in xrange(i+1):
        A += F[i,j] * V[i,:] * V[j,:]

How would I go about rewriting this to avoid the double python for loop?

标签： python numpy linear-algebra

3条回答

淡お忘

2楼-- · 2019-04-10 08:07

While Isaac's answer seems promising, as it removes those two nested for loops, you are having to create an intermediate array M which is n times the size of your original V array. Python for loops are not cheap, but memory access ain't free either:

n = 10
p = 20000
V = np.random.rand(n, p)
F = np.random.rand(n, n)

def op_code(V, F):
    n, p = V.shape
    A = np.zeros(p)
    for i in xrange(n):
        for j in xrange(i+1):
            A += F[i,j] * V[i,:] * V[j,:]
    return A

def isaac_code(V, F):
    n, p = V.shape
    F = F.copy()
    F[np.triu_indices(n, 1)] = 0
    M = (V.reshape(n, 1, p) * V.reshape(1, n, p)) * F.reshape(n, n, 1)
    return M.sum((0, 1))

If you now take both for a test ride:

In [20]: np.allclose(isaac_code(V, F), op_code(V, F))
Out[20]: True

In [21]: %timeit op_code(V, F)
100 loops, best of 3: 3.18 ms per loop

In [22]: %timeit isaac_code(V, F)
10 loops, best of 3: 24.3 ms per loop

So removing the for loops is costing you an 8x slowdown. Not a very good thing... At this point you may even want to consider whether a function taking about 3ms to evaluate requires any further optimization. IN case you do, there's a small improvement which can be had by using np.einsum:

def einsum_code(V, F):
    n, p = V.shape
    F = F.copy()
    F[np.triu_indices(n, 1)] = 0
    return np.einsum('ij,ik,jk->k', F, V, V)

And now:

In [23]: np.allclose(einsum_code(V, F), op_code(V, F))
Out[23]: True

In [24]: %timeit einsum_code(V, F)
100 loops, best of 3: 2.53 ms per loop

So that's roughly a 20% speed up that introduces code that may very well not be as readable as your for loops. I would say not worth it...

0人赞添加讨论(0) 举报

何必那么认真

3楼-- · 2019-04-10 08:26

The expression can be written as

$A_p = \sum_{ij}F_{ij}V_{ip}V_{jp}$

and thus you can sum it like this using the np.newaxis-construct:

na = np.newaxis
X = (np.tri(n)*F)[:,:,na]*V[:,na,:]*V[na,:,:]
X.sum(axis=1).sum(axis=0)

Here a 3D-array X[i,j,p] is constructed, and then the 2 first axes are summed, which results in a 1D array A[p]. Additionaly F was multiplied with a triangular matrix to restrict the summation according to the problem.

0人赞添加讨论(0) 举报

SAY GOODBYE

4楼-- · 2019-04-10 08:31

The difficult part about this is that you only want to take the sum of the elements with j <= i. If not for that then you could do the following:

M = (V.reshape(n, 1, p) * V.reshape(1, n, p)) * F.reshape(n, n, 1)
A = M.sum(0).sum(0)

If F is symmetric (if F[i,j] == F[j,i]) then you can exploit the symmetry of M above as follows:

D = M[range(n), range(n)].sum(0)
A = (M.sum(0).sum(0) - D) / 2.0 + D

That said, this is really not a great candidate for vectorization, as you have n << p and so your for-loops are not going to have much effect on the speed of this computation.

Edit: As Bill said below, you can just make sure that the elements of F that you don't want to use are set to zero first, and then the M.sum(0).sum(0) result will be what you want.

0人赞添加讨论(0) 举报

numpy vectorization of double python for loop

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间