V is (n,p) numpy array typically dimensions are n~10, p~20000
The code I have now looks like
A = np.zeros(p)
for i in xrange(n):
for j in xrange(i+1):
A += F[i,j] * V[i,:] * V[j,:]
How would I go about rewriting this to avoid the double python for loop?
While Isaac's answer seems promising, as it removes those two nested for loops, you are having to create an intermediate array M
which is n
times the size of your original V
array. Python for loops are not cheap, but memory access ain't free either:
n = 10
p = 20000
V = np.random.rand(n, p)
F = np.random.rand(n, n)
def op_code(V, F):
n, p = V.shape
A = np.zeros(p)
for i in xrange(n):
for j in xrange(i+1):
A += F[i,j] * V[i,:] * V[j,:]
return A
def isaac_code(V, F):
n, p = V.shape
F = F.copy()
F[np.triu_indices(n, 1)] = 0
M = (V.reshape(n, 1, p) * V.reshape(1, n, p)) * F.reshape(n, n, 1)
return M.sum((0, 1))
If you now take both for a test ride:
In [20]: np.allclose(isaac_code(V, F), op_code(V, F))
Out[20]: True
In [21]: %timeit op_code(V, F)
100 loops, best of 3: 3.18 ms per loop
In [22]: %timeit isaac_code(V, F)
10 loops, best of 3: 24.3 ms per loop
So removing the for loops is costing you an 8x slowdown. Not a very good thing... At this point you may even want to consider whether a function taking about 3ms to evaluate requires any further optimization. IN case you do, there's a small improvement which can be had by using np.einsum
:
def einsum_code(V, F):
n, p = V.shape
F = F.copy()
F[np.triu_indices(n, 1)] = 0
return np.einsum('ij,ik,jk->k', F, V, V)
And now:
In [23]: np.allclose(einsum_code(V, F), op_code(V, F))
Out[23]: True
In [24]: %timeit einsum_code(V, F)
100 loops, best of 3: 2.53 ms per loop
So that's roughly a 20% speed up that introduces code that may very well not be as readable as your for loops. I would say not worth it...
The difficult part about this is that you only want to take the sum of the elements with j <= i
. If not for that then you could do the following:
M = (V.reshape(n, 1, p) * V.reshape(1, n, p)) * F.reshape(n, n, 1)
A = M.sum(0).sum(0)
If F
is symmetric (if F[i,j] == F[j,i]
) then you can exploit the symmetry of M
above as follows:
D = M[range(n), range(n)].sum(0)
A = (M.sum(0).sum(0) - D) / 2.0 + D
That said, this is really not a great candidate for vectorization, as you have n << p
and so your for
-loops are not going to have much effect on the speed of this computation.
Edit: As Bill said below, you can just make sure that the elements of F
that you don't want to use are set to zero first, and then the M.sum(0).sum(0)
result will be what you want.
The expression can be written as
and thus you can sum it like this using the np.newaxis
-construct:
na = np.newaxis
X = (np.tri(n)*F)[:,:,na]*V[:,na,:]*V[na,:,:]
X.sum(axis=1).sum(axis=0)
Here a 3D-array X[i,j,p]
is constructed, and then the 2 first axes are summed, which results in a 1D array A[p]
. Additionaly F
was multiplied with a triangular matrix to restrict the summation according to the problem.