Vectorizing triple for loop in Python/Numpy with d

2019-05-26 02:32发布

问题:

I am new in Python/Numpy and is trying to improve my triple for loop into a more efficient calculation, but can't quiet figure out how to do it.

The calculations is carried out on a grid of the size (25,35) and the shapes of arrays is:

 A = (8760,25,35)
 B = (12,25,35)

The first dimensions in A corresponds to the number hours in a year (~8760), and the first dimension in B is the number of months(12). I want to use the values in B[0,:,:] for the first month, and B[1,:,:] for the second etc.

So far I created, in a very unrefined way, a index array filled with 1,1,1...,2,2,2...,12 to extract the values from B. My code with some random numbers

    N,M = (25, 35)
    A = np.random.rand(8760,N,M)
    B = np.random.rand(12,N,M)
    q = len(A)/12
    index = np.hstack((np.full((1,q),1),np.full((1,q),2),np.full((1,q),3),np.full((1,q),4),np.full((1,q),5),np.full((1,q),6),np.full((1,q),7),np.full((1,q),8),np.full((1,q),9),np.full((1,q),10),np.full((1,q),11),np.full((1,q),12)))-1
    results = np.zeros((len(A),N,M))

    for t in xrange(len(A)):
        for i in xrange(N):
            for j in xrange(M):
                results[t][i][j] = some_function(A[t][i][j], B[index[(0,t)]][i][j],H = 80.)


    def some_function(A,B,H = 80.0):
        results = A*np.log(H/B)/np.log(10./B)
        return results

How can increase the speed of this calculation?

回答1:

NumPy suports broadcasting that allows elementwise operations to be performed across different shaped arrays in a highly optimized manner. In your case, you have the number of rows and columns in A and B the same. But, at the first dimension, the number of elements are different across these two arrays. Looking at the implementation, it seems B 's first dimension elements are repeated per q number until we go over to the next element in it's first dimension. This coincides with the fact that the number of elements in first dimension of B is q times the number of elements in first dimension of A.

Now, going back to broadcasting, the solution would be to split the first dimension of A to have a 4D array, such that we have the number of elements in the first dimension of this reshaped 4D array matching up with the number of elements in B's first dimension. Next up, reshape B to a 4D array as well by creating singleton dimension (dimension with no elements) at the second dimension with B[:,None,:,:]. Then, NumPy would use broadcasting magic and perform broadcasted elementwise multiplications, as that's what we are doing in our some_function.

Here's the vectorized implementation using NumPy's broadcasting capability -

H = 80.0
M,N,R = B.shape
B4D = B[:,None,:,:]
out = ((A.reshape(M,-1,N,R)*np.log(H/B4D))/np.log(10./B4D)).reshape(-1,N,R)

Runtime tests and output verification -

In [50]: N,M = (25, 35)
    ...: A = np.random.rand(8760,N,M)
    ...: B = np.random.rand(12,N,M)
    ...: H = 80.0
    ...: 

In [51]: def some_function(A,B,H = 80.0):
    ...:     return A*np.log(H/B)/np.log(10./B)
    ...: 

In [52]: def org_app(A,B,H):
    ...:    q = len(A)/len(B)
    ...:    index = np.repeat(np.arange(len(B))[:,None],q,axis=1).ravel()[None,:] # Simpler
    ...:    results = np.zeros((len(A),N,M))
    ...:    for t in xrange(len(A)):
    ...:        for i in xrange(N):
    ...:            for j in xrange(M):
    ...:                results[t][i][j] = some_function(A[t][i][j], B[index[(0,t)]][i][j])
    ...:    return results
    ...: 

In [53]: def vectorized_app(A,B,H):
    ...:    M,N,R = B.shape
    ...:    B4D = B[:,None,:,:]
    ...:    return ((A.reshape(M,-1,N,R)*np.log(H/B4D))/np.log(10./B4D)).reshape(-1,N,R)
    ...: 

In [54]: np.allclose(org_app(A,B,H), vectorized_app(A,B,H))
Out[54]: True

In [55]: %timeit org_app(A,B,H)
1 loops, best of 3: 1min 32s per loop

In [56]: %timeit vectorized_app(A,B,H)
10 loops, best of 3: 217 ms per loop