Optimization: correct way to pass large array to m

2019-08-16 17:38发布

问题:

I need to perform a function on elements of a large array and I want to optimize the code using multiprocessing so that I can utilize all the cores on a supercomputer.

This is a follow up to the question I asked here.

I used the code:

import numpy as np
from scipy import misc, ndimage
import itertools
from pathos.multiprocessing import ProcessPool
import time

start = time.time()

#define the original array a as
a= np.load('100by100by100array.npy')

n= a.ndim #number of dimensions
imx, imy, imz = np.shape(a)
print('imx,imy,imz = '+str(imx)+','+str(imy)+','+str(imz))

#define the distorted array b as
imsd = 1 #how blurry the distorted image is
b = ndimage.filters.gaussian_filter(a, imsd, order=0, output=None, mode='reflect', cval=0.0)#, truncate=4.0)

#define a window/weighting function, i.e. a Gaussian for which the local values are computed

wsd = float(1.5) #sd for the window function

#create a n dimensional distribution
x, y, z = np.ogrid[0:imx, 0:imy, 0:imz]
r = (x**2 + y**2 + z**2)**0.5

G1= (2*np.pi)**(-(np.float(n)/2))*(wsd**(np.float(n)))*np.exp(-0.5*wsd**(-2)*(r)**2)

def SSIMfunc(triple):
    cenx=triple[0]
    ceny=triple[1]
    cenz=triple[2]
    G= np.roll(G1, shift=cenx, axis=0)
    G= np.roll(G, shift=ceny, axis=1)
    if cenz is not None:
        G= np.roll(G, shift=cenz, axis=2)

    #define the mean
    mu_a = np.sum(G*a)
    mu_b = np.sum(G*b)

    #define the sample standard deviation

    sd_a = (np.sum(G*(a - mu_a)**2))**(0.5)
    sd_b = (np.sum(G*(b - mu_b)**2))**(0.5)
    sd_ab= np.sum(G*(a - mu_a)*(b - mu_b))

    C1=0.001
    l= (2*mu_a*mu_b+C1)/(mu_a**2 + mu_b**2 +C1)

    C2=0.001
    c= (2*sd_a*sd_b+C2)/(sd_a**2 + sd_b**2 +C2)

    C3=0.001
    s= (sd_ab +C3)/(sd_a*sd_b +C3)

    return l*c*s

pool = ProcessPool()
SSIMarray = np.zeros((imx,imy,imz))

if __name__ == '__main__':
    results = pool.map(SSIMfunc, list(itertools.product(range(0,imx), range(0,imy), range(0,imz))))
SSIMarray = np.reshape(results, (imx,imy,imz))

M= imx*imy*imz

MSSIM = (M**(-1))*np.sum(SSIMarray)
print('MSSIM ='+str(MSSIM))

end = time.time()
print(end - start)

If I import and use a 40*40*40 array then it takes only 8 seconds of wall time and < 3 mins of cpu time.

However importing and using a 100*100*100 array means the code doesn't finish running in over 5 hours of cpu time or 10 minutes of walltime, and not even over 1 hour of wall time.

I am guessing that there is something wrong with passing such a large array through map but I haven't been able to find a solution yet. Using CProfile the problem appears to be

method 'acquire' of 'thread.lock' objects

What is the correct way to apply a function to a large array of elements?

Note that I can't simply chop up the array into pieces as the function value for each element of the array depends on the values around that element as well.