Importing Modules that use MultiProcessing Python

Posted 2019-06-17 03:55

Question:

I am looking to use the multiprocessing module to speed up the run time of some transport planning models. I've optimized as much as I can via 'normal' methods, but at the heart of it is an embarrassingly parallel problem, e.g. performing the same set of matrix operations for 4 different sets of inputs, all independent of one another.

Pseudo Code:

    for mat1,mat2,mat3,mat4 in zip([a1,a2,a3,a4],[b1,b2,b3,b4],[c1,c2,c3,c4],[d1,d2,d3,d4]):
        result1 = mat1*mat2**mat3
        result2 = mat1/mat4
        result3 = mat3.T*mat2.T+mat4

So all I really want to do is process the iterations of this loop in parallel on a quad core computer. I've read up here and other places on the multiprocessing module and it seems to fit the bill perfectly except for the required:

    if __name__ == '__main__':

From what I understand, this means that you can only multiprocess code that is run as a script? I.e. if I do something like:

    import multiprocessing
    from numpy.random import randn

    a = randn(100,100)
    b = randn(100,100)
    c = randn(100,100)
    d = randn(100,100)

    def process_matrix(mat):
        # Element-wise square; ** is the power operator (^ is bitwise XOR)
        return mat**2

    if __name__=='__main__':
        print("Multiprocessing")
        jobs=[]

        for input_matrix in [a,b,c,d]:
            p = multiprocessing.Process(target=process_matrix,args=(input_matrix,))
            jobs.append(p)
            p.start()

        # Wait for all worker processes to finish
        for p in jobs:
            p.join()

It runs fine. However, suppose I save the above as 'matrix_multiproc.py' and create a new file 'importing_test.py' which contains only:

    import matrix_multiproc

The multiprocessing does not happen, because __name__ is now 'matrix_multiproc' and not '__main__'.

Does this mean I can never use parallel processing on an imported module? All I am trying to do is have my model run as:

    def Model_Run():
        import Part1, Part2, Part3, matrix_multiproc, Part4

        Part1.Run()
        Part2.Run()
        Part3.Run()
        matrix_multiproc.Run()
        Part4.Run()

Sorry for a really long question to what is probably a simple answer, thanks!

Answer 1:

Does this mean I can never use parallel processing on an imported module?

No, it doesn't. You can use multiprocessing anywhere in your code, provided that the program's main module uses the if __name__ == '__main__' guard.
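As a rough illustration of that layout, here is a minimal sketch. It assumes a hypothetical Run(matrices, workers) signature (the question's Run() takes no arguments) and uses plain nested lists as a stand-in for the NumPy arrays, so that it has no third-party dependency. The functions are what would sit in matrix_multiproc.py; only the guard at the bottom belongs in the script that is actually executed.

```python
import multiprocessing


def process_matrix(mat):
    """Square every element; a stand-in for the real matrix operations."""
    return [[value ** 2 for value in row] for row in mat]


def Run(matrices, workers=4):
    """Distribute the matrices across a pool of worker processes.

    Results come back in the same order as the inputs.
    """
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(process_matrix, matrices)


# In the layout from the question, everything above would live in
# matrix_multiproc.py, and the guard below in the script that is run.
if __name__ == "__main__":
    inputs = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
    for result in Run(inputs):
        print(result)
```

The point of the split is that importing the module only defines functions. Worker processes are created only when Run() is called, and that call is reached only from the guarded entry point.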

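On Unix systems where multiprocessing uses the fork start method, you won't strictly need that guard, since child processes are created with the fork() system call and do not re-import the main module. Keeping the guard is still the portable choice, because the spawn start method (the default on macOS since Python 3.8) behaves like Windows.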

On Windows, on the other hand, fork() is emulated by multiprocessing by spawning a new process that runs the main module again, using a different __name__. Without the guard here, your main application will try to spawn new processes again, resulting in an endless loop, and eating up all your computer's memory pretty fast.
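To see which of these behaviours applies on a given machine, one option is to ask multiprocessing for its start method. The sketch below uses the real multiprocessing.get_start_method() call; the guard_required helper is a hypothetical name introduced here for illustration only.

```python
import multiprocessing


def guard_required(start_method):
    """Return True when child processes import the main module again.

    With "fork" the child is a copy of the parent, so nothing is re-imported.
    With "spawn" and "forkserver" the child starts from a clean process and
    imports the main module, so unguarded code would run a second time.
    """
    return start_method != "fork"


if __name__ == "__main__":
    method = multiprocessing.get_start_method()
    print("default start method:", method)
    print("guard required:", guard_required(method))
```

Whatever this prints, leaving the guard in place costs nothing and keeps the program working if it is later run on a platform with a different start method.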