I am looking to use the multiprocessing module to speed up the run time of some Transport Planning models. I've optimised as much as I can via 'normal' methods, but at the heart of it is an embarrassingly parallel problem, e.g. perform the same set of matrix operations for 4 different sets of inputs, all of them independent of one another.
Pseudo Code:
for mat1, mat2, mat3, mat4 in zip([a1,a2,a3,a4], [b1,b2,b3,b4], [c1,c2,c3,c4], [d1,d2,d3,d4]):
    result1 = mat1*mat2^mat3
    result2 = mat1/mat4
    result3 = mat3.T*mat2.T + mat4
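(Each iteration is completely self-contained; written as a function it would look roughly like the following, with the numpy calls below only standing in for the real operations, which aren't the point here.)

import numpy as np

def one_iteration(mat1, mat2, mat3, mat4):
    # Placeholder stand-ins for the three independent results above;
    # no iteration depends on any other, which is why this parallelises so easily.
    result1 = np.dot(mat1, mat2)            # stands in for mat1*mat2^mat3
    result2 = mat1 / mat4
    result3 = np.dot(mat3.T, mat2.T) + mat4
    return result1, result2, result3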
So all I really want to do is process the iterations of this loop in parallel on a quad core computer. I've read up here and other places on the multiprocessing module and it seems to fit the bill perfectly except for the required:
if __name__ == '__main__'
From what I understand, this means you can only use multiprocessing in code run as a script? i.e. if I do something like:
import multiprocessing
from numpy.random import randn

a = randn(100, 100)
b = randn(100, 100)
c = randn(100, 100)
d = randn(100, 100)

def process_matrix(mat):
    # element-wise square (** rather than ^, which is bitwise XOR and fails on float arrays)
    return mat ** 2

if __name__ == '__main__':
    print "Multiprocessing"
    jobs = []
    for input_matrix in [a, b, c, d]:
        p = multiprocessing.Process(target=process_matrix, args=(input_matrix,))
        jobs.append(p)
        p.start()
It runs fine. However, suppose I saved the above as 'matrix_multiproc.py' and defined a new file 'importing_test.py' which just contains:
import matrix_multiproc
The multiprocessing does not happen, because __name__ is now 'matrix_multiproc' and not '__main__'.
Does this mean I can never use parallel processing on an imported module? All I am trying to do is have my model run as:
def Model_Run():
    import Part1, Part2, Part3, matrix_multiproc, Part4
    Part1.Run()
    Part2.Run()
    Part3.Run()
    matrix_multiproc.Run()
    Part4.Run()
Sorry for a really long question to what is probably a simple answer, thanks!
No, it doesn't. You can use multiprocessing anywhere in your code, provided that the program's main module uses the if __name__ == '__main__' guard.

On Unix systems you won't even need that guard, since they provide the fork() system call to create child processes from the main Python process.

On Windows, on the other hand, fork() is emulated by multiprocessing by spawning a new process that runs the main module again, using a different __name__. Without the guard there, your main application will try to spawn new processes again, resulting in an endless loop and eating up all your computer's memory pretty fast.
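So the pattern is: keep the worker code in matrix_multiproc.py (no guard needed there) and put the guard only in the script you actually execute. A rough sketch of that layout, using a multiprocessing.Pool as just one way of doing it (the file and function names simply mirror your question):

# matrix_multiproc.py -- importable module, no guard required here
import multiprocessing
from numpy.random import randn

def process_matrix(mat):
    # worker function; must live at module level so child processes can find it
    return mat ** 2

def Run():
    inputs = [randn(100, 100) for _ in range(4)]
    pool = multiprocessing.Pool(processes=4)    # one worker per core
    results = pool.map(process_matrix, inputs)  # runs the four jobs in parallel
    pool.close()
    pool.join()
    return results

# run_model.py -- the script you actually execute; this is the one place the guard matters
import matrix_multiproc

if __name__ == '__main__':
    matrix_multiproc.Run()

Your Model_Run() would work the same way: as long as it is ultimately called from a script whose top level is protected by the guard, matrix_multiproc.Run() can create as many processes as it likes, even though it lives in an imported module.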