Where to call join() when multiprocessing

Posted 2019-02-28 13:41

Question:

When using multiprocessing in Python, I usually see examples where join() is called in a separate loop from the one in which each process is created.

For example, this:

processes = []

for i in range(10):
    p = Process(target=my_func)
    processes.append(p)
    p.start()

for p in processes:
    p.join()

is more common than this:

processes = []

for i in range(10):
    p = Process(target=my_func)
    processes.append(p)
    p.start()
    p.join()

But from my understanding of join(), it just tells the script not to exit until that process has finished. Therefore, it shouldn't matter when join() is called. So why is it usually called in a separate loop?

Answer 1:

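join() is a blocking operation: the calling code stops at that line and waits until the joined process has finished. So it does matter where you call it.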
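To see the blocking behaviour directly, here is a minimal sketch that times a single join() call. The helper names sleep_briefly and time_join are made up for this illustration; Process.join(), Process.is_alive() and Process.exitcode are the standard multiprocessing API.

```python
import time
from multiprocessing import Process


def sleep_briefly(seconds):
    # Hypothetical worker: stands in for any task that takes a while.
    time.sleep(seconds)


def time_join(seconds):
    # Start one child process and measure how long join() keeps us waiting.
    p = Process(target=sleep_briefly, args=(seconds,))
    start = time.perf_counter()
    p.start()
    p.join()  # the parent blocks here until the child has exited
    elapsed = time.perf_counter() - start
    return elapsed, p.is_alive(), p.exitcode


if __name__ == "__main__":
    elapsed, alive, exitcode = time_join(0.5)
    print(f"join() returned after {elapsed:.2f}s, alive={alive}, exitcode={exitcode}")
```

By the time join() returns, the child is no longer alive, and the elapsed time is at least as long as the child's sleep.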

In the first example you start 10 processes and only then wait for all of them to finish. All 10 processes run at the same time.

In the second example you start one process and wait for it to finish before you start the next one. Only one process is running at any given time.

First example:

import time
from multiprocessing import Process

def wait():
    time.sleep(1)

if __name__ == "__main__":
    processes = []

    # Start all 10 processes; start() returns immediately
    for i in range(10):
        p = Process(target=wait)
        processes.append(p)
        p.start()

    # All 10 are already running in parallel; now wait for each one to finish
    for p in processes:
        p.join()

The whole script takes roughly one second, plus process startup overhead.

Second example:

# Reuses wait() from the first example
processes = []

for i in range(10):
    p = Process(target=wait)
    processes.append(p)
    p.start()  # Start one process
    p.join()   # Block for about one second until it finishes, then loop again

The whole script takes roughly 10 seconds, because the ten one-second waits happen one after another.
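If you want to measure the difference yourself, the two patterns can be timed side by side. This is only a sketch: nap, run_concurrently and run_sequentially are hypothetical names chosen for this illustration, and the process count and sleep time are reduced so it finishes quickly.

```python
import time
from multiprocessing import Process


def nap(seconds):
    # Hypothetical worker that just sleeps.
    time.sleep(seconds)


def run_concurrently(n, seconds):
    # Pattern 1: start every process first, then join them in a second loop.
    start = time.perf_counter()
    processes = [Process(target=nap, args=(seconds,)) for _ in range(n)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    return time.perf_counter() - start


def run_sequentially(n, seconds):
    # Pattern 2: join each process right after starting it.
    start = time.perf_counter()
    for _ in range(n):
        p = Process(target=nap, args=(seconds,))
        p.start()
        p.join()  # blocks, so the next process cannot start yet
    return time.perf_counter() - start


if __name__ == "__main__":
    n, seconds = 4, 0.5
    print(f"join in a separate loop: {run_concurrently(n, seconds):.2f}s")
    print(f"join inside the loop:    {run_sequentially(n, seconds):.2f}s")
```

The exact numbers depend on the machine and on how processes are started on your platform, but the sequential version can never be faster than n times the sleep duration, while the concurrent version should be close to a single sleep.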