multiprocessing.Pool with maxtasksperchild produce

2020-02-28 05:43发布

问题:

I need to run a function in a process, which is completely isolated from all other memory, several times. I would like to use multiprocessing for that (since I need to serialize a complex output coming from the functions). I set the start_method to 'spawn' and use a pool with maxtasksperchild=1. I would expect to get a different process for each task, and therefore see a different PID:

import multiprocessing
import time
import os

def f(x):
    print("PID: %d" % os.getpid())
    time.sleep(x)
    complex_obj = 5 #more complex axtually
    return complex_obj

if __name__ == '__main__':
    multiprocessing.set_start_method('spawn')
    pool = multiprocessing.Pool(4, maxtasksperchild=1)
    pool.map(f, [5]*30)
    pool.close()

However the output I get is:

$ python untitled1.py 
PID: 30010
PID: 30009
PID: 30012
PID: 30011
PID: 30010
PID: 30009
PID: 30012
PID: 30011
PID: 30018
PID: 30017
PID: 30019
PID: 30020
PID: 30018
PID: 30019
PID: 30017
PID: 30020
...

So the processes are not being respawned after every task. Is there an automatic way of getting a new PID each time (ie without starting a new pool for each set of processes)?

回答1:

You need to also specify chunksize=1 in the call to pool.map. Otherwise, multiple items in your iterable get bundled together into one "task" from the perception of the worker processes:

import multiprocessing
import time
import os

def f(x):
    print("PID: %d" % os.getpid())
    time.sleep(x)
    complex_obj = 5 #more complex axtually
    return complex_obj

if __name__ == '__main__':
    multiprocessing.set_start_method('spawn')
    pool = multiprocessing.Pool(4, maxtasksperchild=1)
    pool.map(f, [5]*30, chunksize=1)
    pool.close()

Output doesn't have repeated PIDs now:

PID: 4912
PID: 4913
PID: 4914
PID: 4915
PID: 4938
PID: 4937
PID: 4940
PID: 4939
PID: 4966
PID: 4965
PID: 4970
PID: 4971
PID: 4991
PID: 4990
PID: 4992
PID: 4993
PID: 5013
PID: 5014
PID: 5012


回答2:

observe that using chunksize=1 in a Pool map will do the pool wait for a complete round of process to finish to start a new one.

with Pool(3, maxtasksperchild=1) as p:
    p.map(do_job, args_list, chunksize=1)

For example, above the pool will wait until all the first 3 process (eg 1000,1001,1002) finish to then start the new round(1003,1004,1005)