Problem
From the multiprocessing.Pool
docs:
apply_async(func ...)
: A variant of theapply()
method which returns a result object. ...
Reading further ...
apply(func[, args[, kwds]])
: Call func with arguments args and keyword arguments kwds. It blocks until the result is ready. Given this blocks, apply_async() is better suited for performing work in parallel. Additionally, func is only executed in one of the workers of the pool.
The last bold line suggests only one worker from a pool is used. I find this is only true under certain conditions.
Given
Here is code that executes Pool.apply_async()
in three similar cases. In all cases, the process id is printed.
import os
import time
import multiprocessing as mp
def blocking_func(x, delay=0.1):
"""Return a squared argument after some delay."""
time.sleep(delay) # toggle comment here
return x*x, os.getpid()
def apply_async():
"""Return a list applying func to one process with a callback."""
pool = mp.Pool()
# Case 1: From the docs
results = [pool.apply_async(os.getpid, ()) for _ in range(10)]
results = [res.get(timeout=1) for res in results]
print("Docs :", results)
# Case 2: With delay
results = [pool.apply_async(blocking_func, args=(i,)) for i in range(10)]
results = [res.get(timeout=1)[1] for res in results]
print("Delay :", results)
# Case 3: Without delay
results = [pool.apply_async(blocking_func, args=(i, 0)) for i in range(10)]
results = [res.get(timeout=1)[1] for res in results]
print("No delay:", results)
pool.close()
pool.join()
if __name__ == '__main__':
apply_async()
Results
The example from the docs (Case 1) confirms only one worker is run. We extend this example in the next cases by applying blocking_func
, which blocks with some delay.
Commenting the time.sleep()
line in blocking_func()
brings all cases in agreement.
# Time commented
# 1. Docs : [8208, 8208, 8208, 8208, 8208, 8208, 8208, 8208, 8208, 8208]
# 2. Delay : [8208, 8208, 8208, 8208, 8208, 8208, 8208, 8208, 8208, 8208]
# 3. No delay: [8208, 8208, 8208, 8208, 8208, 8208, 8208, 8208, 8208, 8208]
Each call to apply_async()
creates a new process pool, which is why new process ids differ from the latter.
# Time uncommented
# 1. Docs : [6780, 6780, 6780, 6780, 6780, 6780, 6780, 6780, 6780, 6780]
# 2. Delay : [6780, 2112, 6780, 2112, 6780, 2112, 6780, 2112, 6780, 2112]
# 3. No delay: [6780, 2112, 6780, 2112, 6780, 2112, 6780, 2112, 6780, 2112]
However when time.sleep()
is uncommented, even with zero delay, more than one worker is used.
In short, uncommented we expect one worker as in Case 1, but we get multiple workers as in Cases 2 and 3.
Question
Although I expect only one worker to be used by Pool().apply_async()
, why are more than one used when time.sleep()
is uncommented? Should blocking even effect the number of workers used by apply
or apply_async
?
Note: previous, related questions ask "why is only one worker used?" This question asks the opposite - "why isn't only one worker used?" I'm using 2 cores on a Windows machine.