What happens when a Python script opens subprocesses and one process crashes?
https://stackoverflow.com/a/18216437/311901
Will the main process crash?
Will the other subprocesses crash?
Is there a signal or other event that's propagated?
When using multiprocessing.Pool, if one of the subprocesses in the pool crashes, you will not be notified at all, and a new process will immediately be started to take its place:
>>> import multiprocessing
>>> p = multiprocessing.Pool()
>>> p._processes
4
>>> p._pool
[<Process(PoolWorker-1, started daemon)>, <Process(PoolWorker-2, started daemon)>, <Process(PoolWorker-3, started daemon)>, <Process(PoolWorker-4, started daemon)>]
>>> [proc.pid for proc in p._pool]
[30760, 30761, 30762, 30763]
Then in another window:
dan@dantop:~$ kill 30763
Back to the pool:
>>> [proc.pid for proc in p._pool]
[30760, 30761, 30762, 30767] # New pid for the last process
You can continue using the pool as if nothing happened. However, any work item that the killed child process was running at the time it died will not be completed or restarted. If you were running a blocking map or apply call that was relying on that work item to complete, it will likely hang indefinitely. There is a bug filed for this, but the issue was only fixed in concurrent.futures.ProcessPoolExecutor, rather than in multiprocessing.Pool. Starting with Python 3.3, ProcessPoolExecutor will raise a BrokenProcessPool exception if a child process is killed, and disallow any further use of the pool. Sadly, multiprocessing didn't get this enhancement. For now, if you want to guard against a pool call blocking forever due to a sub-process crashing, you have to use ugly workarounds.
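To make the ProcessPoolExecutor behaviour concrete, here is a minimal sketch. It assumes a POSIX system (it uses SIGKILL and the "fork" start method) and Python 3.7 or later for the mp_context argument. The names die_abruptly and run_crashing_task are made up for illustration.

```python
import multiprocessing
import os
import signal
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def die_abruptly():
    """Hypothetical task: kill the worker it runs in, simulating a hard crash."""
    os.kill(os.getpid(), signal.SIGKILL)


def run_crashing_task():
    """Return True if the executor reports the dead worker as BrokenProcessPool."""
    # Assumption: POSIX, where the "fork" start method and SIGKILL are available.
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as executor:
        future = executor.submit(die_abruptly)
        try:
            # The timeout is only a safety net; the error should arrive quickly.
            future.result(timeout=30)
        except BrokenProcessPool:
            return True
    return False


if __name__ == "__main__":
    print(run_crashing_task())
```

Unlike a blocking call on multiprocessing.Pool, the call to result() returns control to the parent with an exception, so the parent can decide whether to build a fresh executor and resubmit the work.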
Note: The above only applies to a process in a pool actually crashing, meaning the process completely dies. If a sub-process raises an exception, that exception will be propagated to the parent process when you try to retrieve the result of the work item:
>>> def f(): raise Exception("Oh no")
...
>>> pool = multiprocessing.Pool()
>>> result = pool.apply_async(f)
>>> result.get()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 528, in get
    raise self._value
Exception: Oh no
When using a multiprocessing.Process directly, the process object will show that the process has exited with a non-zero exit code if it crashes:
>>> import time
>>> def f(): time.sleep(30)
...
>>> p = multiprocessing.Process(target=f)
>>> p.start()
>>> p.join() # Kill the process while this is blocking, and join immediately ends
>>> p.exitcode
-15
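The -15 above is the negated signal number: kill sends SIGTERM (signal 15) by default, and a negative exitcode of -N means the child was terminated by signal N. A small sketch of how a parent might interpret the value follows. It assumes a POSIX system, and describe_exit, kill_and_describe and sleep_for_a_while are hypothetical helper names.

```python
import multiprocessing
import signal
import time


def sleep_for_a_while():
    time.sleep(30)


def describe_exit(exitcode):
    """Turn a Process.exitcode value into a short description."""
    if exitcode is None:
        return "still running"
    if exitcode == 0:
        return "exited normally"
    if exitcode < 0:
        # A negative value -N means the child was terminated by signal N.
        return "killed by " + signal.Signals(-exitcode).name
    return "failed with exit code %d" % exitcode


def kill_and_describe():
    # Assumption: POSIX, where terminate() delivers SIGTERM to the child.
    ctx = multiprocessing.get_context("fork")
    p = ctx.Process(target=sleep_for_a_while)
    p.start()
    p.terminate()  # same effect as running `kill <pid>` from a shell
    p.join()       # returns as soon as the child is gone
    return describe_exit(p.exitcode)


if __name__ == "__main__":
    print(kill_and_describe())
```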
The behavior is similar if an exception is raised:
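from multiprocessing import Process

def f(x):
    raise Exception("Oh no")

if __name__ == '__main__':
    # f is started without arguments, so the child fails with a TypeError
    # before reaching the raise; either way it dies from an uncaught exception.
    p = Process(target=f)
    p.start()
    p.join()
    print(p.exitcode)
    print("done")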
Output:
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.2/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/usr/lib/python3.2/multiprocessing/process.py", line 116, in run
    self._target(*self._args, **self._kwargs)
TypeError: f() takes exactly 1 argument (0 given)
1
done
As you can see, the traceback from the child is printed, but it doesn't affect execution of the main process, which is able to show that the exitcode of the child was 1.
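The same holds for sibling processes: one child failing does not take the others down, and nothing is signalled to the parent unless it looks at the exit codes itself. A rough sketch, assuming a POSIX system with the "fork" start method; worker and run_children are hypothetical names.

```python
import multiprocessing


def worker(should_fail):
    if should_fail:
        # Uncaught exception: the child prints a traceback and exits with code 1.
        raise RuntimeError("Oh no")
    # Returning normally gives exit code 0.


def run_children():
    """Start three children, one of which fails, and collect their exit codes."""
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX
    procs = [ctx.Process(target=worker, args=(flag,))
             for flag in (False, True, False)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # The parent has to inspect exitcode itself; no exception is raised here.
    return [p.exitcode for p in procs]


if __name__ == "__main__":
    print(run_children())
```

If the parent should stop when any child fails, it can check the returned list for non-zero values and raise or exit on its own.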