Python Multiprocessing: Crash in subprocess?

Posted 2020-02-01 06:12

Question:

What happens when a Python script opens subprocesses and one process crashes?

https://stackoverflow.com/a/18216437/311901

Will the main process crash?

Will the other subprocesses crash?

Is there a signal or other event that's propagated?

Answer 1:

When using multiprocessing.Pool, if one of the subprocesses in the pool crashes, you will not be notified at all, and a new process will immediately be started to take its place:

>>> import multiprocessing
>>> p = multiprocessing.Pool()
>>> p._processes
4
>>> p._pool
[<Process(PoolWorker-1, started daemon)>, <Process(PoolWorker-2, started daemon)>, <Process(PoolWorker-3, started daemon)>, <Process(PoolWorker-4, started daemon)>]
>>> [proc.pid for proc in p._pool]
[30760, 30761, 30762, 30763]

Then in another window:

dan@dantop:~$ kill 30763

Back to the pool:

>>> [proc.pid for proc in p._pool]
[30760, 30761, 30762, 30767]  # New pid for the last process

You can continue using the pool as if nothing happened. However, any work item that the killed child process was running when it died will be neither completed nor restarted. If a blocking map or apply call was waiting on that work item, it will likely hang indefinitely.

There is a bug filed for this, but the issue was only fixed in concurrent.futures.ProcessPoolExecutor, not in multiprocessing.Pool. Starting with Python 3.3, ProcessPoolExecutor raises a BrokenProcessPool exception if a child process is killed, and disallows any further use of the pool. Sadly, multiprocessing.Pool didn't get this enhancement. For now, if you want to guard against a pool call blocking forever because a subprocess crashed, you have to use ugly workarounds.
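To make the difference concrete, here is a rough sketch that crashes a worker in both kinds of pool. It assumes a POSIX system and asks for the "fork" start method explicitly so it runs without an `if __name__ == "__main__":` guard. The function names are made up for illustration, `os._exit` stands in for a real crash, and the timeout is just one possible workaround for `multiprocessing.Pool`.

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool

# Assumption: POSIX system where the "fork" start method is available.
FORK = multiprocessing.get_context("fork")


def pool_call_with_timeout(timeout=2.0):
    """Lose a work item in multiprocessing.Pool and give up after a timeout."""
    with FORK.Pool(processes=2) as pool:
        # os._exit ends the worker abruptly, which simulates a crash.
        result = pool.apply_async(os._exit, (1,))
        try:
            # Without a timeout this call would wait for a result
            # that never arrives.
            result.get(timeout=timeout)
        except multiprocessing.TimeoutError:
            return "timed out"
    return "completed"


def executor_call():
    """Same crash, but ProcessPoolExecutor reports it instead of waiting."""
    with ProcessPoolExecutor(max_workers=2, mp_context=FORK) as executor:
        future = executor.submit(os._exit, 1)
        try:
            future.result()
        except BrokenProcessPool:
            return "broken pool"
    return "completed"
```

With the timeout you only learn that the result did not arrive in time, not why. The executor tells you directly that a worker died, but it cannot be reused afterwards.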

Note: The above only applies to a process in a pool actually crashing, meaning the process dies completely. If a subprocess merely raises an exception, that exception is propagated up to the parent process when you try to retrieve the result of the work item:

>>> def f(): raise Exception("Oh no")
... 
>>> pool = multiprocessing.Pool()
>>> result = pool.apply_async(f)
>>> result.get()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 528, in get
    raise self._value
Exception: Oh no
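Because the exception is re-raised by `get()`, the parent can handle it with an ordinary try/except. A minimal sketch, again assuming a POSIX system with the "fork" start method. `fetch_or_report` is a hypothetical helper name, and `int` is used as the work function only because it is easy to make it raise.

```python
import multiprocessing

# Assumption: POSIX system where the "fork" start method is available.
FORK = multiprocessing.get_context("fork")


def fetch_or_report(func, args, timeout=10.0):
    """Run func(*args) in a pool worker and report how it ended."""
    with FORK.Pool(processes=1) as pool:
        result = pool.apply_async(func, args)
        try:
            return ("ok", result.get(timeout=timeout))
        except multiprocessing.TimeoutError:
            # No result arrived in time, e.g. because the worker died.
            return ("timeout", None)
        except Exception as exc:
            # An exception raised inside the worker is re-raised here.
            return ("error", type(exc).__name__)
```

For example, `fetch_or_report(int, ("12",))` gives `("ok", 12)`, while `fetch_or_report(int, ("not a number",))` gives `("error", "ValueError")`.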

When using a multiprocessing.Process directly, the process object will show that the process has exited with a non-zero exit code if it crashes:

>>> import time
>>> def f(): time.sleep(30)
... 
>>> p = multiprocessing.Process(target=f)
>>> p.start()
>>> p.join()  # Kill the process while this is blocking, and join immediately ends
>>> p.exitcode
-15
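A negative exitcode of -N means the child was terminated by signal N, so -15 here is SIGTERM, the default signal sent by kill. A small sketch of decoding it, assuming a POSIX system with the "fork" start method. `describe_exit` and `terminate_and_describe` are hypothetical helper names.

```python
import multiprocessing
import signal
import time

# Assumption: POSIX system where the "fork" start method is available.
FORK = multiprocessing.get_context("fork")


def describe_exit(exitcode):
    """Turn a Process.exitcode value into a readable description."""
    if exitcode is None:
        return "still running"
    if exitcode == 0:
        return "exited normally"
    if exitcode < 0:
        # Negative values are the number of the terminating signal, negated.
        return "killed by " + signal.Signals(-exitcode).name
    return "exited with status %d" % exitcode


def terminate_and_describe():
    """Start a sleeping child, terminate it, and describe how it ended."""
    child = FORK.Process(target=time.sleep, args=(30,))
    child.start()
    child.terminate()  # sends SIGTERM on POSIX
    child.join()
    return child.exitcode, describe_exit(child.exitcode)
```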

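The behavior is similar if an exception is raised in the child. In the script below the exception turns out to be a TypeError, because f is declared with a parameter but the Process is created without args, so the explicit raise is never reached: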

from multiprocessing import Process

def f(x):
    raise Exception("Oh no")

if __name__ == '__main__':
    p = Process(target=f)
    p.start()
    p.join()
    print(p.exitcode)
    print("done")

Output:

Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.2/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/usr/lib/python3.2/multiprocessing/process.py", line 116, in run
    self._target(*self._args, **self._kwargs)
TypeError: f() takes exactly 1 argument (0 given)
1
done

As you can see, the traceback from the child is printed, but it doesn't affect execution of the main process, which is able to show that the exitcode of the child was 1.
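The same holds for sibling processes: one child failing does not take the others down. A minimal sketch, assuming a POSIX system with the "fork" start method. `worker` and `run_all` are hypothetical names used only for this illustration.

```python
import multiprocessing

# Assumption: POSIX system where the "fork" start method is available.
FORK = multiprocessing.get_context("fork")


def worker(should_fail):
    """Exit cleanly, or die from an uncaught exception when asked to."""
    if should_fail:
        raise RuntimeError("simulated failure")


def run_all(flags):
    """Start one child per flag, wait for all of them, return exit codes."""
    children = [FORK.Process(target=worker, args=(flag,)) for flag in flags]
    for child in children:
        child.start()
    for child in children:
        child.join()
    # 0 means a clean exit, 1 means the child died from an uncaught exception.
    return [child.exitcode for child in children]
```

Calling `run_all([False, True, False])` prints the failing child's traceback to stderr and returns `[0, 1, 0]`. The parent keeps running, and it has to inspect the exit codes itself if it wants to react to the failure.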