Keyboard Interrupts with Python's multiprocessing

Posted 2019-02-19 08:14

Question:

I've found this article, which explains how to kill running multiprocessing code with Ctrl+C. The following code works as expected (it can be terminated with Ctrl+C):

#!/usr/bin/env python

# Copyright (c) 2011 John Reese
# Licensed under the MIT License

import multiprocessing
import os
import signal
import time

def init_worker():
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def run_worker():
    time.sleep(15)

def main():
    print "Initializing 5 workers"
    pool = multiprocessing.Pool(5, init_worker)

    print "Starting 3 jobs of 15 seconds each"
    for i in range(3):
        pool.apply_async(run_worker)

    try:
        print "Waiting 10 seconds"
        time.sleep(10)

    except KeyboardInterrupt:
        print "Caught KeyboardInterrupt, terminating workers"
        pool.terminate()
        pool.join()

    else:
        print "Quitting normally"
        pool.close()
        pool.join()

if __name__ == "__main__":
    main()

The problem is that I use different functions from the multiprocessing module. I don't know how they differ from the approach above; my version simply works for me, except that it cannot be terminated with Ctrl+C. Here is the code I've been trying to modify along the lines of the version above (my earlier version, without signal handling, printed tracebacks when Ctrl+C was hit):

#!/usr/bin/env python

from time import sleep
import signal
from multiprocessing import Pool
from multiprocessing import cpu_count

def init_worker(n):
  signal.signal(signal.SIGINT, signal.SIG_IGN)
  sleep(.5)
  print "n = %d" % n
  results_sent_back_to_parent = n * n
  return results_sent_back_to_parent

if __name__ == '__main__':
  try:
    # Named "pool" so that the except block below refers to a defined name.
    pool = Pool(processes = cpu_count())
    results = pool.map(init_worker, range(50), chunksize = 10)
  except KeyboardInterrupt:
    pool.terminate()
    pool.join()

  print(results)

Questions:

  1. Why does Ctrl+C work in the first example but not in the second?
  2. How can I modify the second example so that Ctrl+C works?
  3. How do the two examples differ in terms of multiprocessing (for example, one uses pool.apply_async and the other uses map)?

EDIT

In reply to @user2386841

I've commented out signal.signal(signal.SIGINT, signal.SIG_IGN) in init_worker and tried adding it right after if __name__ == '__main__':, but it did not work. The same happened when I added it as the last line of the try: block.

In reply to @ThomasWagenaar

It behaves exactly the same (I've also tried the various locations for the signal handler mentioned above): the numbers keep printing despite hitting Ctrl+C, and the only way to kill the script is to send it to the background with Ctrl+Z and then kill it with kill %1.

Answer 1:

I solved this problem with this simple function:

import os
import psutil
import signal

parent_id = os.getpid()
def worker_init():
    def sig_int(signal_num, frame):
        print('signal: %s' % signal_num)
        parent = psutil.Process(parent_id)
        for child in parent.children():
            if child.pid != os.getpid():
                print("killing child: %s" % child.pid)
                child.kill()
        print("killing parent: %s" % parent_id)
        parent.kill()
        print("suicide: %s" % os.getpid())
        psutil.Process(os.getpid()).kill()
    signal.signal(signal.SIGINT, sig_int)

I attached it to my Pool:

Pool(3, worker_init)

The output after pressing Ctrl+C is:

^Csignal: 2
signal: 2
signal: 2
killing child: 14109
killing child: 14110
killing parent: 14104
suicide: 14108
Killed

After that, every process has exited.



Answer 2:

Old thread, but these examples behave differently because of a well-known Python bug (http://bugs.python.org/issue8296, also explained in this StackOverflow answer).

You should read that other answer in full to get the whole picture, but in short: the underlying threading.Condition.wait() call behaves differently depending on whether it is passed a timeout, and only a wait() with a timeout can be interrupted by Ctrl+C. map() blocks in that call without a timeout, so the KeyboardInterrupt is not delivered while it waits. The first example never blocks that way: apply_async() returns immediately, and the main process then waits in time.sleep(), which is interruptible.
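To make the timeout distinction a bit more concrete, here is a minimal sketch. The names slow_square and demo are made up for illustration and are not from the question. It only shows that apply_async() hands back an AsyncResult without blocking, and that AsyncResult.get() accepts an optional timeout and raises multiprocessing.TimeoutError when the result is not ready in time. It does not reproduce the Ctrl+C hang itself, which as far as I know is specific to Python 2 and early Python 3 interpreters.

```python
import multiprocessing
import time


def slow_square(n):
    # Hypothetical worker: pretend to do half a second of work.
    time.sleep(0.5)
    return n * n


def demo():
    pool = multiprocessing.Pool(2)
    try:
        # apply_async() returns an AsyncResult right away.
        result = pool.apply_async(slow_square, (3,))
        try:
            # A very short timeout expires before the worker is done.
            result.get(timeout=0.05)
            timed_out = False
        except multiprocessing.TimeoutError:
            timed_out = True
        # A generous timeout still returns as soon as the value is ready.
        value = result.get(timeout=30)
        return timed_out, value
    finally:
        pool.terminate()
        pool.join()


if __name__ == "__main__":
    print(demo())
```

The second get() call is the form that matters here: it waits with a timeout, so the parent is not stuck in an untimed wait.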

You should be able to refactor your code to use one of the asynchronous methods listed in the Pool docs, such as map_async(), and fetch the result with get() and a timeout.
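A possible refactoring of the second example along those lines is sketched below. Treat it as an illustration under a few assumptions rather than a definitive fix: the function names square and run, the shortened sleep, and the one-hour timeout are my own choices. The SIGINT handling is moved out of the mapped function into a Pool initializer, as in the first example, and map() is replaced by map_async(...).get(timeout).

```python
import multiprocessing
import signal
import time


def init_worker():
    # Workers ignore SIGINT so that only the parent reacts to Ctrl+C.
    signal.signal(signal.SIGINT, signal.SIG_IGN)


def square(n):
    # Stand-in for the question's worker body (hypothetical name).
    time.sleep(0.05)
    return n * n


def run(numbers, processes=None, timeout=3600):
    pool = multiprocessing.Pool(processes, init_worker)
    try:
        # map_async() returns at once; get() with a timeout keeps the
        # parent's wait interruptible. The timeout value is arbitrary.
        results = pool.map_async(square, numbers, chunksize=10).get(timeout)
    except (KeyboardInterrupt, multiprocessing.TimeoutError):
        # Ctrl+C in the parent (or no result in time): stop the workers.
        pool.terminate()
        pool.join()
        raise
    else:
        pool.close()
        pool.join()
        return results


if __name__ == "__main__":
    try:
        print(run(range(20)))
    except KeyboardInterrupt:
        print("Interrupted, workers terminated")
```

The timeout only needs to be larger than the longest run you expect; its purpose is to make the wait a timed one, not to limit the job.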