I have an executable file which I need to run very often, with different parameters. For this I wrote a small Python (2.7) wrapper, using the multiprocessing module, following the pattern given here.
My code looks like this:
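```python
import logging
import time
from multiprocessing import Pool

# numproc (number of worker processes) and params (a list of argument
# tuples for run_nlin) are defined elsewhere in the script.
pool = Pool(processes=numproc)
try:
    logging.info("starting pool runs")
    pool.map(run_nlin, params)
    pool.close()
except KeyboardInterrupt:
    logging.info("^C pressed")
    pool.terminate()
except Exception as e:
    # logging uses %-style formatting, so the argument needs a placeholder
    logging.info("exception caught: %s", e)
    pool.terminate()
finally:
    time.sleep(5)
    pool.join()
    logging.info("done")
```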
My worker function is here:
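```python
import subprocess as sp
import time


class KeyboardInterruptError(Exception):
    pass


def run_nlin(args):
    # pool.map passes one tuple per call; unpack it here (tuple parameters
    # in the signature only work in Python 2)
    path_config, path_log, path_nlin, update_method = args
    try:
        with open(path_log, "w") as log_:
            cmdline = [path_nlin, path_config]
            if update_method:
                cmdline.append(update_method)
            # run the binary, sending its stdout and stderr to the log file
            sp.call(cmdline, stdout=log_, stderr=log_)
    except KeyboardInterrupt:
        time.sleep(5)
        raise KeyboardInterruptError()
```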
`path_config` is the path to a configuration file for the binary program; among other things, it contains the date to run the program for.
When I start the wrapper, everything looks fine. However, when I press ^C, the wrapper script seems to launch `numproc` additional processes from the pool before terminating. As an example, when I start the script for days 1-10, I can see in the `ps aux` output that two instances of the binary program are running (usually for days 1 and 3). Now, when I press ^C, the wrapper script exits and the binary programs for days 1 and 3 are gone, but new binary programs are running for days 5 and 7.

So it seems to me as if the `Pool` launches another `numproc` processes before finally dying.
Any ideas what's happening here, and what I can do about it?
On this page, Jesse Noller, author of the `multiprocessing` module, shows that the correct way to handle `KeyboardInterrupt` is to have the subprocesses return, not re-raise the exception. This allows the main process to terminate the pool.

However, the main process does not reach the `except KeyboardInterrupt` block until after all the tasks generated by `pool.map` have been run. This is why (I believe) you are seeing extra calls to your worker function, `run_nlin`, after Ctrl-C has been pressed.

One possible workaround is to have all the worker functions test whether a `multiprocessing.Event` has been set. If the event has been set, the worker bails out early; otherwise, it goes ahead with the long calculation.

When a script built this way is run, the sequence of events is as follows. When Ctrl-C is pressed, each of the workers sets the `terminating` event. We really only need one to set it, but this works despite the small inefficiency. Next, all the other tasks queued by `pool.map` are run, each bailing out early because the event is now set. Finally, the main process reaches the `except KeyboardInterrupt` block.