Kill or terminate subprocess when timeout?

2019-01-08 12:20发布

问题:

I would like to repeatedly execute a subprocess as fast as possible. However, sometimes the process will take too long, so I want to kill it. I use signal.signal(...) like below:

ppid=pipeexe.pid
signal.signal(signal.SIGALRM, stop_handler)

signal.alarm(1)
.....
def stop_handler(signal, frame):
    print 'Stop test'+testdir+'for time out'
    if(pipeexe.poll()==None and hasattr(signal, "SIGKILL")):
         os.kill(ppid, signal.SIGKILL)
         return False

but sometime this code will try to stop the next round from executing. Stop test/home/lu/workspace/152/treefit/test2for time out /bin/sh: /home/lu/workspace/153/squib_driver: not found ---this is the next execution; the program wrongly stops it.

Does anyone know how to solve this? I want to stop in time not execute 1 second the time.sleep(n) often wait n seconds. I do not want that I want it can execute less than 1 second

回答1:

You could do something like this:

import subprocess as sub
import threading

class RunCmd(threading.Thread):
    def __init__(self, cmd, timeout):
        threading.Thread.__init__(self)
        self.cmd = cmd
        self.timeout = timeout

    def run(self):
        self.p = sub.Popen(self.cmd)
        self.p.wait()

    def Run(self):
        self.start()
        self.join(self.timeout)

        if self.is_alive():
            self.p.terminate()      #use self.p.kill() if process needs a kill -9
            self.join()

RunCmd(["./someProg", "arg1"], 60).Run()

The idea is that you create a thread that runs the command and to kill it if the timeout exceeds some suitable value, in this case 60 seconds.



回答2:

Here is something I wrote as a watchdog for subprocess execution. I use it now a lot, but I'm not so experienced so maybe there are some flaws in it:

import subprocess
import time

def subprocess_execute(command, time_out=60):
    """executing the command with a watchdog"""

    # launching the command
    c = subprocess.Popen(command)

    # now waiting for the command to complete
    t = 0
    while t < time_out and c.poll() is None:
        time.sleep(1)  # (comment 1)
        t += 1

    # there are two possibilities for the while to have stopped:
    if c.poll() is None:
        # in the case the process did not complete, we kill it
        c.terminate()
        # and fill the return code with some error value
        returncode = -1  # (comment 2)

    else:                 
        # in the case the process completed normally
        returncode = c.poll()

    return returncode   

Usage:

 return = subprocess_execute(['java', '-jar', 'some.jar'])

Comments:

  1. here, the watchdog time out is in seconds; but it's easy to change to whatever needed by changing the time.sleep() value. The time_out will have to be documented accordingly;
  2. according to what is needed, here it maybe more suitable to raise some exception.

Documentation: I struggled a bit with the documentation of subprocess module to understand that subprocess.Popen is not blocking; the process is executed in parallel (maybe I do not use the correct word here, but I think it's understandable).

But as what I wrote is linear in its execution, I really have to wait for the command to complete, with a time out to avoid bugs in the command to pause the nightly execution of the script.



回答3:

I guess this is a common synchronization problem in event-oriented programming with threads and processes.

If you should always have only one subprocess running, make sure the current subprocess is killed before running the next one. Otherwise the signal handler may get a reference to the last subprocess run and ignore the older.

Suppose subprocess A is running. Before the alarm signal is handled, subprocess B is launched. Just after that, your alarm signal handler attempts to kill a subprocess. As the current PID (or the current subprocess pipe object) was set to B's when launching the subprocess, B gets killed and A keeps running.

Is my guess correct?

To make your code easier to understand, I would include the part that creates a new subprocess just after the part that kills the current subprocess. That would make clear there is only one subprocess running at any time. The signal handler could do both the subprocess killing and launching, as if it was the iteration block that runs in a loop, in this case event-driven with the alarm signal every 1 second.



回答4:

Here's what I use:

class KillerThread(threading.Thread):
  def __init__(self, pid, timeout, event ):
    threading.Thread.__init__(self)
    self.pid = pid
    self.timeout = timeout
    self.event = event
    self.setDaemon(True)
  def run(self):
    self.event.wait(self.timeout)
    if not self.event.isSet() :
      try:
        os.kill( self.pid, signal.SIGKILL )
      except OSError, e:
        #This is raised if the process has already completed
        pass    

def runTimed(dt, dir, args, kwargs ):
  event = threading.Event()
  cwd = os.getcwd()
  os.chdir(dir)
  proc = subprocess.Popen(args, **kwargs )
  os.chdir(cwd)
  killer = KillerThread(proc.pid, dt, event)
  killer.start()

  (stdout, stderr) = proc.communicate()
  event.set()      

  return (stdout,stderr, proc.returncode)


回答5:

A bit more complex, I added an answer to solve a similar problem: Capturing stdout, feeding stdin, and being able to terminate after some time of inactivity and/or after some overall runtime.