Right, so I have a Python process that runs constantly, possibly under Supervisor. What is the best way to achieve the following monitoring?
- Send an alert and restart if the process has crashed. I'd like to automatically receive a signal every time the process crashes and auto restart it.
- Send an alert and restart if the process has gone stale, i.e. hasn't crunched anything for say 1 minute.
- Restart on demand
I'd like to achieve all of the above through Python. I know Supervisord will do most of it, but I want to see if it can be done through Python itself.
I think what you are looking for is Supervisor Events: http://supervisord.org/events.html
Also look at Superlance; it's a package of plugin utilities for monitoring and controlling processes that run under Supervisor.
You can configure things like crash emails, crash SMS, memory consumption alerts, HTTP hooks, etc.
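If you would rather write the listener yourself than use Superlance, here is a minimal sketch of the event listener protocol described on the events page, using only the standard library. It assumes the script is registered in an `[eventlistener:...]` section subscribed to `PROCESS_STATE_EXITED`; the names `parse_tokens`, `listen` and `alert` are my own, and `alert` is just a placeholder for whatever notification you want. Restarting itself can be left to Supervisor's `autorestart` setting.

```python
import sys


def parse_tokens(line):
    """Turn 'key:value key:value' into a dict."""
    return dict(token.split(":", 1) for token in line.split())


def listen(stdin, stdout, on_exit):
    """Serve Supervisor events until stdin is closed."""
    while True:
        # Tell Supervisor we are ready for the next event.
        stdout.write("READY\n")
        stdout.flush()
        header_line = stdin.readline()
        if not header_line:  # EOF: Supervisor closed the pipe
            break
        headers = parse_tokens(header_line)
        # The header's 'len' field gives the size of the payload that follows.
        payload = parse_tokens(stdin.read(int(headers["len"])))
        unexpected = payload.get("expected") == "0"
        if headers["eventname"] == "PROCESS_STATE_EXITED" and unexpected:
            on_exit(payload)
        # Acknowledge the event so Supervisor sends the next one.
        stdout.write("RESULT 2\nOK")
        stdout.flush()


def alert(payload):
    # Placeholder alert hook: replace with email, SMS, etc.
    # Write to stderr, because stdout is reserved for the protocol.
    sys.stderr.write("process %s exited unexpectedly\n" % payload["processname"])


# In the real listener script, finish with:
#     listen(sys.stdin, sys.stdout, alert)
```

Taking the streams as parameters keeps the loop easy to try out with in-memory files before wiring it into Supervisor.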
Well, if you want a homegrown solution, this is what I could come up with.
Maintain the process state, both actual and expected, in Redis. You can monitor it however you like, for example with a web interface that shows the actual state and changes the expected state.
Run the Python script from crontab to check the state and take the appropriate action when required. Here I check every 3 seconds and use SES to alert admins via email.
DISCLAIMER: The code has not been run or tested. I just wrote it now, so it is prone to errors.
Open the crontab file:
$ crontab -e
Add this line at the end of it to make run_monitor.sh run every minute.
#Runs this process every 1 minute.
*/1 * * * * bash ~/path/to/run_monitor.sh
run_monitor.sh runs the Python script in a for loop, once every 3 seconds.
This is done because crontab's minimum interval is 1 minute. We want to check the process every 3 seconds, 20 times (3 sec * 20 = 1 minute), so the loop runs for about a minute before crontab starts it again.
for count in {1..20}
do
    cd '/path/to/check_status'
    /usr/local/bin/python check_status.py "myprocessname" "python startcommand.py"
    sleep 3 # check every 3 seconds.
done
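If you want to keep everything in Python, the same "20 checks, 3 seconds apart" loop could live in the script itself, so that cron calls Python directly. This is only a sketch; `run_for_a_minute` and its parameters are names I made up, and `check` stands for whatever function performs one status check.

```python
import time


def run_for_a_minute(check, iterations=20, interval=3.0, sleep=time.sleep):
    """Call check() `iterations` times, pausing `interval` seconds after each call."""
    for _ in range(iterations):
        check()
        sleep(interval)
```

With the defaults this spends 20 * 3 = 60 seconds sleeping, plus however long the checks themselves take, so consecutive cron runs can overlap slightly, just as with the shell loop.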
Here I have assumed:
*state 0 = stop or stopped (expected vs. actual)
*state -1 = restart
*state 1 = run or running
You can add more states as per your convenience; a stale process can also be a state.
I have used the process name to kill, start, or check processes; you can easily modify it to read specific PID files.
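With these states, "restart on demand" is just a matter of writing -1 into the expected state, and the monitor picks it up on its next check. A small sketch of what a web interface or command-line tool might call, assuming the same `<name>_<type>_state` key layout the monitor script uses; `state_key`, `request_restart` and `request_stop` are hypothetical helpers, and `client` is any object with a Redis-style `set` method.

```python
def state_key(process_name, state_type):
    # Same key layout as the monitor script: <name>_<type>_state
    return "{0}_{1}_state".format(process_name, state_type)


def request_restart(client, process_name):
    """Ask the monitor to restart the process on its next check."""
    client.set(state_key(process_name, "expected"), -1)


def request_stop(client, process_name):
    """Ask the monitor to stop the process on its next check."""
    client.set(state_key(process_name, "expected"), 0)
```

For example, `request_restart(redis.Redis('xxxredis_hostxxx'), 'myprocessname')` would trigger a restart within about 3 seconds.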
import subprocess
import sys

import boto.ses
import redis

def send_mail(recipients, message_subject, message_body):
    """Uses AWS SES to send mail."""
    SENDER_MAIL = 'xxx@yyy.com'
    AWS_KEY = 'xxxxxxxxxxxxxxxxxxx'
    AWS_SECRET = 'xxxxxxxxxxxxxxxxxxx'
    AWS_REGION = 'xx-xxxx-x'
    mail_conn = boto.ses.connect_to_region(AWS_REGION,
                                           aws_access_key_id=AWS_KEY,
                                           aws_secret_access_key=AWS_SECRET)
    mail_conn.send_email(SENDER_MAIL, message_subject, message_body, recipients, format='html')
    return True

class Shell(object):
    """Convenient wrapper over subprocess."""

    def __init__(self, command, raise_on_error=True):
        self.command = command
        self.raise_on_error = raise_on_error
        self.output = None
        self.error = None
        self.return_code = None

    def run(self):
        process = subprocess.Popen(self.command, shell=True,
                                   stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                                   universal_newlines=True)
        # communicate() waits for the command to exit and drains both pipes,
        # avoiding the deadlock wait() can cause when a pipe buffer fills up.
        self.output, self.error = process.communicate()
        self.return_code = process.returncode
        if self.return_code and self.raise_on_error:
            print(self.error)
            raise Exception("Error while executing %s::%s" % (self.command, self.error))

redis_client = redis.Redis('xxxredis_hostxxx')

def get_state(process_name, state_type):  # state_type will be 'expected' or 'actual'.
    # Redis returns the stored value as a string (or None if the key is unset).
    state = redis_client.get('{process_name}_{state_type}_state'.format(process_name=process_name, state_type=state_type))
    return state

def set_state(process_name, state_type, state):  # state_type will be 'expected' or 'actual'.
    state = redis_client.set('{process_name}_{state_type}_state'.format(process_name=process_name, state_type=state_type), state)
    return state

def get_stale_state(process_name):
    state = redis_client.get('{process_name}_stale_state'.format(process_name=process_name))  # value could be 0 or 1
    return state
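
def check_running_status(process_name):
    # Exclude the grep itself and this monitor script, whose command lines
    # also contain the process name; otherwise the count is never zero.
    command = ("ps -ef | grep {process_name} | grep -v grep "
               "| grep -v check_status.py | wc -l").format(process_name=process_name)
    shell = Shell(command=command)
    shell.run()
    return int(shell.output.strip()) > 0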

def start_process(start_command):
    # Popen returns immediately, so the process keeps running in the background
    # without a trailing '&'. Its output is not captured.
    subprocess.Popen(start_command, shell=True)

def stop_process(process_name):
    # Braces meant for awk are doubled so str.format() leaves them alone.
    command = ("ps -ef | grep {process_name} | grep -v grep "
               "| grep -v check_status.py | awk '{{print $2}}'").format(process_name=process_name)
    shell = Shell(command=command, raise_on_error=False)
    shell.run()
    if shell.output:
        process_ids = shell.output.strip().split()
        for process_id in process_ids:
            command = 'kill {process_id}'.format(process_id=process_id)
            shell = Shell(command=command, raise_on_error=False)
            shell.run()

def check_process(process_name, start_command):
    admins = ["abc@admin.com", "xyz@admin.com"]
    # Redis returns strings, so convert before comparing with integers.
    raw_state = get_state(process_name, 'expected')
    expected_state = int(raw_state) if raw_state is not None else None
    if expected_state == 0:  # stop
        stop_process(process_name)
        set_state(process_name, 'actual', 0)
    elif expected_state == -1:  # restart
        stop_process(process_name)
        set_state(process_name, 'actual', 0)
        start_process(start_command)
        set_state(process_name, 'actual', 1)
        set_state(process_name, 'expected', 1)  # set expected back to 1 so we don't keep on restarting.
    elif expected_state == 1:
        running = check_running_status(process_name)
        if not running:
            set_state(process_name, 'actual', 0)
            send_mail(recipients=admins, message_subject="Alert", message_body="Your process is down. Trying to restart.")
            start_process(start_command)
            running = check_running_status(process_name)
            if running:
                send_mail(recipients=admins, message_subject="Alert", message_body="Your process was restarted.")
                set_state(process_name, 'actual', 1)
            else:
                send_mail(recipients=admins, message_subject="Alert", message_body="Your process could not be restarted.")

if __name__ == '__main__':
    args = sys.argv[1:]
    process_name = args[0]
    start_command = args[1]
    check_process(process_name, start_command)
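
The script above never actually decides when a process is stale. One way to cover the "hasn't crunched anything for 1 minute" case is a heartbeat: the worker records a timestamp in Redis whenever it finishes a unit of work, and the monitor treats an old timestamp like a crash. A rough sketch, assuming a `<name>_heartbeat` key that I made up (it is not part of the script above); `client` is any object with Redis-style `get` and `set`, and the `now` parameter exists only to make the functions easy to test.

```python
import time


def heartbeat_key(process_name):
    # Hypothetical key name, not used by the monitor script above.
    return "{0}_heartbeat".format(process_name)


def beat(client, process_name, now=None):
    """Called by the worker each time it finishes a unit of work."""
    client.set(heartbeat_key(process_name), time.time() if now is None else now)


def is_stale(client, process_name, max_age=60.0, now=None):
    """Return True if the last heartbeat is missing or older than max_age seconds."""
    last = client.get(heartbeat_key(process_name))
    if last is None:
        return True
    now = time.time() if now is None else now
    return now - float(last) > max_age
```

In check_process, the branch for expected state 1 could then call `is_stale(redis_client, process_name)` and, if it returns True, send the alert and run stop_process followed by start_process.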