I'm having a problem monitoring a program using monit.
I'm running this on a Raspberry Pi with monit 5.11, which I built from source; the version in the repositories was 5.4 and didn't support some of the syntax below that I want to use.
I'm trying to follow the "Q: I have a program that does not create its own pid file. Since monit requires all programs to have a pid file, what do I do?" entry in the FAQ.
Here's my start_sensors.sh script (which just runs my Python program instead of the Java program in the FAQ example):
#!/bin/bash
case $1 in
    start)
        echo $$ > /var/run/start_sensors.pid
        exec 2>&1 /usr/bin/python /home/pi/temperature/post_temps.py 1>/tmp/post_temps.out
        ;;
    stop)
        kill `cat /var/run/start_sensors.pid`
        ;;
    *)
        echo "usage: start_sensors {start|stop}"
        ;;
esac
exit 0
Here's my /etc/monit/monitrc entry:
# Run temperature sensor monitor
check process start_sensors.sh with pidfile /var/run/start_sensors.pid
start = "/home/pi/temperature/start_sensors.sh start"
stop = "/home/pi/temperature/start_sensors.sh stop"
The output in the monit log looks like:
[EST Jan 24 14:21:16] info : 'raspberrypi' Monit reloaded
[EST Jan 24 14:21:16] error : 'start_sensors.sh' process is not running
[EST Jan 24 14:21:16] info : 'start_sensors.sh' trying to restart
[EST Jan 24 14:21:16] info : 'start_sensors.sh' start: /home/pi/temperature/start_sensors.sh
[EST Jan 24 14:21:46] error : 'start_sensors.sh' failed to start (exit status -1) -- Program /home/pi/temperature/start_sensors.sh timed out
So as you can see, monit starts up the program, it runs fine, and then monit kills it thirty seconds later due to the "timeout".
My program is running fine, and producing the proper output that I'm sending to the /tmp/post_temps.out file.
I don't understand why monit is timing the program out... it's supposed to be a long-running process!
I've also tried changing the start_sensors.sh script so that it puts the program in the background (and has the program write its own /var/run/start_sensors.pid file). In that case monit starts a new instance every thirty seconds or so, without stopping the old ones, and each new instance overwrites the pid file. It's as if monit isn't even looking at the pid file.
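For reference, the backgrounding approach described above can be sketched roughly as follows. This is only an illustration under assumptions, not the exact script that was tested: the helper names start_in_background and pid_alive are hypothetical, and here the wrapper (rather than the Python program) records the PID. The second helper is a quick manual check of whether the PID stored in a pid file belongs to a live process, which may help narrow down whether the pid file or the monit configuration is at fault.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; the names and argument order are
# assumptions, not taken from monit or its FAQ.

# start_in_background PIDFILE LOGFILE CMD [ARGS...]
# Launches CMD in the background with stdout and stderr sent to LOGFILE,
# writes the PID of the background job to PIDFILE, and returns immediately.
start_in_background() {
    local pidfile=$1 logfile=$2
    shift 2
    "$@" >"$logfile" 2>&1 </dev/null &
    echo $! >"$pidfile"
}

# pid_alive PIDFILE
# Succeeds only if PIDFILE is readable and names a process that can be
# signalled. Note: for a non-root caller, kill -0 also fails on processes
# owned by another user, even if they are running.
pid_alive() {
    local pidfile=$1 pid
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}
```

With these definitions, calling start_in_background /var/run/start_sensors.pid /tmp/post_temps.out /usr/bin/python /home/pi/temperature/post_temps.py would return right away, and pid_alive /var/run/start_sensors.pid would then report (through its exit status) whether the recorded process is still running.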
THANKS!