I'm trying to understand how to monitor the resque worker for travis-ci with god in such a way that stopping the resque watch via god won't leave a stale worker process.
In the following I'm talking about the worker process, not forked job child processes (i.e. the queue is empty all the time).
When I manually start the resque worker like this:
$ QUEUE=builds rake resque:work
I'll get a single process:
$ ps x | grep resque
7041 s001 S+ 0:05.04 resque-1.13.0: Waiting for builds
And this process will go away as soon as I stop the worker task.
But when I start the same thing with god (exact configuration is here, basically the same thing as the resque/god example) like this ...
$ RAILS_ENV=development god -c config/resque.god -D
I [2011-03-27 22:49:15] INFO: Loading config/resque.god
I [2011-03-27 22:49:15] INFO: Syslog enabled.
I [2011-03-27 22:49:15] INFO: Using pid file directory: /Volumes/Users/sven/.god/pids
I [2011-03-27 22:49:15] INFO: Started on drbunix:///tmp/god.17165.sock
I [2011-03-27 22:49:15] INFO: resque-0 move 'unmonitored' to 'init'
I [2011-03-27 22:49:15] INFO: resque-0 moved 'unmonitored' to 'init'
I [2011-03-27 22:49:15] INFO: resque-0 [trigger] process is not running (ProcessRunning)
I [2011-03-27 22:49:15] INFO: resque-0 move 'init' to 'start'
I [2011-03-27 22:49:15] INFO: resque-0 start: cd /Volumes/Users/sven/Development/projects/travis && rake resque:work
I [2011-03-27 22:49:15] INFO: resque-0 moved 'init' to 'start'
I [2011-03-27 22:49:15] INFO: resque-0 [trigger] process is running (ProcessRunning)
I [2011-03-27 22:49:15] INFO: resque-0 move 'start' to 'up'
I [2011-03-27 22:49:15] INFO: resque-0 moved 'start' to 'up'
I [2011-03-27 22:49:15] INFO: resque-0 [ok] memory within bounds [784kb] (MemoryUsage)
I [2011-03-27 22:49:15] INFO: resque-0 [ok] process is running (ProcessRunning)
I [2011-03-27 22:49:45] INFO: resque-0 [ok] memory within bounds [784kb, 784kb] (MemoryUsage)
I [2011-03-27 22:49:45] INFO: resque-0 [ok] process is running (ProcessRunning)
Then I'll get an extra process:
$ ps x | grep resque
7187 ?? Ss 0:00.02 sh -c cd /Volumes/Users/sven/Development/projects/travis && rake resque:work
7188 ?? S 0:05.11 resque-1.13.0: Waiting for builds
7183 s001 S+ 0:01.18 /Volumes/Users/sven/.rvm/rubies/ruby-1.8.7-p302/bin/ruby /Volumes/Users/sven/.rvm/gems/ruby-1.8.7-p302/bin/god -c config/resque.god -D
God only seems to log the pid of the first one:
$ cat ~/.god/pids/resque-0.pid
7187
When I then stop the resque watch via god:
$ god stop resque
Sending 'stop' command
The following watches were affected:
resque-0
God gives this log output:
I [2011-03-27 22:51:22] INFO: resque-0 stop: default lambda killer
I [2011-03-27 22:51:22] INFO: resque-0 sent SIGTERM
I [2011-03-27 22:51:23] INFO: resque-0 process stopped
I [2011-03-27 22:51:23] INFO: resque-0 move 'up' to 'unmonitored'
I [2011-03-27 22:51:23] INFO: resque-0 moved 'up' to 'unmonitored'
But it does not actually terminate both of the processes, leaving the actual worker process alive:
$ ps x | grep resque
6864 ?? S 0:05.15 resque-1.13.0: Waiting for builds
6858 s001 S+ 0:01.36 /Volumes/Users/sven/.rvm/rubies/ruby-1.8.7-p302/bin/ruby /Volumes/Users/sven/.rvm/gems/ruby-1.8.7-p302/bin/god -c config/resque.god -D
I might be a little late to the party here but we had the same issue on our side. We were using
which caused the multiple processes. According to our sysops guy this is due to the usage of rvm do which we ended up replacing with
This allowed god to work as expected without the need to specify the pid file.
You need to tell god to use pid file generated by rescue and set pid file
env will tell rescue to write pid file, and pid_file will tell god to use it
also as svenfuchs noted it should be enough to set only proper env:
where /home/travis/.god/pids is the default pids directory