I'm using Docker on quite a lot of servers right now, but sometimes some of the containers I use crash due to heavy load. I was thinking of adding a cron job that checks every minute whether the container is running, but I didn't find any satisfactory method of doing that.
I'm starting the container with a cidfile that saves the ID of the running container. If the container crashes, the cidfile stays there with the ID inside. How do you make sure a container is running, and respawn it in case it went down? Should I just parse the output of docker ps -a, or is there a more elegant solution?
Since Docker version 1.2.0 there's a new switch for the run command called --restart, which should make any external tools or monitoring obsolete. Since the documentation isn't properly explaining the feature at the time of this writing, read the announcing blog post for details.
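For illustration, here is a minimal sketch of what using the switch might look like. The function name start_with_restart is made up for this example, and "test" and "ubuntu" are just the placeholder container name and image used elsewhere in this answer; always and on-failure[:max-retries] are the policies I'm assuming you'd want to pick from.

```shell
#!/bin/bash
# Start a named container with a restart policy (needs Docker >= 1.2.0).
# start_with_restart is a hypothetical helper, not part of Docker.
start_with_restart() {
    # Policy defaults to "always"; "on-failure" or "on-failure:5" also work.
    local policy="${1:-always}"
    docker run -itd --restart="$policy" --name=test ubuntu
}
```

Calling start_with_restart on-failure:5 would ask the daemon to restart the container on a non-zero exit, giving up after five attempts.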
The answer is buried a few levels deep, but I found several ways of doing it. Starting with the most elegant:
Name your container when running it so you can attach to its process output, and couple that with a process monitor such as upstart, systemd, or supervisord:
docker run -itd --name=test ubuntu
upstart example (/etc/init/test.conf):
description "My test container"
start on filesystem and started docker
stop on runlevel [!2345]
respawn
script
/usr/bin/docker start -a test
end script
Less elegant: watch for changes in cidfile contents
docker run -itd --name=test --cidfile=/tmp/cidfile_path ubuntu
An hourly cron maybe...
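#!/bin/bash
# Full ID of the container named "test" if it is up (empty otherwise)
RUNNING=$(docker ps -a --no-trunc | awk '/test/ && /Up/ {print $1}')
CIDFILE=$(cat /tmp/cidfile_path)
if [ "$RUNNING" != "$CIDFILE" ]
then
    # do something wise
    :   # no-op placeholder; bash rejects an empty "then" block
fi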
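If you'd rather not parse docker ps output at all, the ID stored in the cidfile can be handed straight to docker inspect. This is only a sketch: cid_is_running is a name I made up, and it treats a missing or empty cidfile as "not running".

```shell
#!/bin/bash
# Return 0 if the container whose ID is stored in the cidfile is running.
# cid_is_running is a hypothetical helper for illustration.
cid_is_running() {
    local cidfile="$1"
    local cid
    # A missing or empty cidfile means there is nothing to check.
    [ -s "$cidfile" ] || return 1
    cid=$(cat "$cidfile")
    # inspect prints "true" or "false"; an unknown ID prints nothing useful.
    [ "$(docker inspect --format '{{.State.Running}}' "$cid" 2>/dev/null)" = "true" ]
}
```

Something like cid_is_running /tmp/cidfile_path || echo "container is down" could then go into the cron job.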
Similar to the above, you can check whether a given container is running, in a loop, a cron job, or whatever:
#!/bin/bash
RUNNING=$(docker inspect --format '{{.State.Running}}' test)
if [ "$RUNNING" = "false" ]
then
    # do something wise
    :   # no-op placeholder; bash rejects an empty "then" block
fi
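As one possible way to fill in the "do something wise" part, the check can be wrapped in a function that starts the container again when it is down. This is a sketch under assumptions: ensure_running is a hypothetical name, and it assumes the container was created with a name so that docker start can bring it back.

```shell
#!/bin/bash
# Start a named container again if `docker inspect` says it is not running.
# ensure_running is a hypothetical helper for illustration.
ensure_running() {
    local name="$1"
    local running
    # docker inspect exits non-zero when the container does not exist.
    if ! running=$(docker inspect --format '{{.State.Running}}' "$name" 2>/dev/null); then
        echo "container $name not found" >&2
        return 1
    fi
    if [ "$running" != "true" ]; then
        docker start "$name" >/dev/null
        echo "restarted $name"
    fi
}
```

A script that calls ensure_running test could be run from cron every minute, which matches what the question was originally after.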
You can combine commands to build whatever checking script you like. I went with upstart because it suits my situation, but these examples could be used for all possible scenarios should you need more control.