Question:

I am deploying a simple hello world nginx container with marathon, and everything seems to work well, except that I have 6 containers that will not deregister from consul. docker ps shows none of the containers are running.

I tried using the /v1/catalog/deregister endpoint to deregister the services, but they keep coming back. I then killed the registrator container, and tried deregistering again. They came back.

I am running registrator with

docker run -d --name agent-registrator -v /var/run/docker.sock:/tmp/docker.sock --net=host gliderlabs/registrator consul://127.0.0.1:8500 -deregister-on-success -cleanup

There is 1 consul agent running.

Restarting the machine (this is a single node installation on a local vm) does not make the services go away.

How do I make these containers go away?

Answer 1:

Using the HTTP API to remove services is another, much nicer solution. I had figured out how to remove services manually before I worked out how to use the HTTP API.

To delete a service with the HTTP API, use the following command: curl -v -X PUT http://<consul_ip_address>:8500/v1/agent/service/deregister/<ServiceID>

Note that your <ServiceID> is a combination of three things: the IP address of the host machine the container is running on, the name of the container, and the inner port of the container (e.g. 80 for Apache, 3000 for Node.js, 8000 for Django, etc.), all separated by colons (:).

Here's an example of what that would actually look like: curl -v -X PUT http://1.2.3.4:8500/v1/agent/service/deregister/192.168.1.1:sharp_apple:80

If you want an easy way to get the ServiceID, just curl the service that contains a zombie: curl -s http://<consul_ip_address>:8500/v1/catalog/service/<your_services_name>

Here's a real example for a service called someapp; it returns all the service instances registered under that name: curl -s http://1.2.3.4:8500/v1/catalog/service/someapp
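The two curl calls above can be wrapped in a few small shell functions. This is only a sketch: the CONSUL_ADDR variable and the function names are made up for illustration, and the ServiceID extraction uses grep and sed rather than a real JSON parser, so it assumes the IDs contain no escaped quotes.

```shell
#!/bin/sh
# Assumed variable: base URL of the Consul agent that owns the zombie service.
CONSUL_ADDR="${CONSUL_ADDR:-http://127.0.0.1:8500}"

# Print one ServiceID per line from catalog JSON read on stdin.
extract_service_ids() {
    grep -o '"ServiceID": *"[^"]*"' | sed 's/^"ServiceID": *"//; s/"$//'
}

# List the ServiceIDs registered under the service name given as $1.
list_service_ids() {
    curl -s "$CONSUL_ADDR/v1/catalog/service/$1" | extract_service_ids
}

# Deregister the ServiceID given as $1 through the agent endpoint.
deregister_service() {
    curl -s -X PUT "$CONSUL_ADDR/v1/agent/service/deregister/$1"
}
```

For example, list_service_ids someapp prints the IDs, and deregister_service 192.168.1.1:sharp_apple:80 removes one of them. The deregister call most likely has to go to the agent the service was registered with, so point CONSUL_ADDR at that node.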

Answer 2:

In a Consul cluster, the agents are considered authoritative. If you use the HTTP API's /v1/catalog/deregister endpoint to deregister a service, it will keep coming back as long as other agents know about that service. That's the way the gossip protocol works.

If you want Services to go away immediately you need to deregister the host agent properly by issuing a consul leave before killing the service on the node.

Answer 3:

This is one of the problems with Consul and registrator: if a service doesn't have a check associated with it, the service sticks around, showing as "active", until it is deregistered. So it's good practice to have services register a health check as well. That way they will at least go critical if registrator messes up and forgets to deregister the service (which I see happen a lot).

Alex's answer, erasing the files in Consul's data/services directory (then running consul reload), definitely works to erase the service, but registrator will re-add the services if the containers are still around and running. Apparently the newer registrator versions are better at cleanup, but I've had mixed success. Now I don't use registrator at all, since it doesn't add health checks. I use Nomad (also from HashiCorp) to run my containers; it creates the service AND the health check, and does a great job of cleaning up after itself.
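As a rough illustration of registering a service together with a health check, here is a sketch against the agent's /v1/agent/service/register endpoint. The function names, the CONSUL_ADDR variable, the 10s interval and the 30m timeout are all assumptions for the example. The DeregisterCriticalServiceAfter field only exists in more recent Consul releases, so check that your version supports it.

```shell
#!/bin/sh
# Assumed variable: base URL of the local Consul agent.
CONSUL_ADDR="${CONSUL_ADDR:-http://127.0.0.1:8500}"

# Build a registration payload.
# $1 = service ID, $2 = service name, $3 = port, $4 = HTTP health check URL.
service_payload() {
    printf '{"ID":"%s","Name":"%s","Port":%s,"Check":{"HTTP":"%s","Interval":"10s","DeregisterCriticalServiceAfter":"30m"}}\n' "$1" "$2" "$3" "$4"
}

# Register the service and its check with the agent (same arguments as above).
register_with_check() {
    service_payload "$@" | curl -s -X PUT --data @- "$CONSUL_ADDR/v1/agent/service/register"
}
```

A call such as register_with_check web1 web 80 http://127.0.0.1:80/ would register the service with an HTTP check. If the container disappears, the check goes critical instead of the service staying "active", and with DeregisterCriticalServiceAfter set, Consul should eventually remove the entry on its own.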

Answer 4:

Don't use the catalog endpoint; use the agent endpoint instead. The catalog is maintained by the agents, so even if you remove a service from the catalog, the agent will sync it back. Here is a shell script that removes zombie services:

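# Ask any cluster member for the current leader, then strip the quotes and the :8300 port.
leader="$(curl -s http://ONE-OF-YOUR-CLUSTER:8500/v1/status/leader | sed 's/:8300//' | sed 's/"//g')"
while :
do
    # Fetch the critical checks once so serviceID and node come from the same entry.
    critical="$(curl -s http://$leader:8500/v1/health/state/critical)"
    serviceID="$(echo "$critical" | ./jq -r '.[0].ServiceID')"
    node="$(echo "$critical" | ./jq -r '.[0].Node')"
    echo "serviceID=$serviceID, node=$node"
    size=${#serviceID}
    echo "size=$size"
    # "null" (no critical checks left) and empty IDs are shorter than 7 characters.
    if [ "$size" -ge 7 ]; then
        # Deregister on the agent of the node that owns the service.
        curl -s --request PUT "http://$node:8500/v1/agent/service/deregister/$serviceID"
    else
        break
    fi
done
# Show whatever is still critical after the cleanup.
curl -s http://$leader:8500/v1/health/state/critical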

The script uses the JSON parser jq to extract the fields.

Answer 5:

Try switching to registrator v5:

docker run -d --name agent-registrator -v /var/run/docker.sock:/tmp/docker.sock gliderlabs/registrator:v5 -internal consul://172.16.0.4:8500

Answer 6:

Here is how you can absolutely delete all the zombie services: go into your Consul server, find the location of the JSON files containing the zombies, and delete them.

For example I am running consul in a container:

docker run --restart=unless-stopped -d -h consul0 --name consul0 -v /mnt:/data \
    -p $(hostname -i):8300:8300 \
    -p $(hostname -i):8301:8301 \
    -p $(hostname -i):8301:8301/udp \
    -p $(hostname -i):8302:8302 \
    -p $(hostname -i):8302:8302/udp \
    -p $(hostname -i):8400:8400 \
    -p $(hostname -i):8500:8500 \
    -p $(ifconfig docker0 | awk '/\<inet\>/ { print $2}' | cut -d: -f2):53:53/udp \
    progrium/consul -server -advertise $(hostname -i) -bootstrap-expect 3

Notice the flag -v /mnt:/data. This is where Consul stores all of its data; for me, that was /mnt on the host. Under this directory you will find several other directories:

config raft serf services tmp

Go into services and you will see the files that contain the JSON info for your services. Find any that contain the info of zombies and delete them, then restart Consul. Repeat for each server in your cluster that has zombies on it.
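To avoid opening every file by hand, something like the following could list the candidates first. It is a sketch only: the function name is made up, and it assumes each file in the services directory is plain JSON that mentions the ServiceID (or container name) of the service it describes.

```shell
#!/bin/sh
# List the files in a Consul services directory ($1) that mention a fixed
# string ($2), e.g. the container name taken from a zombie's ServiceID.
find_zombie_files() {
    grep -l -F -- "$2" "$1"/* 2>/dev/null
}
```

For the setup above, find_zombie_files /mnt/services sharp_apple would print the matching file paths. Review the list before deleting anything, then restart Consul as described.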