The Load Balancing section in the Swarm docs doesn't make it clear whether the internal load balancer also does health checks, and whether it removes nodes that are no longer running the service (because the container got killed or the node got rebooted).
In the following case I've got a service with 3 replicas, one instance running on each of the 3 nodes.
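For reference, the service was created roughly like this (the exact flags are an assumption; the service name, image and published port are taken from the output below):

# sketch of the service setup: 3 replicas, port 8080 published through the routing mesh
docker service create \
  --name springbootcrudsample \
  --replicas 3 \
  --publish 8080:8080 \
  ddewaele/springboot.crud.sample:latest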
Manager:
[root@centosvm ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a593d485050a ddewaele/springboot.crud.sample:latest "sh -c 'java $JAVA_OP" 7 minutes ago Up 7 minutes springbootcrudsample.1.5syc6j4c8i3bnerdqq4e1yelm
Node1:
[root@node1 ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d3b3fbc0f2c5 ddewaele/springboot.crud.sample:latest "sh -c 'java $JAVA_OP" 4 minutes ago Up 4 minutes springbootcrudsample.3.7y1oyjyrifgkmxlr20oai5ppl
Node 2:
[root@node2 ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ebca8f24ec3a ddewaele/springboot.crud.sample:latest "sh -c 'java $JAVA_OP" 7 minutes ago Up 7 minutes springbootcrudsample.2.4tqjad7od8ep047s55485na1t
Now, on node1, we kill the docker container. This node will briefly be without a running task (Swarm will re-create it here after a couple of seconds to keep the service at 3 replicas).
[root@node1 ~]# docker kill d3b3fbc0f2c5
d3b3fbc0f2c5
Container gone
[root@node1 ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
New container up
[root@node1 ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b8c9a7a5cf97 ddewaele/springboot.crud.sample:latest "sh -c 'java $JAVA_OP" 11 seconds ago Up 9 seconds springbootcrudsample.3.9v4cnhi8dvq7n8afb2kvp28sk
In the output below, however, when container d3b3fbc0f2c5 was killed, the ingress load balancer didn't detect this and kept sending traffic to that node, resulting in connection refused errors.
How should we handle such a scenario? Do we still need an external load balancer for this, and if so, how should we configure it?
[root@centosvm ~]# while :; do curl http://localhost:8080/env/hostname ; echo "" ; sleep 1; done
{"hostname":"d3b3fbc0f2c5"}
{"hostname":"a593d485050a"}
{"hostname":"ebca8f24ec3a"}
{"hostname":"d3b3fbc0f2c5"}
{"hostname":"a593d485050a"}
{"hostname":"ebca8f24ec3a"}
{"hostname":"d3b3fbc0f2c5"}
{"hostname":"a593d485050a"}
{"hostname":"ebca8f24ec3a"}
{"hostname":"a593d485050a"}
{"hostname":"ebca8f24ec3a"}
{"hostname":"a593d485050a"}
curl: (7) Failed connect to localhost:8080; Connection refused
{"hostname":"ebca8f24ec3a"}
{"hostname":"a593d485050a"}
curl: (7) Failed connect to localhost:8080; Connection refused
{"hostname":"ebca8f24ec3a"}
{"hostname":"a593d485050a"}
curl: (7) Failed connect to localhost:8080; Connection refused
{"hostname":"ebca8f24ec3a"}
{"hostname":"a593d485050a"}
curl: (7) Failed connect to localhost:8080; Connection refused
{"hostname":"ebca8f24ec3a"}
{"hostname":"a593d485050a"}
curl: (7) Failed connect to localhost:8080; Connection refused
{"hostname":"ebca8f24ec3a"}
{"hostname":"a593d485050a"}
curl: (7) Failed connect to localhost:8080; Connection refused
{"hostname":"ebca8f24ec3a"}
{"hostname":"a593d485050a"}
{"hostname":"b8c9a7a5cf97"}
{"hostname":"ebca8f24ec3a"}
{"hostname":"a593d485050a"}
{"hostname":"b8c9a7a5cf97"}
As indicated by François Maturel, with a proper healthcheck in place, Docker Swarm will take the container's health status into account when deciding whether to route requests to it.
For Spring Boot applications that have the default actuators enabled, adding a HEALTHCHECK instruction to the Dockerfile is sufficient for a basic healthcheck. Once the Spring Boot app is initialized and its health actuator is enabled, an HTTP request to the health endpoint will return an HTTP 200 response and the healthcheck will pass.
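A minimal sketch of such a HEALTHCHECK, assuming the Spring Boot 1.x actuator default of /health on port 8080 (use /actuator/health on Spring Boot 2) and that curl is available in the image:

# Dockerfile: probe the actuator health endpoint; a failed request marks the container unhealthy
HEALTHCHECK --interval=10s --timeout=3s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1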
This will result in your Docker containers being able to reach a healthy status. When your Docker container is started, the service running inside it might still be initializing. To do proper load balancing and detect service health, Swarm needs to know when it is able to route requests to a particular service instance (a container on a node).
So when Swarm starts a service replica, it fires up a container and waits until the health status of that container is "healthy". While the container is starting, its health status (shown in the STATUS column of docker ps) will transition from "starting" to "healthy". Only then will the Swarm load balancer route requests to this endpoint.
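You can watch that transition on the node itself; a sketch, where <container-id> is the ID of a container that was started with a healthcheck:

# prints "starting" right after the container comes up, then "healthy" once the probe passes
docker inspect --format '{{.State.Health.Status}}' <container-id>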
@ddewaele is correct, so here are a few more tidbits:
Run docker events to see it kicking off exec commands in each container that has a healthcheck set for its Swarm service. You can also see there how it marks the task/container as healthy or unhealthy.
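For example, restricting the stream to container events makes the healthcheck activity easier to spot (a sketch; the filter is a standard docker events filter):

# shows exec_create/exec_start for the healthcheck probe, plus health_status events
# when the container flips between healthy and unhealthy
docker events --filter type=container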