Consul on Docker Swarm with Spring Boot clients

Posted 2019-08-10 20:23

Question:

I'm having trouble getting Consul running on Docker Swarm on CentOS 7 (Docker version 18.09.1, build 4c52b90). More precisely, I can't connect to it from a worker node, either from a Consul agent or from a Spring Boot application trying to register with it.

I currently have just one manager and one worker node.

I have created an overlay network using the following command:

docker network create -d overlay smartdeploy_evo

On the manager I am deploying Consul using the following compose file called "docker-compose.consul.master.yml":

version: '3'
services:

  consul:
    image: consul:0.9.3
    hostname: "consul"
    volumes:
      - consul_data:/consul/data
    ports:
      - "8300-8302:8300-8302"
      - "8301-8302:8301-8302/udp"
      - "8400:8400"
      - "8500:8500"
      - "53:8600/udp"
    entrypoint:
      - consul
      - agent
      - -ui
      - -server
      - -bootstrap-expect=1
      - -bind={{ GetInterfaceIP "eth0" }}
      - -advertise={{ GetInterfaceIP "eth0" }}
      - -client=0.0.0.0
      - -data-dir=/consul/data
      - -disable-host-node-id
    healthcheck:
      test: ["CMD-SHELL", "consul info | awk '/health_score/{if ($$3 >=1) exit 1; else exit 0}'"]
    labels:
      - "evo-type=discovery"
    networks:
      - smartdeploy_evo
    deploy:
      placement:
        constraints:
          - node.role == manager

networks:
  smartdeploy_evo:
    external: true

volumes:
  consul_data:
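The healthcheck in this file pipes "consul info" into awk (the "$$3" is Compose's escape for a literal "$3") and exits 1 when "health_score" is 1 or more. Because awk's "exit" fires on the first line matching /health_score/, only the first such line in the output is actually evaluated. The Java sketch below mirrors that logic on a list of output lines so it can be reasoned about or unit-tested; the class name "ConsulHealthCheck" is hypothetical and it assumes well-formed "health_score = N" lines.

```java
import java.util.List;

// Hypothetical helper mirroring the Compose healthcheck:
//   consul info | awk '/health_score/{if ($3 >=1) exit 1; else exit 0}'
// Assumes well-formed "health_score = N" lines in the "consul info" output.
final class ConsulHealthCheck {
    private ConsulHealthCheck() {}

    /** Returns the exit code the awk script would produce: 1 = unhealthy, 0 = healthy. */
    static int exitCode(List<String> consulInfoLines) {
        for (String line : consulInfoLines) {
            if (line.contains("health_score")) {
                // awk splits on whitespace, so "$3" is the value after "health_score" and "="
                String[] fields = line.trim().split("\\s+");
                double score = Double.parseDouble(fields[2]);
                // awk's exit ends the program on the first matching line
                return score >= 1 ? 1 : 0;
            }
        }
        // No matching line: awk reaches the end of input and exits 0
        return 0;
    }

    public static void main(String[] args) {
        List<String> sample = List.of("serf_lan:", "\thealth_score = 0", "serf_wan:", "\thealth_score = 0");
        System.out.println("exit code: " + exitCode(sample));
    }
}
```

A side effect of the first-match behaviour is that a non-zero score on a later section (for example a second "health_score" line) would not fail the check.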

If I deploy this to the Swarm using the following command:

docker stack deploy -c docker-compose.consul.master.yml consul

and then tail the service logs, I get the following output, which shows a successful startup:

==> WARNING: BootstrapExpect Mode is specified as 1; this is the same as Bootstrap mode.
==> WARNING: Bootstrap mode enabled! Do not enable unless necessary
==> Starting Consul agent...
==> Consul agent running!
           Version: 'v0.9.3'
           Node ID: 'cb651a04-9aff-17f6-c5ab-2765fa7b0595'
         Node name: 'consul'
        Datacenter: 'dc1' (Segment: '<all>')
            Server: true (Bootstrap: true)
       Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 8600)
      Cluster Addr: 10.255.0.243 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false

==> Log data will now stream in as it occurs:

    2019/01/16 15:49:29 [INFO] raft: Initial configuration (index=1): [{Suffrage:Voter ID:10.255.0.243:8300 Address:10.255.0.243:8300}]
    2019/01/16 15:49:29 [INFO] raft: Node at 10.255.0.243:8300 [Follower] entering Follower state (Leader: "")
    2019/01/16 15:49:29 [INFO] serf: EventMemberJoin: consul.dc1 10.255.0.243
    2019/01/16 15:49:29 [INFO] serf: EventMemberJoin: consul 10.255.0.243
    2019/01/16 15:49:29 [INFO] consul: Adding LAN server consul (Addr: tcp/10.255.0.243:8300) (DC: dc1)
    2019/01/16 15:49:29 [INFO] consul: Handled member-join event for server "consul.dc1" in area "wan"
    2019/01/16 15:49:29 [INFO] agent: Started DNS server 0.0.0.0:8600 (udp)
    2019/01/16 15:49:29 [INFO] agent: Started DNS server 0.0.0.0:8600 (tcp)
    2019/01/16 15:49:29 [INFO] agent: Started HTTP server on [::]:8500
    2019/01/16 15:49:37 [ERR] agent: failed to sync remote state: No cluster leader
    2019/01/16 15:49:39 [WARN] raft: Heartbeat timeout from "" reached, starting election
    2019/01/16 15:49:39 [INFO] raft: Node at 10.255.0.243:8300 [Candidate] entering Candidate state in term 2
    2019/01/16 15:49:39 [INFO] raft: Election won. Tally: 1
    2019/01/16 15:49:39 [INFO] raft: Node at 10.255.0.243:8300 [Leader] entering Leader state
    2019/01/16 15:49:39 [INFO] consul: cluster leadership acquired
    2019/01/16 15:49:39 [INFO] consul: New leader elected: consul
    2019/01/16 15:49:39 [INFO] consul: member 'consul' joined, marking health alive
    2019/01/16 15:49:41 [INFO] agent: Synced node info

I then try to run my Spring Boot app on the worker node using the following Compose file, "docker-compose.services.2.yml":

version: '3'
services:

  user-mgmt:
    image: smartdeployevo_usermgmt:latest
    ports:
      - "10500:10500"
    environment:
      - spring.cloud.consul.hostHealth=user-mgmt
      - spring.profiles.active=production
    labels:
      - "evo-type=service"
    networks:
      - smartdeploy_evo
    deploy:
      placement:
        constraints:
          - node.role == worker

networks:
  smartdeploy_evo:
    external: true

The Spring Boot web application tries to connect to the Consul server using the host alias "consul" on port 8500.

I deploy this service to the stack using the command:

docker stack deploy -c docker-compose.services.2.yml springboot

The Spring Boot app fails at startup with the following errors:

2019-01-16 15:51:38.753 ERROR [user-mgmt,,,] 1 --- [           main] o.s.c.c.c.ConsulPropertySourceLocator    : Fail fast is set and there was an error reading configuration from consul.
2019-01-16 15:51:38.766 ERROR [user-mgmt,,,] 1 --- [           main] o.s.boot.SpringApplication               : Application run failed

com.ecwid.consul.transport.TransportException: org.apache.http.conn.HttpHostConnectException: Connect to consul:8500 [consul/10.0.2.110] failed: Connection refused (Connection refused)

Note that the worker resolves Consul to the IP address 10.0.2.110.
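To separate a DNS problem from a TCP-level one, the same connection can be reproduced outside Spring. The following is a minimal sketch, not part of the original setup: the class name "ConsulProbe" is hypothetical, and it simply resolves a host name (defaulting to "consul") and attempts a plain TCP connect to port 8500 on every address returned. A "REFUSED" result for 10.0.2.110 would match the exception above; note that ConnectException typically, but not exclusively, means the connection was refused.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic: resolve a name, then try a TCP connect to each resolved address.
final class ConsulProbe {
    enum Outcome { CONNECTED, REFUSED, TIMED_OUT, OTHER_ERROR }

    private ConsulProbe() {}

    /** Maps each resolved IP to the result of connecting to it; throws if the name does not resolve. */
    static Map<String, Outcome> probe(String host, int port, int timeoutMillis) throws UnknownHostException {
        Map<String, Outcome> results = new LinkedHashMap<>();
        for (InetAddress address : InetAddress.getAllByName(host)) {
            results.put(address.getHostAddress(), connect(address, port, timeoutMillis));
        }
        return results;
    }

    private static Outcome connect(InetAddress address, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(address, port), timeoutMillis);
            return Outcome.CONNECTED;
        } catch (ConnectException e) {
            return Outcome.REFUSED;      // usually "Connection refused"
        } catch (SocketTimeoutException e) {
            return Outcome.TIMED_OUT;    // no answer within timeoutMillis
        } catch (IOException e) {
            return Outcome.OTHER_ERROR;  // e.g. no route to host
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        String host = args.length > 0 ? args[0] : "consul";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8500;
        probe(host, port, 2000).forEach((ip, outcome) -> System.out.println(host + " -> " + ip + ": " + outcome));
    }
}
```

Assuming a JDK 11 or later is available in a container attached to smartdeploy_evo, it can be run directly with "java ConsulProbe.java consul 8500".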

If I shell into the Consul container running on the manager and run "ifconfig", I get the following IP addresses:

docker exec -it 714805722ace sh
ifconfig 

eth0      Link encap:Ethernet  HWaddr 02:42:0A:FF:00:F3
          inet addr:10.255.0.243  Bcast:10.255.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

eth1      Link encap:Ethernet  HWaddr 02:42:0A:00:02:6F
          inet addr:10.0.2.111  Bcast:10.0.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

eth2      Link encap:Ethernet  HWaddr 02:42:AC:12:00:03
          inet addr:172.18.0.3  Bcast:172.18.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:9 errors:0 dropped:0 overruns:0 frame:0
          TX packets:9 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:698 (698.0 B)  TX bytes:810 (810.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:80 errors:0 dropped:0 overruns:0 frame:0
          TX packets:80 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:38312 (37.4 KiB)  TX bytes:38312 (37.4 KiB)

Note that the IP address of "eth1", the interface on the same network that the Spring Boot application is trying to reach, is

10.0.2.111

and not 10.0.2.110, the address Spring Boot is using.

Q: Why is there a discrepancy between the IP addresses, and how can I get my Spring Boot app to connect to the right one?

As additional info, if I run another container, such as an Apache HTTPD container, on the worker node and then shell into it and run

ping consul

PING consul (10.0.2.110): 56 data bytes
64 bytes from 10.0.2.110: seq=0 ttl=64 time=0.189 ms
64 bytes from 10.0.2.110: seq=1 ttl=64 time=0.167 ms

it also sees the 10.0.2.110 IP address, not the 10.0.2.111 address that Consul is listening on.
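This is consistent with Swarm's default endpoint mode, "vip": a service name resolves to a virtual IP that fronts the service, while "tasks.<service-name>" resolves to the IPs of the individual tasks. The sketch below (hypothetical class "SwarmDnsCompare") prints both lookups so they can be compared from any container on the overlay network. It assumes the full service name is "consul_consul", following the <stack>_<service> naming that "docker stack deploy" uses (the container name "consul_consul.1..." in the inspect output below reflects this).

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: compare what a service name and its "tasks." name resolve to.
final class SwarmDnsCompare {
    private SwarmDnsCompare() {}

    /** Returns every address the resolver gives for name, or an empty list if it does not resolve. */
    static List<String> resolve(String name) {
        List<String> ips = new ArrayList<>();
        try {
            for (InetAddress address : InetAddress.getAllByName(name)) {
                ips.add(address.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // Leave the list empty: the name is unknown to the resolver
        }
        return ips;
    }

    public static void main(String[] args) {
        // "consul" is expected to give the service VIP in vip mode;
        // "tasks.consul_consul" (assumed service name) is expected to give the task IPs.
        for (String name : new String[] {"consul", "tasks.consul_consul"}) {
            System.out.println(name + " -> " + resolve(name));
        }
    }
}
```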

Any help would be massively appreciated!

UPDATE:

As requested, here is the output of the following command on the manager and worker nodes:

MANAGER:

docker network inspect smartdeploy_evo

[
    {
        "Name": "smartdeploy_evo",
        "Id": "7qyimelt9rjcaukfgd22tenxr",
        "Created": "2019-01-16T15:49:28.627116387Z",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.2.0/24",
                    "Gateway": "10.0.2.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "3e4ca1fd66b8d8012e7f4aca14b3a0d853761a92e83a95870dceef315de0f8dc": {
                "Name": "consul_consul.1.s0pbo1iuntiry2eh310wdsdqp",
                "EndpointID": "ee3bed0936fd2a9a4aa79e1efc23b60f1d45402a3b63a474743af5c2f8f5a479",
                "MacAddress": "02:42:0a:00:02:6f",
                "IPv4Address": "10.0.2.111/24",
                "IPv6Address": ""
            },
            "lb-smartdeploy_evo": {
                "Name": "smartdeploy_evo-endpoint",
                "EndpointID": "2492956b7df72f4c3f3c03e12b7f18d16601a1948b2d7fe202543d91bca75ec6",
                "MacAddress": "02:42:0a:00:02:70",
                "IPv4Address": "10.0.2.112/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4100"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "cdfe9dfef133",
                "IP": "192.221.173.234"
            },
            {
                "Name": "2a07b72b316f",
                "IP": "192.221.173.235"
            }
        ]
    }
]

WORKER:

docker network inspect smartdeploy_evo

[
    {
        "Name": "smartdeploy_evo",
        "Id": "7qyimelt9rjcaukfgd22tenxr",
        "Created": "2019-01-16T15:53:39.639835468Z",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.2.0/24",
                    "Gateway": "10.0.2.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "4943e458d4e08020ba2e49e375fd9371b75f8adb3835d249a453ab3cfbac4bc1": {
                "Name": "springboot_user-mgmt.1.p91mu1nmybt6k8zfezpe24a2f",
                "EndpointID": "2e04bfc51b50fedd2a766897832047800e55f555f91521767bd0a2647b2f3662",
                "MacAddress": "02:42:0a:00:02:90",
                "IPv4Address": "10.0.2.144/24",
                "IPv6Address": ""
            },
            "b50a7205440d0e7937348fe43d0d7bb0a8e66b7371c2c46dd3c7831248bdceb9": {
                "Name": "h_httpd.1.wymlfscwus5rk8ajf62qscb4r",
                "EndpointID": "d46331e82c16c4a4597470a7aa72685721fe063a06e11bc739623178031b24ea",
                "MacAddress": "02:42:0a:00:02:88",
                "IPv4Address": "10.0.2.136/24",
                "IPv6Address": ""
            },
            "lb-smartdeploy_evo": {
                "Name": "smartdeploy_evo-endpoint",
                "EndpointID": "9e3a507c4f3a58e0b74479c4a3878f858a816b5fd6905aa2fbfa89c2d1a8276a",
                "MacAddress": "02:42:0a:00:02:81",
                "IPv4Address": "10.0.2.129/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4100"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "cdfe9dfef133",
                "IP": "192.221.173.234"
            },
            {
                "Name": "2a07b72b316f",
                "IP": "192.221.173.235"
            }
        ]
    }
]
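One way to reason about the extra address: 10.0.2.110 lies inside the overlay's 10.0.2.0/24 subnet, yet it is not the address of any endpoint listed under "Containers" on either node. That fits Swarm's default behaviour of giving each service a virtual IP (VIP) on the network rather than it being a container's own address. A small sketch of the check (hypothetical class name; the address list is copied from the two outputs above):

```java
import java.util.Set;

// Hypothetical helper: IPv4 CIDR membership check applied to the inspect output above.
final class OverlaySubnetCheck {
    private OverlaySubnetCheck() {}

    /** Parses a dotted-quad IPv4 address into an unsigned 32-bit value held in a long. */
    static long toLong(String ipv4) {
        String[] parts = ipv4.split("\\.");
        if (parts.length != 4) throw new IllegalArgumentException("not IPv4: " + ipv4);
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) throw new IllegalArgumentException("bad octet in " + ipv4);
            value = (value << 8) | octet;
        }
        return value;
    }

    /** True if ip lies inside cidr, for example "10.0.2.0/24". */
    static boolean inSubnet(String ip, String cidr) {
        String[] pieces = cidr.split("/");
        int prefix = Integer.parseInt(pieces[1]);
        long mask = (0xFFFFFFFFL << (32 - prefix)) & 0xFFFFFFFFL;
        return (toLong(ip) & mask) == (toLong(pieces[0]) & mask);
    }

    public static void main(String[] args) {
        String subnet = "10.0.2.0/24";
        // Endpoint addresses listed under "Containers" on the manager and the worker
        Set<String> listed = Set.of("10.0.2.111", "10.0.2.112", "10.0.2.144", "10.0.2.136", "10.0.2.129");
        String resolved = "10.0.2.110"; // what "consul" resolved to on the worker
        System.out.println("in subnet: " + inSubnet(resolved, subnet));                 // true
        System.out.println("assigned to a listed endpoint: " + listed.contains(resolved)); // false
    }
}
```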

Answer 1:

I've been doing some research and came across a similar issue on the Consul GitHub site, where user soakes suggests using gliderlabs/registrator as a bridge between the Consul manager and agents.

My version below differs slightly from his (mainly the container names, which match the defaults in my Spring Boot code, and it has no SSL etc.), but without his input I would have been stumped!

I created my Swarm overlay network as follows:

docker network create -d overlay --opt com.docker.network.swarm.name=smartdeploy_evo smartdeploy_evo

and then deployed the Compose file shown at the end of this post from the manager node as follows:

docker stack deploy -c consul.yml consul

with the file consul.yml defined as follows:

version: "3.4"

networks:
  smartdeploy_evo:
    external: true

volumes:
  consul:

services:

  consul:
    image: consul:0.9.3
    volumes:
      - consul:/consul
    ports:
      - target: 8500
        published: 8500
        mode: host
    networks:
      smartdeploy_evo:
        aliases:
          - consul.cluster
    environment:
      - 'CONSUL_LOCAL_CONFIG={ "skip_leave_on_interrupt": true,
      "data_dir":"/consul/data",
      "server":true }'
      - CONSUL_BIND_INTERFACE=eth0
    command: agent -ui -data-dir /consul/data -server -client 0.0.0.0 -bootstrap-expect=1 -retry-join consul.cluster
    deploy:
      endpoint_mode: dnsrr
      mode: global
      placement:
        constraints: [node.role == manager]

  consul_client:
    image: consul:0.9.3
    volumes:
      - consul:/consul
    networks:
      smartdeploy_evo:
        aliases:
          - consul.client.cluster
    environment:
      - 'CONSUL_LOCAL_CONFIG={ "skip_leave_on_interrupt": true,
      "data_dir":"/consul/data" }'
      - CONSUL_BIND_INTERFACE=eth0
    command: agent -ui -data-dir /consul/data -client 0.0.0.0 -retry-join consul.cluster
    deploy:
      endpoint_mode: dnsrr
      mode: global
      placement:
        constraints: [node.role != manager]

  consul_registrator:
    image: gliderlabs/registrator:master
    command: -internal consul://consul.cluster:8500
    volumes:
      - /var/run/docker.sock:/tmp/docker.sock
    networks:
      - smartdeploy_evo
    deploy:
      mode: global
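In the registrator service, the argument "consul://consul.cluster:8500" tells registrator which registry backend to talk to, and "-internal" makes it register containers' exposed ports instead of their published ones. The host part, "consul.cluster", is the network alias given to the Consul server service; with "endpoint_mode: dnsrr" it resolves directly to the server task IPs instead of to a VIP. The sketch below only splits such a URI with java.net.URI to show the scheme, host and port it carries; the class and method names are hypothetical.

```java
import java.net.URI;

// Hypothetical helper: split a registrator-style backend URI such as "consul://consul.cluster:8500".
final class RegistratorBackend {
    private RegistratorBackend() {}

    /** Parses the URI and checks that it has a scheme, a host and an explicit port. */
    static URI parse(String backend) {
        URI uri = URI.create(backend);
        if (uri.getScheme() == null || uri.getHost() == null || uri.getPort() == -1) {
            throw new IllegalArgumentException("expected scheme://host:port but got " + backend);
        }
        return uri;
    }

    public static void main(String[] args) {
        URI uri = parse("consul://consul.cluster:8500");
        // prints: consul consul.cluster 8500
        System.out.println(uri.getScheme() + " " + uri.getHost() + " " + uri.getPort());
    }
}
```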