Kubernetes: can't reach services on other nodes


Question:

I'm playing with Kubernetes inside 3 VirtualBox VMs running CentOS 7: 1 master and 2 minions. The installation manuals say that every service will be accessible from every node and that every pod will see all other pods, but I don't see this happening: I can access a service only from the node where its pod runs. Please help me find out what I'm missing; I'm very new to Kubernetes.

Every VM has 2 adapters: NAT and host-only. The host-only network uses IPs from the 10.0.13.101-254 range.

  • Master-1: 10.0.13.104
  • Minion-1: 10.0.13.105
  • Minion-2: 10.0.13.106

Get all pods from master:

$ kubectl get pods --all-namespaces
NAMESPACE     NAME                               READY     STATUS    RESTARTS   AGE
default       busybox                            1/1       Running   4          37m
default       nginx-demo-2867147694-f6f9m        1/1       Running   1          52m
default       nginx-demo2-2631277934-v4ggr       1/1       Running   0          5s
kube-system   etcd-master-1                      1/1       Running   1          1h
kube-system   kube-apiserver-master-1            1/1       Running   1          1h
kube-system   kube-controller-manager-master-1   1/1       Running   1          1h
kube-system   kube-dns-2425271678-kgb7k          3/3       Running   3          1h
kube-system   kube-flannel-ds-pwsq4              2/2       Running   4          56m
kube-system   kube-flannel-ds-qswt7              2/2       Running   4          1h
kube-system   kube-flannel-ds-z0g8c              2/2       Running   12         56m
kube-system   kube-proxy-0lfw0                   1/1       Running   2          56m
kube-system   kube-proxy-6263z                   1/1       Running   2          56m
kube-system   kube-proxy-b8hc3                   1/1       Running   1          1h
kube-system   kube-scheduler-master-1            1/1       Running   1          1h

Get all services:

$ kubectl get services
NAME          CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
kubernetes    10.96.0.1       <none>        443/TCP   1h
nginx-demo    10.104.34.229   <none>        80/TCP    51m
nginx-demo2   10.102.145.89   <none>        80/TCP    3s

Get Nginx pods IP info:

$ kubectl get pod nginx-demo-2867147694-f6f9m -o json | grep IP
        "hostIP": "10.0.13.105",
        "podIP": "10.244.1.58",

$ kubectl get pod nginx-demo2-2631277934-v4ggr -o json | grep IP
        "hostIP": "10.0.13.106",
        "podIP": "10.244.2.14",

As you can see, one Nginx pod is on the first minion and the other is on the second minion.

The problem is that I can access nginx-demo only from node 10.0.13.105 (by both pod IP and service IP), with curl:

curl 10.244.1.58:80
curl 10.104.34.229:80

and nginx-demo2 only from 10.0.13.106, not vice versa and not from the master node. Busybox runs on node 10.0.13.105, so it can reach nginx-demo but not nginx-demo2.

How do I make the services reachable from any node? Is flannel misconfigured?

Routing table on master:

$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.2.2        0.0.0.0         UG    100    0        0 enp0s3
10.0.2.0        0.0.0.0         255.255.255.0   U     100    0        0 enp0s3
10.0.13.0       0.0.0.0         255.255.255.0   U     100    0        0 enp0s8
10.244.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
10.244.0.0      0.0.0.0         255.255.0.0     U     0      0        0 flannel.1
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0

Routing table on minion-1:

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.2.2        0.0.0.0         UG    100    0        0 enp0s3
10.0.2.0        0.0.0.0         255.255.255.0   U     100    0        0 enp0s3
10.0.13.0       0.0.0.0         255.255.255.0   U     100    0        0 enp0s8
10.244.0.0      0.0.0.0         255.255.0.0     U     0      0        0 flannel.1
10.244.1.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0

Maybe the default gateway (the NAT adapter) is the problem? Another issue: DNS resolution of service names from the busybox container doesn't work either:

$ kubectl run -i --tty busybox --image=busybox --generator="run-pod/v1"
If you don't see a command prompt, try pressing enter.
/ # 
/ # nslookup nginx-demo
Server:    10.96.0.10
Address 1: 10.96.0.10

nslookup: can't resolve 'nginx-demo'
/ # 
/ # nslookup nginx-demo.default.svc.cluster.local
Server:    10.96.0.10
Address 1: 10.96.0.10

nslookup: can't resolve 'nginx-demo.default.svc.cluster.local'
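
To separate a DNS problem from the cross-node routing problem, one check (a sketch; the kube-dns pod IP below is a placeholder, take the real one from the -o wide output) is to query a kube-dns pod directly by its pod IP instead of the 10.96.0.10 service IP. If that works, DNS itself is healthy and only the service/pod routing is broken:

$ kubectl -n kube-system get pod -l k8s-app=kube-dns -o wide
/ # nslookup nginx-demo.default.svc.cluster.local 10.244.0.2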

Guest OS security has already been relaxed (SELinux set to permissive, firewalld stopped):

setenforce 0
systemctl stop firewalld
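
Note that both commands only last until the next reboot. A minimal sketch of making the same relaxation persistent on CentOS 7 (assuming the default /etc/selinux/config layout):

sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config
systemctl disable firewalld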

Feel free to ask more info if you need.


Additional info

kube-dns logs:

$ kubectl -n kube-system logs kube-dns-2425271678-kgb7k kubedns
I0919 07:48:45.000397       1 dns.go:48] version: 1.14.3-4-gee838f6
I0919 07:48:45.114060       1 server.go:70] Using configuration read from directory: /kube-dns-config with period 10s
I0919 07:48:45.114129       1 server.go:113] FLAG: --alsologtostderr="false"
I0919 07:48:45.114144       1 server.go:113] FLAG: --config-dir="/kube-dns-config"
I0919 07:48:45.114155       1 server.go:113] FLAG: --config-map=""
I0919 07:48:45.114162       1 server.go:113] FLAG: --config-map-namespace="kube-system"
I0919 07:48:45.114169       1 server.go:113] FLAG: --config-period="10s"
I0919 07:48:45.114179       1 server.go:113] FLAG: --dns-bind-address="0.0.0.0"
I0919 07:48:45.114186       1 server.go:113] FLAG: --dns-port="10053"
I0919 07:48:45.114196       1 server.go:113] FLAG: --domain="cluster.local."
I0919 07:48:45.114206       1 server.go:113] FLAG: --federations=""
I0919 07:48:45.114215       1 server.go:113] FLAG: --healthz-port="8081"
I0919 07:48:45.114223       1 server.go:113] FLAG: --initial-sync-timeout="1m0s"
I0919 07:48:45.114230       1 server.go:113] FLAG: --kube-master-url=""
I0919 07:48:45.114238       1 server.go:113] FLAG: --kubecfg-file=""
I0919 07:48:45.114245       1 server.go:113] FLAG: --log-backtrace-at=":0"
I0919 07:48:45.114256       1 server.go:113] FLAG: --log-dir=""
I0919 07:48:45.114264       1 server.go:113] FLAG: --log-flush-frequency="5s"
I0919 07:48:45.114271       1 server.go:113] FLAG: --logtostderr="true"
I0919 07:48:45.114278       1 server.go:113] FLAG: --nameservers=""
I0919 07:48:45.114285       1 server.go:113] FLAG: --stderrthreshold="2"
I0919 07:48:45.114292       1 server.go:113] FLAG: --v="2"
I0919 07:48:45.114299       1 server.go:113] FLAG: --version="false"
I0919 07:48:45.114310       1 server.go:113] FLAG: --vmodule=""
I0919 07:48:45.116894       1 server.go:176] Starting SkyDNS server (0.0.0.0:10053)
I0919 07:48:45.117296       1 server.go:198] Skydns metrics enabled (/metrics:10055)
I0919 07:48:45.117329       1 dns.go:147] Starting endpointsController
I0919 07:48:45.117336       1 dns.go:150] Starting serviceController
I0919 07:48:45.117702       1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0919 07:48:45.117716       1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0919 07:48:45.620177       1 dns.go:171] Initialized services and endpoints from apiserver
I0919 07:48:45.620217       1 server.go:129] Setting up Healthz Handler (/readiness)
I0919 07:48:45.620229       1 server.go:134] Setting up cache handler (/cache)
I0919 07:48:45.620238       1 server.go:120] Status HTTP port 8081



$ kubectl -n kube-system logs kube-dns-2425271678-kgb7k dnsmasq
I0919 07:48:48.466499       1 main.go:76] opts: {{/usr/sbin/dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053] true} /etc/k8s/dns/dnsmasq-nanny 10000000000}
I0919 07:48:48.478353       1 nanny.go:86] Starting dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053]
I0919 07:48:48.697877       1 nanny.go:111] 
W0919 07:48:48.697903       1 nanny.go:112] Got EOF from stdout
I0919 07:48:48.697925       1 nanny.go:108] dnsmasq[10]: started, version 2.76 cachesize 1000
I0919 07:48:48.697937       1 nanny.go:108] dnsmasq[10]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
I0919 07:48:48.697943       1 nanny.go:108] dnsmasq[10]: using nameserver 127.0.0.1#10053 for domain ip6.arpa 
I0919 07:48:48.697947       1 nanny.go:108] dnsmasq[10]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa 
I0919 07:48:48.697950       1 nanny.go:108] dnsmasq[10]: using nameserver 127.0.0.1#10053 for domain cluster.local 
I0919 07:48:48.697955       1 nanny.go:108] dnsmasq[10]: reading /etc/resolv.conf
I0919 07:48:48.697959       1 nanny.go:108] dnsmasq[10]: using nameserver 127.0.0.1#10053 for domain ip6.arpa 
I0919 07:48:48.697962       1 nanny.go:108] dnsmasq[10]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa 
I0919 07:48:48.697965       1 nanny.go:108] dnsmasq[10]: using nameserver 127.0.0.1#10053 for domain cluster.local 
I0919 07:48:48.697968       1 nanny.go:108] dnsmasq[10]: using nameserver 85.254.193.137#53
I0919 07:48:48.697971       1 nanny.go:108] dnsmasq[10]: using nameserver 92.240.64.23#53
I0919 07:48:48.697975       1 nanny.go:108] dnsmasq[10]: read /etc/hosts - 7 addresses



$ kubectl -n kube-system logs kube-dns-2425271678-kgb7k sidecar
ERROR: logging before flag.Parse: I0919 07:48:49.990468       1 main.go:48] Version v1.14.3-4-gee838f6
ERROR: logging before flag.Parse: I0919 07:48:49.994335       1 server.go:45] Starting server (options {DnsMasqPort:53 DnsMasqAddr:127.0.0.1 DnsMasqPollIntervalMs:5000 Probes:[{Label:kubedns Server:127.0.0.1:10053 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1} {Label:dnsmasq Server:127.0.0.1:53 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1}] PrometheusAddr:0.0.0.0 PrometheusPort:10054 PrometheusPath:/metrics PrometheusNamespace:kubedns})
ERROR: logging before flag.Parse: I0919 07:48:49.994369       1 dnsprobe.go:75] Starting dnsProbe {Label:kubedns Server:127.0.0.1:10053 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1}
ERROR: logging before flag.Parse: I0919 07:48:49.994435       1 dnsprobe.go:75] Starting dnsProbe {Label:dnsmasq Server:127.0.0.1:53 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1}

kube-flannel logs from one pod; the others look similar:

$ kubectl -n kube-system logs kube-flannel-ds-674mx kube-flannel
I0919 08:07:41.577954       1 main.go:446] Determining IP address of default interface
I0919 08:07:41.579363       1 main.go:459] Using interface with name enp0s3 and address 10.0.2.15
I0919 08:07:41.579408       1 main.go:476] Defaulting external address to interface address (10.0.2.15)
I0919 08:07:41.600985       1 kube.go:130] Waiting 10m0s for node controller to sync
I0919 08:07:41.601032       1 kube.go:283] Starting kube subnet manager
I0919 08:07:42.601553       1 kube.go:137] Node controller sync successful
I0919 08:07:42.601959       1 main.go:226] Created subnet manager: Kubernetes Subnet Manager - minion-1
I0919 08:07:42.601966       1 main.go:229] Installing signal handlers
I0919 08:07:42.602036       1 main.go:330] Found network config - Backend type: vxlan
I0919 08:07:42.606970       1 ipmasq.go:51] Adding iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
I0919 08:07:42.608380       1 ipmasq.go:51] Adding iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
I0919 08:07:42.609579       1 ipmasq.go:51] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.1.0/24 -j RETURN
I0919 08:07:42.611257       1 ipmasq.go:51] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE
I0919 08:07:42.612595       1 main.go:279] Wrote subnet file to /run/flannel/subnet.env
I0919 08:07:42.612606       1 main.go:284] Finished starting backend.
I0919 08:07:42.612638       1 vxlan_network.go:56] Watching for L3 misses
I0919 08:07:42.612651       1 vxlan_network.go:64] Watching for new subnet leases


$ kubectl -n kube-system logs kube-flannel-ds-674mx install-cni
+ cp -f /etc/kube-flannel/cni-conf.json /etc/cni/net.d/10-flannel.conf
+ true
+ sleep 3600
+ true
+ sleep 3600
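
Note that the flannel log above picks enp0s3 (the NAT adapter, 10.0.2.15) as its external interface; with the default VirtualBox NAT setup every VM typically gets that same 10.0.2.15 address, so cross-node VXLAN traffic is sent to an endpoint the other nodes cannot actually reach. A minimal sketch of pinning flannel to the host-only adapter instead, assuming the stock kube-flannel DaemonSet manifest and the interface names from this setup:

$ kubectl -n kube-system edit daemonset kube-flannel-ds
# in the kube-flannel container spec, add the interface flag explicitly:
#   args:
#   - --ip-masq
#   - --kube-subnet-mgr
#   - --iface=enp0s8

The flannel pods then have to be recreated (e.g. deleted so the DaemonSet restarts them) to pick up the new argument.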

I've added some more services and exposed them with type NodePort; this is what I get when scanning ports from the host machine:

# nmap 10.0.13.104 -p1-50000

Starting Nmap 7.60 ( https://nmap.org ) at 2017-09-19 12:20 EEST
Nmap scan report for 10.0.13.104
Host is up (0.0014s latency).
Not shown: 49992 closed ports
PORT      STATE    SERVICE
22/tcp    open     ssh
6443/tcp  open     sun-sr-https
10250/tcp open     unknown
10255/tcp open     unknown
10256/tcp open     unknown
30029/tcp filtered unknown
31844/tcp filtered unknown
32619/tcp filtered unknown
MAC Address: 08:00:27:90:26:1C (Oracle VirtualBox virtual NIC)

Nmap done: 1 IP address (1 host up) scanned in 1.96 seconds



# nmap 10.0.13.105 -p1-50000

Starting Nmap 7.60 ( https://nmap.org ) at 2017-09-19 12:20 EEST
Nmap scan report for 10.0.13.105
Host is up (0.00040s latency).
Not shown: 49993 closed ports
PORT      STATE    SERVICE
22/tcp    open     ssh
10250/tcp open     unknown
10255/tcp open     unknown
10256/tcp open     unknown
30029/tcp open     unknown
31844/tcp open     unknown
32619/tcp filtered unknown
MAC Address: 08:00:27:F8:E3:71 (Oracle VirtualBox virtual NIC)

Nmap done: 1 IP address (1 host up) scanned in 1.87 seconds



# nmap 10.0.13.106 -p1-50000

Starting Nmap 7.60 ( https://nmap.org ) at 2017-09-19 12:21 EEST
Nmap scan report for 10.0.13.106
Host is up (0.00059s latency).
Not shown: 49993 closed ports
PORT      STATE    SERVICE
22/tcp    open     ssh
10250/tcp open     unknown
10255/tcp open     unknown
10256/tcp open     unknown
30029/tcp filtered unknown
31844/tcp filtered unknown
32619/tcp open     unknown
MAC Address: 08:00:27:D9:33:32 (Oracle VirtualBox virtual NIC)

Nmap done: 1 IP address (1 host up) scanned in 1.92 seconds

Taking the last service (NodePort 32619) as an example: it exists on every node, but it is open only on the node that hosts its pod; on the other nodes it shows up as filtered.
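
One thing worth checking here (a hypothesis, not something the scans above prove): Docker >= 1.13 sets the default policy of the iptables FORWARD chain to DROP, which produces exactly this pattern of NodePorts answering only on the node that hosts the pod. The policy can be inspected on each node:

# iptables -L FORWARD -n | head -n1

If it prints "Chain FORWARD (policy DROP)", forwarded pod-to-pod traffic is being dropped by the host.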

tcpdump info on Minion-1

Connection from Host to Minion-1 with curl 10.0.13.105:30572:

# tcpdump -ni enp0s8 tcp or icmp and not port 22 and not host 10.0.13.104
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on enp0s8, link-type EN10MB (Ethernet), capture size 262144 bytes

13:11:39.043874 IP 10.0.13.1.41132 > 10.0.13.105.30572: Flags [S], seq 657506957, win 29200, options [mss 1460,sackOK,TS val 504213496 ecr 0,nop,wscale 7], length 0
13:11:39.045218 IP 10.0.13.105 > 10.0.13.1: ICMP time exceeded in-transit, length 68

on flannel.1 interface:

# tcpdump -ni flannel.1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on flannel.1, link-type EN10MB (Ethernet), capture size 262144 bytes


13:11:49.499148 IP 10.244.1.0.41134 > 10.244.2.38.http: Flags [S], seq 2858453268, win 29200, options [mss 1460,sackOK,TS val 504216633 ecr 0,nop,wscale 7], length 0
13:11:49.499074 IP 10.244.1.0.41134 > 10.244.2.38.http: Flags [S], seq 2858453268, win 29200, options [mss 1460,sackOK,TS val 504216633 ecr 0,nop,wscale 7], length 0
13:11:49.499239 IP 10.244.1.0.41134 > 10.244.2.38.http: Flags [S], seq 2858453268, win 29200, options [mss 1460,sackOK,TS val 504216633 ecr 0,nop,wscale 7], length 0
13:11:49.499074 IP 10.244.1.0.41134 > 10.244.2.38.http: Flags [S], seq 2858453268, win 29200, options [mss 1460,sackOK,TS val 504216633 ecr 0,nop,wscale 7], length 0
13:11:49.499247 IP 10.244.1.0.41134 > 10.244.2.38.http: Flags [S], seq 2858453268, win 29200, options [mss 1460,sackOK,TS val 504216633 ecr 0,nop,wscale 7], length 0

So only the ICMP "time exceeded in-transit" error and retransmitted SYN packets are seen: there is no connectivity between the pod networks on different nodes. The backend itself is fine, since curl 10.0.13.106:30572 (the node that actually hosts the pod) works.
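
A follow-up capture worth trying (not in the trace above, which filtered on tcp/icmp only): flannel's vxlan backend encapsulates pod-to-pod traffic in UDP port 8472, so capturing that on both adapters shows which physical interface the overlay actually uses:

# tcpdump -ni enp0s8 udp port 8472
# tcpdump -ni enp0s3 udp port 8472

If the VXLAN packets leave on enp0s3 (the NAT adapter) instead of enp0s8, the overlay is heading out an interface the other minions cannot reach, which matches the flannel log above choosing 10.0.2.15.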

Minion-1 interfaces

# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:35:72:ab brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3
       valid_lft 77769sec preferred_lft 77769sec
    inet6 fe80::772d:2128:6aaa:2355/64 scope link 
       valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:f8:e3:71 brd ff:ff:ff:ff:ff:ff
    inet 10.0.13.105/24 brd 10.0.13.255 scope global dynamic enp0s8
       valid_lft 1089sec preferred_lft 1089sec
    inet6 fe80::1fe0:dba7:110d:d673/64 scope link 
       valid_lft forever preferred_lft forever
    inet6 fe80::f04f:5413:2d27:ab55/64 scope link tentative dadfailed 
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN 
    link/ether 02:42:59:53:d7:fd brd ff:ff:ff:ff:ff:ff
    inet 10.244.1.2/24 scope global docker0
       valid_lft forever preferred_lft forever
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN 
    link/ether fa:d3:3e:3e:77:19 brd ff:ff:ff:ff:ff:ff
    inet 10.244.1.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::f8d3:3eff:fe3e:7719/64 scope link 
       valid_lft forever preferred_lft forever
6: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP qlen 1000
    link/ether 0a:58:0a:f4:01:01 brd ff:ff:ff:ff:ff:ff
    inet 10.244.1.1/24 scope global cni0
       valid_lft forever preferred_lft forever
    inet6 fe80::c4f9:96ff:fed8:8cb6/64 scope link 
       valid_lft forever preferred_lft forever
13: veth5e2971fe@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP 
    link/ether 1e:70:5d:6c:55:33 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::1c70:5dff:fe6c:5533/64 scope link 
       valid_lft forever preferred_lft forever
14: veth8f004069@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP 
    link/ether ca:39:96:59:e6:63 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::c839:96ff:fe59:e663/64 scope link 
       valid_lft forever preferred_lft forever
15: veth5742dc0d@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP 
    link/ether c2:48:fa:41:5d:67 brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::c048:faff:fe41:5d67/64 scope link 
       valid_lft forever preferred_lft forever

Answer 1:

It works either by disabling the firewall or by running the command below.

I found this open bug while searching; it looks like the problem is related to Docker >= 1.13 and flannel.

Refer to: https://github.com/coreos/flannel/issues/799
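
The issue above describes Docker >= 1.13 changing the default iptables FORWARD policy to DROP. The workaround commonly suggested there (an assumption on my part as to which command was meant) is to set it back to ACCEPT on every node:

iptables -P FORWARD ACCEPT

This does not survive a reboot on its own, so it would have to be persisted (or added to a systemd unit) if it turns out to be the fix.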



Answer 2:

I am not good at networking, but we were in the same situation as you: we set up four virtual machines, one as master and the rest as worker nodes. I tried to nslookup a service from a container in a pod, but the lookup failed, never getting a response from the Kubernetes DNS. I realized that the DNS configuration or the network component was not right, so I looked into the logs of canal (the CNI we use to build the Kubernetes network) and found that it was initialized with the default-route interface, which is the NAT one rather than the host-only one, as shown below. We then rectified it, and it works now.

https://raw.githubusercontent.com/projectcalico/canal/master/k8s-install/1.7/canal.yaml

# The interface used by canal for host <-> host communication.
# If left blank, then the interface is chosen using the node's
# default route.
canal_iface: ""

I'm not sure which CNI you use, but I hope this helps you check yours.
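
For the setup in the question, that would mean pointing canal at the host-only adapter before applying the manifest; a minimal sketch, assuming enp0s8 is the host-only interface as in the question:

# in the canal ConfigMap inside canal.yaml
canal_iface: "enp0s8"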