How to fix weave-net CrashLoopBackOff for the seco

Posted 2019-02-09 09:07

Question:

I have two VM nodes. They can reach each other both by hostname (through /etc/hosts) and by IP address. One has been provisioned with kubeadm as a master, the other as a worker node. Following the instructions (http://kubernetes.io/docs/getting-started-guides/kubeadm/), I have added weave-net. The list of pods looks like the following:

vagrant@vm-master:~$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                    READY     STATUS             RESTARTS   AGE
kube-system   etcd-vm-master                          1/1       Running            0          3m
kube-system   kube-apiserver-vm-master                1/1       Running            0          5m
kube-system   kube-controller-manager-vm-master       1/1       Running            0          4m
kube-system   kube-discovery-982812725-x2j8y          1/1       Running            0          4m
kube-system   kube-dns-2247936740-5pu0l               3/3       Running            0          4m
kube-system   kube-proxy-amd64-ail86                  1/1       Running            0          4m
kube-system   kube-proxy-amd64-oxxnc                  1/1       Running            0          2m
kube-system   kube-scheduler-vm-master                1/1       Running            0          4m
kube-system   kubernetes-dashboard-1655269645-0swts   1/1       Running            0          4m
kube-system   weave-net-7euqt                         2/2       Running            0          4m
kube-system   weave-net-baao6                         1/2       CrashLoopBackOff   2          2m

CrashLoopBackOff appears for each worker node connected. I have spent several hours playing with the network interfaces, but the network seems fine. I found a similar question, where the answer advised looking into the logs, with no follow-up. So, here are the logs:

vagrant@vm-master:~$ kubectl logs weave-net-baao6 -c weave --namespace=kube-system
2016-10-05 10:48:01.350290 I | error contacting APIServer: Get https://100.64.0.1:443/api/v1/nodes: dial tcp 100.64.0.1:443: getsockopt: connection refused; trying with blank env vars
2016-10-05 10:48:01.351122 I | error contacting APIServer: Get http://localhost:8080/api: dial tcp [::1]:8080: getsockopt: connection refused
Failed to get peers

What am I doing wrong? Where do I go from here?

Answer 1:

I ran into the same issue too. It seems weaver wants to connect to the Kubernetes cluster IP address, which is virtual. Run this to find the cluster IP: kubectl get svc. It should give you something like this:

$ kubectl get svc
NAME                     CLUSTER-IP        EXTERNAL-IP   PORT(S)   AGE
kubernetes               100.64.0.1       <none>        443/TCP   2d

Weaver picks up this IP and tries to connect to it, but the worker nodes do not know anything about it. A simple route will solve this issue. On all your worker nodes, execute:

route add 100.64.0.1 gw <your real master IP>
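
The two steps above (look up the cluster IP, then add a route on each worker) could be sketched roughly as follows. This is only an illustration: the helper names cluster_ip_from_svc and print_route_cmd and the MASTER_IP variable are made up for this example, and the route command is printed instead of executed so that it can be reviewed before running it as root on a worker.

```shell
#!/bin/bash
# Hypothetical helper: read `kubectl get svc` output on stdin and print the
# CLUSTER-IP value of the `kubernetes` service. The column is located from the
# header row, so extra columns in other kubectl versions do not matter.
cluster_ip_from_svc() {
    awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "CLUSTER-IP") col = i; next }
         $1 == "kubernetes" && col { print $col; exit }'
}

# Hypothetical helper: print (not run) the route command from the answer.
# $1 = cluster IP, $2 = real IP address of the master.
print_route_cmd() {
    echo "route add $1 gw $2"
}

# Example usage (assumes kubectl is configured and MASTER_IP is set by you):
#   ip=$(kubectl get svc | cluster_ip_from_svc)
#   print_route_cmd "$ip" "$MASTER_IP"
```

Printing the command first is a deliberate choice here: a wrong gateway address can cut a worker off from the API server entirely.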


Answer 2:

This happens with a single-node setup, too. I tried several things, such as reapplying the configuration and recreation, but the most stable way at the moment is to perform a full teardown (as described in the docs) and bring the cluster up again.

I use these scripts for relaunching the cluster:

down.sh

#!/bin/bash

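# Stop the kubelet so it does not restart the containers removed below.
systemctl stop kubelet
# Force-remove all running containers along with their volumes.
docker rm -f -v $(docker ps -q)
# Unmount any tmpfs mounts left under /var/lib/kubelet.
find /var/lib/kubelet | xargs -n 1 findmnt -n -t tmpfs -o TARGET -T | uniq | xargs -r umount -v
# Delete the cluster configuration and state.
rm -r -f /etc/kubernetes /var/lib/kubelet /var/lib/etcd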

up.sh

#!/bin/bash

systemctl start kubelet
kubeadm init
# kubectl taint nodes --all dedicated- # single node!
kubectl create -f https://git.io/weave-kube

Edit: I would also give other Pod networks a try, like Calico, if this is a Weave-related issue.



Answer 3:

The most common causes for this may be:
- presence of a firewall (e.g. firewalld on CentOS)
- network configuration (e.g. the default NAT interface on VirtualBox)

Currently kubeadm is still alpha, and this is one of the issues that has already been reported by many of the alpha testers. We are looking into fixing this by documenting the most common problems; that documentation should be ready closer to the beta release.

Right now there is a VirtualBox + Vagrant + Ansible reference implementation for Ubuntu and CentOS that provides solutions for the firewall, SELinux and VirtualBox NAT issues.
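
For the firewall case, one possible approach on a firewalld host is to open the ports Weave Net uses between peers instead of disabling the firewall completely. The sketch below is not taken from the reference implementation mentioned above. It assumes firewalld is the firewall in use and relies on Weave Net's documented peer ports (TCP 6783, UDP 6783 and 6784). The function name print_firewalld_cmds is made up for this example, and the commands are only printed so they can be checked before being run as root.

```shell
#!/bin/bash
# Ports Weave Net uses between peers: 6783/tcp for control traffic,
# 6783/udp and 6784/udp for data traffic.
WEAVE_PORTS="6783/tcp 6783/udp 6784/udp"

# Hypothetical helper: print (do not run) the firewalld commands that would
# open those ports permanently and then reload the firewall rules.
print_firewalld_cmds() {
    local port
    for port in $WEAVE_PORTS; do
        echo "firewall-cmd --permanent --add-port=$port"
    done
    echo "firewall-cmd --reload"
}

print_firewalld_cmds
```

This covers only Weave's own peer traffic; the worker must still be able to reach the API server on the master.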



Answer 4:

/usr/local/bin/weave reset

was the fix for me. Hope it's useful. And yes, make sure SELinux is set to disabled and firewalld is not running (on Red Hat / CentOS releases).

kube-system weave-net-2vlvj 2/2 Running 3 11d
kube-system weave-net-42k6p 1/2 Running 3 11d
kube-system weave-net-wvsk5 2/2 Running 3 11d
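
To check the two host settings mentioned in this answer before resetting Weave, something like the following could be run on each node. It is a rough sketch that assumes a Red Hat / CentOS style host where getenforce and systemctl are available. The function name report_host_state is made up for this example, and the script only reports state without changing anything.

```shell
#!/bin/bash
# Hypothetical helper: report the SELinux mode and the firewalld service state.
# Each tool is checked first so the function also runs on hosts that lack it.
report_host_state() {
    if command -v getenforce >/dev/null 2>&1; then
        echo "selinux: $(getenforce)"
    else
        echo "selinux: getenforce not found"
    fi
    if command -v systemctl >/dev/null 2>&1; then
        # is-active prints the unit state, for example active or inactive.
        echo "firewalld: $(systemctl is-active firewalld 2>/dev/null)"
    else
        echo "firewalld: systemctl not found"
    fi
}

report_host_state
```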