Kubernetes coredns pods stuck in Pending status. C

2019-07-29 23:11发布

问题:

I am building a Kubernetes cluster following this tuto , and I have troubles to access the kubernetes dashboard. I already created another question about it that you can see here, but while digging up into my cluster, I think that the problem might be somewhere else and that's why I create a new question.

I start my master, by running the following commands :

> kubeadm reset 
> kubeadm init --apiserver-advertise-address=[MASTER_IP] > file.txt
> tail -2 file.txt > join.sh # I keep this file for later

> kubectl apply -f https://git.io/weave-kube/

> kubectl -n kube-system get pod
NAME                                READY   STATUS  RESTARTS    AGE
coredns-fb8b8dccf-kb2zq             0/1     Pending 0           2m46s
coredns-fb8b8dccf-nnc5n             0/1     Pending 0           2m46s
etcd-kubemaster                     1/1     Running 0           93s
kube-apiserver-kubemaster           1/1     Running 0           93s
kube-controller-manager-kubemaster  1/1     Running 0           113s
kube-proxy-lxhvs                    1/1     Running 0           2m46s
kube-scheduler-kubemaster           1/1     Running 0           93s

Here we can see that I have two coredns pods stuck in Pending state forever, and when I run the command :

> kubectl -n kube-system describe pod coredns-fb8b8dccf-kb2zq

I can see in the Events part the following Warning :

Failed Scheduling : 0/1 nodes are available 1 node(s) had taints that the pod didn't tolerate.

Since it is a Warning and not and Error, and that as a Kubernetes newbie, taints does not mean much to me, I tried to connect a node to the master (using the previously saved command) :

> cat join.sh
kubeadm join [MASTER_IP]:6443 --token [TOKEN] \
    --discovery-token-ca-cert-hash sha256:[ANOTHER_TOKEN]

> ssh [USER]@[WORKER_IP] 'bash' < join.sh

This node has joined the cluster.

On the master, I check that the node is connected:

> kubectl get nodes 
NAME        STATUS      ROLES   AGE     VERSION
kubemaster  NotReady    master  13m     v1.14.1
kubeslave1  NotReady    <none>  31s     v1.14.1

And I check my pods :

> kubectl -n kube-system get pod
NAME                                READY   STATUS              RESTARTS    AGE
coredns-fb8b8dccf-kb2zq             0/1     Pending             0           14m
coredns-fb8b8dccf-nnc5n             0/1     Pending             0           14m
etcd-kubemaster                     1/1     Running             0           13m
kube-apiserver-kubemaster           1/1     Running             0           13m
kube-controller-manager-kubemaster  1/1     Running             0           13m
kube-proxy-lxhvs                    1/1     Running             0           14m
kube-proxy-xllx4                    0/1     ContainerCreating   0           2m16s
kube-scheduler-kubemaster           1/1     Running             0           13m

We can see that another kube-proxy pod have been created and is stuck in ContainerCreating status.

And when I am doing a describe again :

kubectl -n kube-system describe pod kube-proxy-xllx4

I can see in the Events part multiple identical Warnings :

Failed create pod sandbox : rpx error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.1": Get https://k8s.gcr.io/v1/_ping: dial tcp: lookup k8s.gcr.io on [::1]:53 read up [::1]43133->[::1]:53: read: connection refused

Here are my repositories :

docker image ls
REPOSITORY                          TAG     
k8s.gcr.io/kube-proxy               v1.14.1 
k8s.gcr.io/kube-apiserver           v1.14.1 
k8s.gcr.io/kube-controller-manager  v1.14.1 
k8s.gcr.io/kube-scheduler           v1.14.1 
k8s.gcr.io/coredns                  1.3.1   
k8s.gcr.io/etcd                     3.3.10  
k8s.gcr.io/pause                    3.1 

And so, for the dashboard part, I tried to start it with the command

> kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/aio/deploy/recommended/kubernetes-dashboard.yaml

But the dashboard pod is stuck in Pending state.

kubectl -n kube-system get pod
NAME                                    READY   STATUS              RESTARTS    AGE
coredns-fb8b8dccf-kb2zq                 0/1     Pending             0           40m
coredns-fb8b8dccf-nnc5n                 0/1     Pending             0           40m
etcd-kubemaster                         1/1     Running             0           38m
kube-apiserver-kubemaster               1/1     Running             0           38m
kube-controller-manager-kubemaster      1/1     Running             0           39m
kube-proxy-lxhvs                        1/1     Running             0           40m
kube-proxy-xllx4                        0/1     ContainerCreating   0           27m
kube-scheduler-kubemaster               1/1     Running             0           38m
kubernetes-dashboard-5f7b999d65-qn8qn   1/1     Pending             0           8s

So, event though my problem originaly was that I cannot access to my dashboard, I guess that the real problem is deeper thant that.

I know that I just put a lot of information here, but I am a k8s beginner and I am completely lost on this.

回答1:

There is an issue I experienced with coredns pods stuck in a pending mode when setting up your own cluster; which I resolve by adding pod network.

Looks like because there is no Network Addon installed, the nodes are taint as not-ready. Installing the Addon would remove the taints and the Pods will be able to schedule. In my case adding flannel fixed the issue.

EDIT: There is a note about this in the official k8s documentation - Create cluster with kubeadm:

The network must be deployed before any applications. Also, CoreDNS will not start up before a network is installed. kubeadm only supports Container Network Interface (CNI) based networks (and does not support kubenet).



回答2:

Actually it is the opposite of a deep or serious issue. This is a trivial issue. Always you see a pod stuck on Pending state, it means the scheduler is having a hard time to schedule the pod; mostly because there are no enough resources on the node.

In your case it is a taint that has the node, and your pod doesn't have the toleration. What you have to do is to describe the node and get the taint:

kubectl describe node | grep -i taints

Note: you might have more then one taint. So you might want to do kubectl describe no NODE since with grep you will only see one taint.

Once you get the taint, that will be something like hello=world:NoSchedule; which means key=value:effect, you will have to add a toleration section in your Deployment. This is an example Deployment so you can see how it should look like:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 10
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
        ports:
        - containerPort: 80
          name: http
      tolerations:
      - effect: NoExecute       #NoSchedule, PreferNoSchedule
        key: node
        operator: Equal
        value: not-ready
        tolerationSeconds: 3600

As you can see there is the toleration section in the yaml. So, if I would have a node with node=not-ready:NoExecute taint, no pod would be able to be scheduled on that node, unless would have this toleration.

Also you can remove the taint, if you don need it. To remove a taint you would describe the node, get the key of the taint and do:

kubectl taint node NODE key-

Hope it makes sense. Just add this section to your deployment, and it will work.



回答3:

Set up the flannel network tool.

Running commands:

$ sysctl net.bridge.bridge-nf-call-iptables=1
$ kubectl apply -f 

https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml



标签: kubernetes