pod hangs in Pending state

Published 2019-08-06 14:32

Question:

I have a Kubernetes deployment in which I am trying to run 5 Docker containers inside a single pod on a single node. The containers hang in "Pending" state and are never scheduled. I do not mind running more than 1 pod, but I'd like to keep the number of nodes down. I assumed that 1 node with 1 CPU and 1.7G RAM would be enough for the 5 containers, and I have attempted to split the workload across them.

Initially I came to the conclusion that I have insufficient resources. I enabled autoscaling of nodes which produced the following (see kubectl describe pod command):

pod didn't trigger scale-up (it wouldn't fit if a new node is added)

Anyway, each Docker container has a simple command which runs a fairly simple app. Ideally I would rather not deal with setting CPU and RAM allocations at all, but even after setting the CPU/memory limits so that they don't add up to more than 1, I still get this (see kubectl describe po/test-529945953-gh6cl):

No nodes are available that match all of the following predicates:: Insufficient cpu (1), Insufficient memory (1).

Below are various commands that show the state. Any help on what I'm doing wrong will be appreciated.

kubectl get all

user_s@testing-11111:~/gce$ kubectl get all
NAME                          READY     STATUS    RESTARTS   AGE
po/test-529945953-gh6cl   0/5       Pending   0          34m

NAME             CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
svc/kubernetes   10.7.240.1   <none>        443/TCP   19d

NAME              DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/test   1         1         1            0           34m

NAME                    DESIRED   CURRENT   READY     AGE
rs/test-529945953   1         1         0         34m
user_s@testing-11111:~/gce$

kubectl describe po/test-529945953-gh6cl

user_s@testing-11111:~/gce$ kubectl describe po/test-529945953-gh6cl
Name:           test-529945953-gh6cl
Namespace:      default
Node:           <none>
Labels:         app=test
                pod-template-hash=529945953
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"test-529945953","uid":"c6e889cb-a2a0-11e7-ac18-42010a9a001a"...
Status:         Pending
IP:
Created By:     ReplicaSet/test-529945953
Controlled By:  ReplicaSet/test-529945953
Containers:
  container-test2-tickers:
    Image:      gcr.io/testing-11111/testology:latest
    Port:       <none>
    Command:
      process_cmd
      arg1
      test2
    Limits:
      cpu:      150m
      memory:   375Mi
    Requests:
      cpu:      100m
      memory:   375Mi
    Environment:
      DB_HOST:          127.0.0.1:5432
      DB_PASSWORD:      <set to the key 'password' in secret 'cloudsql-db-credentials'> Optional: false
      DB_USER:          <set to the key 'username' in secret 'cloudsql-db-credentials'> Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
  container-kraken-tickers:
    Image:      gcr.io/testing-11111/testology:latest
    Port:       <none>
    Command:
      process_cmd
      arg1
      arg2
    Limits:
      cpu:      150m
      memory:   375Mi
    Requests:
      cpu:      100m
      memory:   375Mi
    Environment:
      DB_HOST:          127.0.0.1:5432
      DB_PASSWORD:      <set to the key 'password' in secret 'cloudsql-db-credentials'> Optional: false
      DB_USER:          <set to the key 'username' in secret 'cloudsql-db-credentials'> Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
  container-gdax-tickers:
    Image:      gcr.io/testing-11111/testology:latest
    Port:       <none>
    Command:
      process_cmd
      arg1
      arg2
    Limits:
      cpu:      150m
      memory:   375Mi
    Requests:
      cpu:      100m
      memory:   375Mi
    Environment:
      DB_HOST:          127.0.0.1:5432
      DB_PASSWORD:      <set to the key 'password' in secret 'cloudsql-db-credentials'> Optional: false
      DB_USER:          <set to the key 'username' in secret 'cloudsql-db-credentials'> Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
  container-bittrex-tickers:
    Image:      gcr.io/testing-11111/testology:latest
    Port:       <none>
    Command:
      process_cmd
      arg1
      arg2
    Limits:
      cpu:      150m
      memory:   375Mi
    Requests:
      cpu:      100m
      memory:   375Mi
    Environment:
      DB_HOST:          127.0.0.1:5432
      DB_PASSWORD:      <set to the key 'password' in secret 'cloudsql-db-credentials'> Optional: false
      DB_USER:          <set to the key 'username' in secret 'cloudsql-db-credentials'> Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
  cloudsql-proxy:
    Image:      gcr.io/cloudsql-docker/gce-proxy:1.09
    Port:       <none>
    Command:
      /cloud_sql_proxy
      --dir=/cloudsql
      -instances=testing-11111:europe-west2:testology=tcp:5432
      -credential_file=/secrets/cloudsql/credentials.json
    Limits:
      cpu:      150m
      memory:   375Mi
    Requests:
      cpu:              100m
      memory:           375Mi
    Environment:        <none>
    Mounts:
      /cloudsql from cloudsql (rw)
      /etc/ssl/certs from ssl-certs (rw)
      /secrets/cloudsql from cloudsql-instance-credentials (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
Conditions:
  Type          Status
  PodScheduled  False
Volumes:
  cloudsql-instance-credentials:
    Type:       Secret (a volume populated by a Secret)
    SecretName: cloudsql-instance-credentials
    Optional:   false
  ssl-certs:
    Type:       HostPath (bare host directory volume)
    Path:       /etc/ssl/certs
  cloudsql:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  default-token-b2mxc:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-b2mxc
    Optional:   false
QoS Class:      Burstable
Node-Selectors: <none>
Tolerations:    node.alpha.kubernetes.io/notReady:NoExecute for 300s
                node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
  FirstSeen     LastSeen        Count   From                    SubObjectPath   Type            Reason                  Message
  ---------     --------        -----   ----                    -------------   --------        ------                  -------
  27m           17m             44      default-scheduler                       Warning         FailedScheduling        No nodes are available that match all of the following predicates:: Insufficient cpu (1), Insufficient memory (2).
  26m           8s              150     cluster-autoscaler                      Normal          NotTriggerScaleUp       pod didn't trigger scale-up (it wouldn't fit if a new node is added)
  16m           2s              63      default-scheduler                       Warning         FailedScheduling        No nodes are available that match all of the following predicates:: Insufficient cpu (1), Insufficient memory (1).
user_s@testing-11111:~/gce$

kubectl get nodes

user_s@testing-11111:~/gce$ kubectl get nodes
NAME                                      STATUS    AGE       VERSION
gke-test-default-pool-abdf83f7-p4zw   Ready     9h        v1.6.7

kubectl get pods

user_s@testing-11111:~/gce$ kubectl get pods
NAME                       READY     STATUS    RESTARTS   AGE
test-529945953-gh6cl   0/5       Pending   0          38m

kubectl describe nodes

user_s@testing-11111:~/gce$ kubectl describe nodes
Name:                   gke-test-default-pool-abdf83f7-p4zw
Role:
Labels:                 beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/fluentd-ds-ready=true
                        beta.kubernetes.io/instance-type=g1-small
                        beta.kubernetes.io/os=linux
                        cloud.google.com/gke-nodepool=default-pool
                        failure-domain.beta.kubernetes.io/region=europe-west2
                        failure-domain.beta.kubernetes.io/zone=europe-west2-c
                        kubernetes.io/hostname=gke-test-default-pool-abdf83f7-p4zw
Annotations:            node.alpha.kubernetes.io/ttl=0
                        volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:                 <none>
CreationTimestamp:      Tue, 26 Sep 2017 02:05:45 +0100
Conditions:
  Type                  Status  LastHeartbeatTime                       LastTransitionTime                      Reason                          Message
  ----                  ------  -----------------                       ------------------                      ------                          -------
  NetworkUnavailable    False   Tue, 26 Sep 2017 02:06:05 +0100         Tue, 26 Sep 2017 02:06:05 +0100         RouteCreated                    RouteController created a route
  OutOfDisk             False   Tue, 26 Sep 2017 11:33:57 +0100         Tue, 26 Sep 2017 02:05:45 +0100         KubeletHasSufficientDisk        kubelet has sufficient disk space available
  MemoryPressure        False   Tue, 26 Sep 2017 11:33:57 +0100         Tue, 26 Sep 2017 02:05:45 +0100         KubeletHasSufficientMemory      kubelet has sufficient memory available
  DiskPressure          False   Tue, 26 Sep 2017 11:33:57 +0100         Tue, 26 Sep 2017 02:05:45 +0100         KubeletHasNoDiskPressure        kubelet has no disk pressure
  Ready                 True    Tue, 26 Sep 2017 11:33:57 +0100         Tue, 26 Sep 2017 02:06:05 +0100         KubeletReady                    kubelet is posting ready status. AppArmor enabled
  KernelDeadlock        False   Tue, 26 Sep 2017 11:33:12 +0100         Tue, 26 Sep 2017 02:05:45 +0100         KernelHasNoDeadlock             kernel has no deadlock
Addresses:
  InternalIP:   10.154.0.2
  ExternalIP:   35.197.217.1
  Hostname:     gke-test-default-pool-abdf83f7-p4zw
Capacity:
 cpu:           1
 memory:        1742968Ki
 pods:          110
Allocatable:
 cpu:           1
 memory:        1742968Ki
 pods:          110
System Info:
 Machine ID:                    e6119abf844c564193495c64fd9bd341
 System UUID:                   E6119ABF-844C-5641-9349-5C64FD9BD341
 Boot ID:                       1c2f2ea0-1f5b-4c90-9e14-d1d9d7b75221
 Kernel Version:                4.4.52+
 OS Image:                      Container-Optimized OS from Google
 Operating System:              linux
 Architecture:                  amd64
 Container Runtime Version:     docker://1.11.2
 Kubelet Version:               v1.6.7
 Kube-Proxy Version:            v1.6.7
PodCIDR:                        10.4.1.0/24
ExternalID:                     6073438913956157854
Non-terminated Pods:            (7 in total)
  Namespace                     Name                                                            CPU Requests    CPU Limits      Memory Requests Memory Limits
  ---------                     ----                                                            ------------    ----------      --------------- -------------
  kube-system                   fluentd-gcp-v2.0-k565g                                          100m (10%)      0 (0%)          200Mi (11%)     300Mi (17%)
  kube-system                   heapster-v1.3.0-3440173064-1ztvw                                138m (13%)      138m (13%)      301456Ki (17%)  301456Ki (17%)
  kube-system                   kube-dns-1829567597-gdz52                                       260m (26%)      0 (0%)          110Mi (6%)      170Mi (9%)
  kube-system                   kube-dns-autoscaler-2501648610-7q9dd                            20m (2%)        0 (0%)          10Mi (0%)       0 (0%)
  kube-system                   kube-proxy-gke-test-default-pool-abdf83f7-p4zw              100m (10%)      0 (0%)          0 (0%)          0 (0%)
  kube-system                   kubernetes-dashboard-490794276-25hmn                            100m (10%)      100m (10%)      50Mi (2%)       50Mi (2%)
  kube-system                   l7-default-backend-3574702981-flqck                             10m (1%)        10m (1%)        20Mi (1%)       20Mi (1%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits      Memory Requests Memory Limits
  ------------  ----------      --------------- -------------
  728m (72%)    248m (24%)      700816Ki (40%)  854416Ki (49%)
Events:         <none>

Answer 1:

As the Allocated resources: section of your kubectl describe nodes output shows, Pods running in the kube-system namespace have already requested 728m (72%) of the CPU and 700816Ki (40%) of the memory on the node. The resource requests of your test Pod add up to more than both the remaining CPU and the remaining memory on the node, which is what the Events section of your kubectl describe po/[…] output reports.
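The arithmetic behind this can be checked with a short Python sketch. The numbers are copied from the kubectl describe output in the question; the dictionary keys and function names are made up for illustration. Keep in mind that the scheduler compares the containers' requests, not their limits, against what is still unrequested on the node.

```python
# Values copied from the `kubectl describe` output in the question.
# CPU is in millicores (1 CPU = 1000m), memory in Ki (1Mi = 1024Ki).
NODE_ALLOCATABLE = {"cpu_m": 1000, "memory_ki": 1742968}
ALREADY_REQUESTED = {"cpu_m": 728, "memory_ki": 700816}  # kube-system pods
CONTAINERS = 5
PER_CONTAINER_REQUEST = {"cpu_m": 100, "memory_ki": 375 * 1024}


def pod_request():
    """Total request of the pod: the sum over all of its containers."""
    return {k: v * CONTAINERS for k, v in PER_CONTAINER_REQUEST.items()}


def free_on_node():
    """Allocatable resources minus what running pods have already requested."""
    return {k: NODE_ALLOCATABLE[k] - ALREADY_REQUESTED[k] for k in NODE_ALLOCATABLE}


def fits(request, capacity):
    """True only if every requested resource fits into the given capacity."""
    return all(request[k] <= capacity[k] for k in request)


if __name__ == "__main__":
    print("pod requests:       ", pod_request())
    print("free on node:       ", free_on_node())
    print("fits on this node:  ", fits(pod_request(), free_on_node()))
    print("fits on empty node: ", fits(pod_request(), NODE_ALLOCATABLE))
```

With these numbers the pod requests 500m of CPU and 1920000Ki of memory, while the node has only 272m and 1042152Ki left unrequested. The memory request alone is larger than the 1742968Ki allocatable on a completely empty node of this size, which would plausibly explain the autoscaler's "it wouldn't fit if a new node is added" message, assuming new nodes would be the same g1-small machine type.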

If you want to keep all containers in a single pod, you need to reduce the resource requests of your containers or run them on a node with more CPU and memory. The better solution would be to split your application into multiple pods, which allows them to be distributed over multiple nodes.
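To get a feel for how far the requests would have to come down for the single-pod option, here is a rough Python sketch. It assumes the 5 containers share the unrequested capacity evenly and that the kube-system requests stay as shown in the question; the function name is hypothetical, and whether the app actually runs in that little memory is a separate question.

```python
def max_even_request(allocatable, already_requested, containers):
    """Largest equal per-container request that still fits on the node."""
    free = max(allocatable - already_requested, 0)
    return free // containers


if __name__ == "__main__":
    # Node figures from `kubectl describe nodes` in the question.
    cpu_m = max_even_request(1000, 728, 5)           # millicores
    memory_ki = max_even_request(1742968, 700816, 5)  # Ki
    print(f"cpu request per container:    at most {cpu_m}m")
    print(f"memory request per container: at most {memory_ki // 1024}Mi")
```

On the node from the question this comes out at roughly 54m of CPU and 203Mi of memory per container, so the current requests of 100m and 375Mi would need to be cut by about half to keep everything in one pod on that node.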