Intermittent failure creating container on Kubernetes

Posted 2019-07-31 09:36

For the past couple of days we have been seeing intermittent failures when deploying (via Helm) to Kubernetes v1.11.2.

When it fails, kubectl describe <deployment> usually reports that the container could not be created:

Events:
Type    Reason     Age   From                   Message
----    ------     ----  ----                   -------
Normal  Scheduled  1s    default-scheduler      Successfully assigned default/pod-fc5c8d4b8-99npr to fh1-node04
Normal  Pulling    0s    kubelet, fh1-node04    pulling image "docker-registry.internal/pod:0e5a0cb1c0e32b6d0e603333ebb81ade3427ccdd"
Error from server (BadRequest): container "pod" in pod "pod-fc5c8d4b8-99npr" is waiting to start: ContainerCreating

The only error we can find in the kubelet logs is:

58468 kubelet_pods.go:146] Mount cannot be satisfied for container "pod", because the volume is missing or the volume mounter is nil: {Name:default-token-q8k7w ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}
58468 kuberuntime_manager.go:733] container start failed: CreateContainerConfigError: cannot find volume "default-token-q8k7w" to mount into container "pod"
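To see how often this happens on each node, and whether it always involves the service-account token volume, a small helper can pull the volume and container names out of the kubelet log. This is a minimal Python sketch, assuming the log text has already been collected from each node (for example with journalctl -u kubelet, if the kubelet runs as a systemd unit). The pattern and function names are hypothetical, and the regex is based only on the v1.11.2 message quoted above, so other kubelet versions may word it differently.

```python
import re

# Matches the kuberuntime_manager error quoted above, e.g.
#   cannot find volume "default-token-q8k7w" to mount into container "pod"
# Assumption: wording taken from the v1.11.2 log lines; may differ in other versions.
MISSING_VOLUME = re.compile(
    r'cannot find volume "(?P<volume>[^"]+)" to mount into container "(?P<container>[^"]+)"'
)


def missing_volume_errors(log_text: str) -> list[tuple[str, str]]:
    """Return (volume, container) pairs for every missing-volume error in log_text."""
    return [
        (m.group("volume"), m.group("container"))
        for m in MISSING_VOLUME.finditer(log_text)
    ]
```

Counting the results per node and per volume name shows whether the failures cluster on one node or always name the default-token volume.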

The failure is intermittent: roughly one in every 20 deployments fails, and re-running the deployment then succeeds.
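Because a re-run succeeds, one stopgap while the root cause is investigated is to retry the release automatically. This is a minimal sketch, assuming `deploy` is a hypothetical callable that wraps your Helm command (for example via `subprocess.run`) and returns True once the rollout has completed. It only masks the failure; it does not diagnose it.

```python
import time
from typing import Callable


def deploy_with_retry(
    deploy: Callable[[], bool], attempts: int = 3, delay_seconds: float = 5.0
) -> int:
    """Call deploy() until it reports success and return the successful attempt number.

    deploy is a hypothetical callable that runs one Helm release and returns
    True when the rollout completes. Raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        if deploy():
            return attempt
        if attempt < attempts:
            time.sleep(delay_seconds)  # brief pause before the next attempt
    raise RuntimeError(f"deployment failed after {attempts} attempts")
```

If failures are independent (an assumption), two consecutive attempts at a 1-in-20 failure rate would both fail roughly 1 time in 400 (0.05² = 0.0025).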

Cluster and node health both look fine at the time of the deployment, so we are at a loss as to where to go from here. We are looking for advice on where to start diagnosing the issue.
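One place to start is to capture the failing pod's own events and the service-account token secrets as soon as a rollout stalls, rather than afterwards, since events expire (after one hour by default). This is a minimal Python sketch, assuming kubectl is on the PATH and pointed at the affected cluster. The helper names are hypothetical; the pod name would come from events like the ones shown above.

```python
import subprocess


def diagnostic_commands(pod: str, namespace: str = "default") -> list[list[str]]:
    """kubectl invocations worth capturing as soon as a rollout stalls."""
    return [
        # Container state and pod-level events for the failing pod.
        ["kubectl", "describe", "pod", pod, "--namespace", namespace],
        # Every event recorded against this pod, ordered by last occurrence.
        ["kubectl", "get", "events", "--namespace", namespace,
         "--field-selector", f"involvedObject.name={pod}",
         "--sort-by", ".lastTimestamp"],
        # Service-account token secrets, to confirm default-token-* still exists.
        ["kubectl", "get", "secrets", "--namespace", namespace,
         "--field-selector", "type=kubernetes.io/service-account-token"],
    ]


def capture(pod: str, namespace: str = "default") -> str:
    """Run each command and return its output (stdout and stderr) as one report."""
    sections = []
    for cmd in diagnostic_commands(pod, namespace):
        result = subprocess.run(cmd, capture_output=True, text=True)
        sections.append("$ " + " ".join(cmd) + "\n" + result.stdout + result.stderr)
    return "\n".join(sections)
```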

EDIT: As requested, the deployment manifest is generated from a Helm template; the rendered output is below. The same Helm template is used for many of our services, but only this one has the intermittent issue:

apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: pod
  labels:
    app: pod
    chart: pod-0.1.0
    release: pod
    heritage: Tiller
    environment: integration
  annotations:
    kubernetes.io/change-cause: https://github.com/path_to_release
spec:
  replicas: 2
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      app: pod
      release: pod
      environment: integration
  strategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: pod
        release: pod
        environment: integration
    spec:
      containers:
        - name: pod
          image: "docker-registry.internal/pod:0e5a0cb1c0e32b6d0e603333ebb81ade3427ccdd"
          env:
            - name: VAULT_USERNAME
              valueFrom:
                secretKeyRef:
                  name: "pod-integration"
                  key: username
            - name: VAULT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: "pod-integration"
                  key: password
          imagePullPolicy: IfNotPresent
          command: ['mix', 'phx.server']

          ports:
            - name: http
              containerPort: 80
              protocol: TCP
          envFrom:
          - configMapRef:
              name: pod

          livenessProbe:
            httpGet:
              path: /api/health
              port: http
            initialDelaySeconds: 10
          readinessProbe:
            httpGet:
              path: /api/health
              port: http
            initialDelaySeconds: 10
          resources:
            limits:
              cpu: 750m
              memory: 200Mi
            requests:
              cpu: 500m
              memory: 150Mi
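Because the same template works for other services, one way to compare this release against them is to list the Secrets and ConfigMaps each rendered Deployment depends on. Note that the default-token-q8k7w volume does not appear in the manifest: the ServiceAccount admission controller adds it to the pod at creation time and mounts it at /var/run/secrets/kubernetes.io/serviceaccount. This is a minimal sketch, assuming the manifest has already been parsed into a dict (for example with PyYAML's `yaml.safe_load` on the `helm template` output); `referenced_objects` is a hypothetical helper.

```python
from typing import Any


def referenced_objects(deployment: dict[str, Any]) -> dict[str, set[str]]:
    """Collect the Secrets and ConfigMaps a Deployment's pod template depends on."""
    refs: dict[str, set[str]] = {"secrets": set(), "configmaps": set()}
    pod_spec = deployment["spec"]["template"]["spec"]
    for container in pod_spec.get("containers", []):
        # Single keys pulled in through env[].valueFrom
        for env in container.get("env", []):
            source = env.get("valueFrom", {})
            if "secretKeyRef" in source:
                refs["secrets"].add(source["secretKeyRef"]["name"])
            if "configMapKeyRef" in source:
                refs["configmaps"].add(source["configMapKeyRef"]["name"])
        # Whole objects pulled in through envFrom
        for env_from in container.get("envFrom", []):
            if "secretRef" in env_from:
                refs["secrets"].add(env_from["secretRef"]["name"])
            if "configMapRef" in env_from:
                refs["configmaps"].add(env_from["configMapRef"]["name"])
    # Explicitly declared volumes (the injected token volume is not listed here)
    for volume in pod_spec.get("volumes", []):
        if "secret" in volume:
            refs["secrets"].add(volume["secret"]["secretName"])
        if "configMap" in volume:
            refs["configmaps"].add(volume["configMap"]["name"])
    return refs
```

For the manifest above, this returns the pod-integration Secret and the pod ConfigMap.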
