Unable to deduce why one imagePuller pod of JuPyte

Posted 2019-08-26 02:52

Question:

I am deploying jupyterhub on a kubernetes cluster. In the config.yaml file, I am specifying a registry and the image tag. While 3 pods are successfully created, one is not.

I could not find much content pertaining to jupyter-hub.

The Helm chart can be found here (https://jupyterhub.github.io/helm-chart/jupyterhub-0.8.2.tgz).

My config for values.yaml is:

proxy:
  secretToken: "some token"
singleuser:
  image:
    name: acc_id.dkr.ecr.ap-south-1.amazonaws.com/demo
    tag: 12c
  lifecycleHooks:
    postStart:
      exec:
        command: ["/bin/sh", "-c", 'ipython profile create; cd ~/.ipython/profile_default/startup; echo ''run_id = "sample" ''> aviral.py']
  imagePullSecret:
    enabled: true
    registry: acc_id.dkr.ecr.ap-south-1.amazonaws.com
    username: aws
    email: aviral@abcd.com
    password: <my pw>

When I list the pods:

➜  jupyterhub kubectl get pods -n jhub
NAME                       READY   STATUS                  RESTARTS   AGE
hook-image-awaiter-2xxfx   1/1     Running                 0          13m
hook-image-puller-4f9mk    1/1     Running                 0          13m
hook-image-puller-jshlk    1/1     Running                 0          13m
hook-image-puller-wj8r6    1/1     Running                 0          13m
hook-image-puller-wlgnh    0/1     Init:ImagePullBackOff   0          13m
hub-6766fc7586-zdf9n       1/1     Running                 0          35m
proxy-65f559ff89-md7r5     1/1     Running                 0          20h

As you can see, the pod named hook-image-puller-wlgnh is stuck in the Init:ImagePullBackOff state.

While describing it, under the events section, I get:

Failed to pull image "acc_id.dkr.ecr.ap-south-1.amazonaws.com/demo:12c": [rpc error: code = Unknown desc = Error response from daemon: unauthorized: authentication required, rpc error: code = Canceled desc = context canceled]

However, the other 3 pods are running and they were able to pull the same image.

Answer 1:

This seems to be a known issue, reported on GitHub as "Occasional ImagePullBackOff Errors when pulling large docker images" (#59376). The bug was not resolved, but there seem to be several workarounds:

  • Recreate the pod that failed.
  • Increase the kubelet's image-pull-progress-deadline.
  • Remove the namespace in which it was deployed and deploy it again.
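
As a rough sketch of the first workaround: the helper below only picks out the names of pods whose STATUS column contains ImagePullBackOff from the table that kubectl get pods prints. The function name failed_pull_pods is made up for illustration, and the commented usage assumes the jhub namespace from the question and that the failed hook-image-puller pod is recreated by its controller once deleted (I have not verified this against the 0.8.2 chart).

```shell
# Hypothetical helper (not part of kubectl or the chart).
# Reads the table printed by `kubectl get pods` on stdin and prints the
# NAME of every pod whose STATUS column mentions ImagePullBackOff
# (this also matches Init:ImagePullBackOff).
failed_pull_pods() {
  # NR > 1 skips the header row; $1 is NAME, $3 is STATUS.
  awk 'NR > 1 && $3 ~ /ImagePullBackOff/ { print $1 }'
}

# Usage sketch (assumes namespace jhub, as in the question):
#   kubectl get pods -n jhub | failed_pull_pods | while read -r pod; do
#     kubectl delete pod -n jhub "$pod"
#   done
```

If the pod comes back in the same state after deletion, that points to the node or the credentials rather than a transient pull failure.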

You can also try what @P Ekambaram suggested, which is to run docker pull <IMAGE>.