kubernetes helm: “lost connection to pod” and “transport is closing”

Published 2020-07-06 05:44

Question:

I run helm upgrade --install to modify the state of my kubernetes cluster and I sometimes get an error like this:

22:24:34 StdErr: E0126 17:24:28.472048   48084 portforward.go:178] lost connection to pod
22:24:34 Error: UPGRADE FAILED: transport is closing

It seems that I am not the only one, and it seems to happen with many different helm commands. All of these github issues have descriptions or comments mentioning "lost connection to pod" or "transport is closing" errors (usually both):

  • https://github.com/kubernetes/helm/issues/1183
  • https://github.com/kubernetes/helm/issues/2003
  • https://github.com/kubernetes/helm/issues/2025
  • https://github.com/kubernetes/helm/issues/2288
  • https://github.com/kubernetes/helm/issues/2560
  • https://github.com/kubernetes/helm/issues/3015
  • https://github.com/kubernetes/helm/issues/3409

While it can be educational to read through hundreds of github issue comments, usually it's faster to cut to the chase on stackoverflow, and it didn't seem like this question existed yet, so here it is. Hopefully some quick symptom fixes and eventually one or more root cause diagnoses end up in the answers.

Answer 1:

I was able to correct this by adding the tiller host information to the helm install command.

--host=10.111.221.14:44134

You can get your tiller IP this way

$ kubectl get svc -n kube-system tiller-deploy
NAME            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)     AGE
tiller-deploy   ClusterIP   10.111.221.14   <none>        44134/TCP   34h

Full command example

helm install stable/grafana --name=grafana --host=10.111.221.14:44134

I know this is a bit of a workaround, but all other helm functions performed properly after installing via this method. I did not have to add the host information again for subsequent upgrades or rollbacks. Hope this helps!
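If you'd rather not copy the ClusterIP and port by hand, a small sketch like the following can build the --host value from the service itself. The jsonpath fields are standard Kubernetes Service fields, but the overall script is my own addition, not part of the original answer:

```shell
#!/bin/sh
# Look up tiller's ClusterIP and service port with jsonpath
# (hypothetical helper; assumes tiller-deploy lives in kube-system).
TILLER_IP=$(kubectl get svc -n kube-system tiller-deploy \
  -o jsonpath='{.spec.clusterIP}')
TILLER_PORT=$(kubectl get svc -n kube-system tiller-deploy \
  -o jsonpath='{.spec.ports[0].port}')

# Pass the discovered address to helm instead of hard-coding it.
helm install stable/grafana --name=grafana \
  --host="${TILLER_IP}:${TILLER_PORT}"
```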



Answer 2:

Memory limits were causing this error for me. The following fixed it:

kubectl set resources deployment tiller-deploy -n kube-system --limits=memory=200Mi
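To confirm that memory limits are actually the culprit before raising them, you can check whether the tiller pod's previous container terminated with OOMKilled. This diagnostic step is my addition (the label selector app=helm,name=tiller is the one tiller-deploy normally uses, but verify it on your cluster):

```shell
#!/bin/sh
# Print the last termination reason of the tiller container(s);
# "OOMKilled" here indicates the memory limit is too low.
kubectl get pods -n kube-system -l app=helm,name=tiller \
  -o jsonpath='{.items[*].status.containerStatuses[*].lastState.terminated.reason}'
```

If that prints OOMKilled, raising the limit as shown above should stop the connection drops.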


Answer 3:

Deleting the tiller deployment and recreating it is the only fix I've seen on github (here and here). This has been most helpful to people when the same helm command fails repeatedly (not with intermittent failures, though you could try it in that case too).

delete tiller (helm's server-side component):

kubectl delete deployment -n kube-system tiller-deploy
# deployment "tiller-deploy" deleted

and recreate it:

helm init --upgrade
# $HELM_HOME has been configured at /root/.helm.
# Tiller (the helm server side component) has been upgraded to the current version.
# Happy Helming!
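Before retrying the failed helm command, it may help to wait until the recreated tiller pod is actually ready. A small sketch of that check (my addition, using standard kubectl/helm-2 commands):

```shell
#!/bin/sh
# Block until the new tiller-deploy rollout completes.
kubectl rollout status deployment/tiller-deploy -n kube-system

# Sanity check: with helm 2, this should report both the client
# and the server (tiller) version once tiller is reachable again.
helm version
```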

Bouncing tiller obviously won't fix the root cause. Hopefully a better answer than this is forthcoming, perhaps from https://github.com/kubernetes/helm/issues/2025, which is the only issue in the list above still open as of 13 Feb 2018.