Kubernetes: runContainer: API error (500): Cannot

Posted 2019-05-31 03:56

Question:

Pod creation on our GKE cluster sometimes fails with a 500 error:

1m        1m        1         installer-u57ab1f7707b03   Pod                 Normal    Scheduled    {default-scheduler }                                       Successfully assigned installer-u57ab1f7707b03 to gke-oro-cloud-v1-1445426963-ffbcc283-node-bo1l
1m        1m        1         installer-u57ab1f7707b03   Pod                 Warning   FailedSync   {kubelet gke-oro-cloud-v1-1445426963-ffbcc283-node-bo1l}   Error syncing pod, skipping: failed to "StartContainer" for "POD" with RunContainerError: "runContainer: API error (500): Cannot start container ff8573fbf0b90a25b5565b1feb36671f13367115dde74e581cf249be772d8e4e: [8] System error: read parent: connection reset by peer\n"
1m        1m        1         installer-u57ab1f7707b03   Pod                 Warning   FailedSync   {kubelet gke-oro-cloud-v1-1445426963-ffbcc283-node-bo1l}   Error syncing pod, skipping: failed to "StartContainer" for "POD" with RunContainerError: "runContainer: API error (500): Cannot start container fbd7151d4489ed3ac9b21ef9ee3268039374fe3aee1f5933dc27d003f5388e7d: [8] System error: read parent: connection reset by peer\n"
1m        1m        1         installer-u57ab1f7707b03   Pod                 Warning   FailedSync   {kubelet gke-oro-cloud-v1-1445426963-ffbcc283-node-bo1l}   Error syncing pod, skipping: failed to "StartContainer" for "POD" with RunContainerError: "runContainer: API error (500): Cannot start container c6b7969fd036fd187f8b5b815106887d718780b290b81e6dde12162d15c22728: [8] System error: read parent: connection reset by peer\n"
49s       49s       1         installer-u57ab1f7707b03   Pod                 Warning   FailedSync   {kubelet gke-oro-cloud-v1-1445426963-ffbcc283-node-bo1l}   Error syncing pod, skipping: failed to "StartContainer" for "POD" with RunContainerError: "runContainer: API error (500): Cannot start container 5b0d78ee31759a3472f15fe375ef4f2542dcc65518023a1bd06593fe7d28a448: [8] System error: read parent: connection reset by peer\n"
32s       32s       1         installer-u57ab1f7707b03   Pod                 Warning   FailedSync   {kubelet gke-oro-cloud-v1-1445426963-ffbcc283-node-bo1l}   Error syncing pod, skipping: failed to "StartContainer" for "POD" with RunContainerError: "runContainer: API error (500): Cannot start container 7ff5941a30ce432aa1b1382e4b20d272a08a7113f79f7f1ff2f8898a00ca8f06: [8] System error: read parent: connection reset by peer\n"
18s       18s       1         installer-u57ab1f7707b03   Pod                 Warning   FailedSync   {kubelet gke-oro-cloud-v1-1445426963-ffbcc283-node-bo1l}   Error syncing pod, skipping: failed to "StartContainer" for "POD" with RunContainerError: "runContainer: API error (500): Cannot start container a91ae7d6dc9dee5196e73457d817bc46f8009c26147cc81727920aebfa52cc38: [8] System error: read parent: connection reset by peer\n"
2s        2s        1         installer-u57ab1f7707b03   Pod                 Warning   FailedSync   {kubelet gke-oro-cloud-v1-1445426963-ffbcc283-node-bo1l}   Error syncing pod, skipping: failed to "StartContainer" for "POD" with RunContainerError: "runContainer: API error (500): Cannot start container ad8b7bbe72410232d7fe6197e057d15e9003e24f6d8aad15bc7068430cfea508: [8] System error: read parent: connection reset by peer\n"

In docker.log I found the following:

time="2016-08-10T12:37:24.458097892Z" level=warning msg="failed to cleanup ipc mounts:\nfailed to umount /var/lib/docker/containers/ad8b7bbe72410232d7fe6197e057d15e9003e24f6d8aad15bc7068430cfea508/shm: invalid argument\nfailed to umount /var/lib/docker/containers/ad8b7bbe72410232d7fe6197e057d15e9003e24f6d8aad15bc7068430cfea508/mqueue: invalid argument"
time="2016-08-10T12:37:24.458280187Z" level=error msg="Handler for POST /containers/ad8b7bbe72410232d7fe6197e057d15e9003e24f6d8aad15bc7068430cfea508/start returned error: Cannot start container ad8b7bbe72410232d7fe6197e057d15e9003e24f6d8aad15bc7068430cfea508: [8] System error: read parent: connection reset by peer"
time="2016-08-10T12:37:24.458315257Z" level=error msg="HTTP Error" err="Cannot start container ad8b7bbe72410232d7fe6197e057d15e9003e24f6d8aad15bc7068430cfea508: [8] System error: read parent: connection reset by peer" statusCode=500
time="2016-08-10T12:37:40.151776337Z" level=warning msg="signal: killed" 

Kubernetes version v1.2.5
Docker version 1.9.1

Any ideas how to fix it?

Answer 1:

This is probably due to the runc bug in Docker 1.9 where the container reads its config but closes the read pipe before the parent is done writing.

Docker 1.10 includes a fixed runc, and Kubernetes 1.3 uses Docker 1.11.2, so upgrading should resolve the problem. Until you upgrade, you may be able to work around the issue by adding extra characters to your container's command line.
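
To check whether a given node is still on an affected Docker release, something like the following may help. It is only a minimal sketch: the function names are hypothetical, it assumes plain numeric version strings such as `1.9.1` (no `-rc` or build suffixes), and it takes the "fixed in 1.10" cutoff from the answer above rather than from any Docker API.

```python
# Hypothetical helper names; the 1.10 cutoff is taken from the answer above.
FIRST_FIXED = (1, 10)


def parse_version(version: str) -> tuple:
    """Turn '1.9.1' into (1, 9, 1) so that components compare numerically."""
    return tuple(int(part) for part in version.strip().split(".")[:3])


def has_fixed_runc(docker_version: str) -> bool:
    """Return True if this Docker version is 1.10 or newer."""
    return parse_version(docker_version) >= FIRST_FIXED


if __name__ == "__main__":
    # Versions mentioned in this thread.
    for v in ("1.9.1", "1.10.0", "1.11.2"):
        print(v, "fixed" if has_fixed_runc(v) else "affected")
```

The versions are compared as integer tuples rather than strings because a plain string comparison would rank `1.9.1` above `1.10.0`.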