kubernetes: k8s reports pod as "Terminated: Error" with "Error syncing pod, skipping: rpc error: code = 2 desc = Error: No such container"
What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.):
“error syncing pod” “no such container”
Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1", GitCommit:"b0b7a323cc5a4a2019b2e9520c21c7830b7f708e", GitTreeState:"clean", BuildDate:"2017-04-03T20:44:38Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.2", GitCommit:"477efc3cbe6a7effca06bd1452fa356e2201e1ee", GitTreeState:"clean", BuildDate:"2017-04-19T20:22:08Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Environment:
- Cloud provider or hardware configuration: AWS
- OS (e.g. from /etc/os-release): Ubuntu 16.04.1 LTS
- Kernel (e.g. uname -a): 4.4.0-36-generic x86_64
- Install tools: kops 1.6.0-beta.1
- Others:
What happened:
A pod was restarted a few times (it was killed by the kernel due to running out of memory). After the last restart, the pod appears stuck in an Error state.
~$ kubectl -n master get pods dhrubacol11222268508-leaf-0-84612
NAME READY STATUS RESTARTS AGE
dhrubacol11222268508-leaf-0-84612 0/1 Error 3 16h
kubectl describe shows a “no such container” error:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1h 5s 346 kubelet, ip-172-20-39-143.us-west-2.compute.internal Warning FailedSync Error syncing pod, skipping: rpc error: code = 2 desc = Error: No such container: bedfcb3556065064b60471b8ebb73b09c1c450cfd4d07a087a45ba5d8e83dd2f
kubectl logs shows the logs from the last iteration of the pod (the one that finished an hour ago).
Interestingly, the pod is actually running; kubectl exec lets me enter the pod.
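For reference, the check that showed the container was still alive looked roughly like this (the shell and exec flags here are assumptions; the namespace and pod name are taken from the output above):
~$ kubectl -n master exec -it dhrubacol11222268508-leaf-0-84612 -- /bin/sh   # drops into the supposedly "Error" pod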
What you expected to happen:
I expected the pod to be restarted.
How to reproduce it (as minimally and precisely as possible):
Does not appear reproducible. This has happened twice to us so far; deleting the pod by hand fixed the problem.
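For anyone hitting the same state, the manual workaround was simply deleting the pod so its controller recreates it; a sketch of that (pod name from the report above, assuming the pod is managed by a controller that will recreate it):
~$ kubectl -n master delete pod dhrubacol11222268508-leaf-0-84612   # forces a fresh pod to be created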
Anything else we need to know:
Note the following in the kubelet logs (more details below):
At 22:01:58, the pod dies. The container ID starts with bedfcb…
At 22:29:55, the pod dies again; the container ID starts with a102f9…
At 22:30:25, getPodContainerStatuses starts failing, but the container ID is from a previous iteration of the pod (the one that died at 22:01, not the one that died at 22:29).
May 10 22:01:58 ip-172-20-39-143 kubelet[9126]: I0510 22:01:58.521978 9126 kubelet.go:1842] SyncLoop (PLEG): "dhrubacol11222268508-leaf-0-84612_master(da61dd55-3551-11e7-b03c-0207e1349dc2)", event: &pleg.PodLifecycleEvent{ID:"da61dd55-3551-11e7-b03c-0207e1349dc2", Type:"ContainerDied", Data:"bedfcb3556065064b60471b8ebb73b09c1c450cfd4d07a087a45ba5d8e83dd2f"}
May 10 22:29:55 ip-172-20-39-143 kubelet[9126]: I0510 22:29:55.428792 9126 kubelet.go:1842] SyncLoop (PLEG): "dhrubacol11222268508-leaf-0-84612_master(da61dd55-3551-11e7-b03c-0207e1349dc2)", event: &pleg.PodLifecycleEvent{ID:"da61dd55-3551-11e7-b03c-0207e1349dc2", Type:"ContainerDied", Data:"a102f91130d14cd4c1040fa7957e174190cb6fe531136f939b436ff337af3082"}
May 10 22:30:03 ip-172-20-39-143 kubelet[9126]: I0510 22:30:03.368974 9126 kuberuntime_manager.go:742] checking backoff for container "leafagg" in pod "dhrubacol11222268508-leaf-0-84612_master(da61dd55-3551-11e7-b03c-0207e1349dc2)"
May 10 22:30:25 ip-172-20-39-143 kubelet[9126]: E0510 22:30:25.969445 9126 kuberuntime_manager.go:858] getPodContainerStatuses for pod "dhrubacol11222268508-leaf-0-84612_master(da61dd55-3551-11e7-b03c-0207e1349dc2)" failed: rpc error: code = 2 desc = Error: No such container: bedfcb3556065064b60471b8ebb73b09c1c450cfd4d07a087a45ba5d8e83dd2f
May 10 22:30:25 ip-172-20-39-143 kubelet[9126]: E0510 22:30:25.969466 9126 generic.go:239] PLEG: Ignoring events for pod dhrubacol11222268508-leaf-0-84612/master: rpc error: code = 2 desc = Error: No such container: bedfcb3556065064b60471b8ebb73b09c1c450cfd4d07a087a45ba5d8e83dd2f
May 10 22:30:27 ip-172-20-39-143 kubelet[9126]: E0510 22:30:27.304149 9126 kuberuntime_manager.go:858] getPodContainerStatuses for pod "dhrubacol11222268508-leaf-0-84612_master(da61dd55-3551-11e7-b03c-0207e1349dc2)" failed: rpc error: code = 2 desc = Error: No such container: bedfcb3556065064b60471b8ebb73b09c1c450cfd4d07a087a45ba5d8e83dd2f
May 10 22:30:27 ip-172-20-39-143 kubelet[9126]: E0510 22:30:27.304180 9126 generic.go:239] PLEG: Ignoring events for pod dhrubacol11222268508-leaf-0-84612/master: rpc error: code = 2 desc = Error: No such container: bedfcb3556065064b60471b8ebb73b09c1c450cfd4d07a087a45ba5d8e83dd2f
May 10 22:30:27 ip-172-20-39-143 kubelet[9126]: E0510 22:30:27.408508 9126 kuberuntime_manager.go:858] getPodContainerStatuses for pod "dhrubacol11222268508-leaf-0-84612_master(da61dd55-3551-11e7-b03c-0207e1349dc2)" failed: rpc error: code = 2 desc = Error: No such container: bedfcb3556065064b60471b8ebb73b09c1c450cfd4d07a087a45ba5d8e83dd2f
(and then the “getPodContainerStatuses… failed” messages spam the logs)
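For anyone trying to confirm they are seeing the same thing, the repeating errors can be pulled out of the kubelet journal with something like this (the unit name and grep pattern are assumptions based on the log lines above):
~$ journalctl -u kubelet --since "1 hour ago" | grep "getPodContainerStatuses.*No such container"   # shows the spamming failures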
This appears to be a Docker issue, actually.
docker ps -a lists these containers stuck in the Dead state, but the containers don't exist. journalctl -u docker shows lots of:
May 12 20:48:07 ip-10-0-114-119 dockerd[8767]: time="2017-05-12T20:48:07.051761235Z" level=error msg="Handler for GET /v1.24/containers/0a285b05d00344564e05cf9995a8a75479ff69a16eada920144f8bdb55446429/json returned error: open /var/lib/docker/overlay/ace5b2af622e39a87e76fe57077ece42616cadb10e0d4b0d2473a68a17cbcfe2/lower-id: no such file or directory"
Restarting Docker fixed this, at least for now.
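A rough sketch of the checks and workaround described above (the filter syntax and unit names are assumptions; adjust to your environment):
~$ docker ps -a --filter status=dead          # list containers stuck in the Dead state
~$ journalctl -u docker --since "1 hour ago"  # look for the "no such file or directory" overlay errors
~$ sudo systemctl restart docker              # the workaround that cleared the stuck state here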
@philipn @bcorijn @cooper667 @thegranddesign @BradErz @greenkiwi @pavelhritonenko @andreychernih … my thesis when I asked a bunch of you what the issue was (see https://github.com/kubernetes/kubernetes/issues/45626#issuecomment-319611102) is that part of the problem is the 4.4.65-k8s kernel. I've been running/testing 4.4.78-k8s (see https://github.com/kopeio/kubernetes-kernel/pull/8) for a couple of weeks and my cluster is behaving much, much better. Sorry guys, kinda got swamped and didn't report back.
The common thread that supports this thesis is that the people having this issue are using kops 1.6, which comes with the 4.4.65-k8s kernel, i.e. AMI kope.io/k8s-1.6-debian-jessie-amd64-hvm-ebs-2017-05-02 (if you set up before the new AMI was built). I submitted a PR to bump this AMI to 4.4.78-k8s in https://github.com/kopeio/kubernetes-kernel/pull/8, which is probably what @bcorijn, @pavelhritonenko and @cooper667 are using, and things seem to be going well.
You guys might be interested in https://github.com/kubernetes/kops/issues/2901 and https://github.com/kubernetes/kops/issues/2928.
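If you want to check whether your nodes are still on the older kernel and which AMI your instance groups reference, something along these lines should work (the kops subcommand is an assumption; the kernel version strings come from the comment above):
~$ uname -r                                        # on a node: 4.4.65-k8s indicates the older image
~$ kops get instancegroups -o yaml | grep image    # shows which AMI each instance group uses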
Having the same problem - a lot of containers in Dead state.
This is a fresh 1.6 installation.
It looks like there was a kernel panic on this node and the system rebooted:
Found https://github.com/kubernetes/kops/issues/874 related to this crash; trying to upgrade the kernel to at least 4.4.70. But it is still concerning that Kubernetes did not recover from this state correctly.
Restarting Docker does not help; pods are still in the “Terminated” state and Kubernetes is not trying to restart them.
I ended up running the following command, which fixed the problem:
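(The exact command is not preserved in this excerpt. Purely for illustration, a typical manual cleanup of dead containers on the affected node might look like the line below; this is an assumption, not necessarily what was run here.)
~$ docker ps -a --filter status=dead -q | xargs -r docker rm   # hypothetical cleanup of Dead containers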
So far it’s been 12 days on 1.7 without seeing this issue, so I’m quite confident it is fixed for me. Not sure if it was the Kube upgrade or the new AMI, however…
@aabed @armandocerna @bcorijn @philipn @tudor I have a few questions:
Not sure if this helps, but I am only getting this issue with this Jupyter image. I haven’t dug in much to figure out why yet.
Seeing this in our cluster as well. Quite an annoying bug, as it will indeed get a deployment stuck with fewer replicas than expected until I manually intervene. Did you find any more permanent solution/workaround @tudor?
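For anyone who wants to spot deployments stuck below their desired replica count while a pod is wedged like this, a quick check (the column names here are my own; compare DESIRED vs AVAILABLE by eye):
~$ kubectl get deployments --all-namespaces -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,DESIRED:.spec.replicas,AVAILABLE:.status.availableReplicas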