kubernetes: pod stuck in terminating after node crash
Is this a BUG REPORT or FEATURE REQUEST?:
Uncomment only one, leave it on its own line:
/kind bug
/kind feature
What happened:
A node crashed, and its pods are stuck in the Terminating state.
Events:
Type     Reason         Age                    From                   Message
Normal   Killing        25m (x65928 over 11d)  kubelet, 130.61.58.85  Killing container with id docker://k8szk:Container failed liveness probe… Container will be killed and recreated.
Normal   Killing        5m (x64 over 23m)      kubelet, 130.61.58.85  Killing container with id docker://k8szk:Need to kill Pod
Warning  FailedKillPod  29s (x81 over 23m)     kubelet, 130.61.58.85  error killing pod: [failed to "KillContainer" for "k8szk" with KillContainerError: "rpc error: code = Unknown desc = Error response from daemon: Cannot stop container 39f971a7deff4a72c289a3f5aa5af821e174fa706cac6863c09ad8b560e7567c: Cannot kill container 39f971a7deff4a72c289a3f5aa5af821e174fa706cac6863c09ad8b560e7567c: rpc error: code = 14 desc = grpc: the connection is unavailable", failed to "KillPodSandbox" for "f36b96a8-6e3c-11e8-acb0-0a580aed1de6" with KillPodSandboxError: "rpc error: code = Unknown desc = Error response from daemon: Cannot stop container b665249d9d854560d798aa0a87a1425ffd2616251610884fc26588682a895e30: Cannot kill container b665249d9d854560d798aa0a87a1425ffd2616251610884fc26588682a895e30: rpc error: code = 14 desc = grpc: the connection is unavailable"]
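For anyone trying to confirm which pods are affected, a minimal sketch along these lines may help. It assumes the default column layout of kubectl get pods --all-namespaces --no-headers (NAMESPACE, NAME, READY, STATUS, ...); the helper name stuck_pods is hypothetical and not part of kubectl.

```shell
#!/bin/sh
# stuck_pods (hypothetical helper): reads pod listings on stdin in the
# default `kubectl get pods --all-namespaces --no-headers` layout
#   NAMESPACE NAME READY STATUS RESTARTS AGE
# and prints "namespace/name" for each pod whose STATUS is Terminating.
stuck_pods() {
    awk '$4 == "Terminating" { print $1 "/" $2 }'
}

# Usage (assumes kubectl is configured for the affected cluster):
#   kubectl get pods --all-namespaces --no-headers | stuck_pods
```

Reading from stdin keeps the filter independent of the cluster, so it can also be run against saved output from the time of the incident.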
What you expected to happen:
Pods on the crashed node terminate and restart.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version):
$ kubectl.exe version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.0", GitCommit:"925c127ec6b946659ad0fd596fa959be43f0cc05", GitTreeState:"clean", BuildDate:"2017-12-15T21:07:38Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"9+", GitVersion:"v1.9.7-2+ff9181f92914d6", GitCommit:"ff9181f92914d638c93511c5163ac1e5dcbdf492", GitTreeState:"clean", BuildDate:"2018-04-24T22:19:31Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration:
OCI
- OS (e.g. from /etc/os-release):
Oracle Linux 7.4
- Kernel (e.g. uname -a):
- Install tools:
- Others:
About this issue
- State: closed
- Created 6 years ago
- Reactions: 4
- Comments: 19 (7 by maintainers)
I have the same issue with the Tiller-deploy pod. Restarting Docker on the worker node fixes it temporarily, but the problem came back after 2 days.
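As a rough sketch of that temporary workaround, assuming Docker runs as a systemd unit named docker on the worker node (an assumption, not confirmed in this thread). The run and restart_docker helpers are hypothetical; DRY_RUN=1 only prints the command so it can be reviewed before running it for real.

```shell
#!/bin/sh
# run (hypothetical helper): print the command when DRY_RUN=1,
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# restart_docker (hypothetical helper): restart the Docker daemon on the
# worker node. Assumes a systemd unit named "docker".
restart_docker() {
    run sudo systemctl restart docker
}

# Usage on the affected worker node:
#   DRY_RUN=1 restart_docker   # show what would be run
#   restart_docker             # actually restart Docker
```

As noted above, this only clears the symptom for a while; it does not address why the daemon connection becomes unavailable.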
/reopen
I am also getting this issue with:
Running prometheus-server image: prom/prometheus:v2.13.1