kubernetes: pod stuck in terminating after node crash

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

/kind feature

What happened:

A node crashed and pods are stuck in the Terminating state.

Events:
  Type     Reason         Age                    From                   Message
  Normal   Killing        25m (x65928 over 11d)  kubelet, 130.61.58.85  Killing container with id docker://k8szk:Container failed liveness probe… Container will be killed and recreated.
  Normal   Killing        5m (x64 over 23m)      kubelet, 130.61.58.85  Killing container with id docker://k8szk:Need to kill Pod
  Warning  FailedKillPod  29s (x81 over 23m)     kubelet, 130.61.58.85  error killing pod: [failed to "KillContainer" for "k8szk" with KillContainerError: "rpc error: code = Unknown desc = Error response from daemon: Cannot stop container 39f971a7deff4a72c289a3f5aa5af821e174fa706cac6863c09ad8b560e7567c: Cannot kill container 39f971a7deff4a72c289a3f5aa5af821e174fa706cac6863c09ad8b560e7567c: rpc error: code = 14 desc = grpc: the connection is unavailable"
  , failed to "KillPodSandbox" for "f36b96a8-6e3c-11e8-acb0-0a580aed1de6" with KillPodSandboxError: "rpc error: code = Unknown desc = Error response from daemon: Cannot stop container b665249d9d854560d798aa0a87a1425ffd2616251610884fc26588682a895e30: Cannot kill container b665249d9d854560d798aa0a87a1425ffd2616251610884fc26588682a895e30: rpc error: code = 14 desc = grpc: the connection is unavailable"
  ]

What you expected to happen:

Nodes terminate and restart.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):

$ kubectl.exe version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.0", GitCommit:"925c127ec6b946659ad0fd596fa959be43f0cc05", GitTreeState:"clean", BuildDate:"2017-12-15T21:07:38Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"9+", GitVersion:"v1.9.7-2+ff9181f92914d6", GitCommit:"ff9181f92914d638c93511c5163ac1e5dcbdf492", GitTreeState:"clean", BuildDate:"2018-04-24T22:19:31Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

  • Cloud provider or hardware configuration:

OCI

  • OS (e.g. from /etc/os-release):

Oracle Linux 7.4

  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 4
  • Comments: 19 (7 by maintainers)

Most upvoted comments

I have the same issue with the Tiller-deploy pod. Restarting Docker on the worker node fixes it temporarily, but the problem came back after 2 days.
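
For anyone trying the same temporary workaround, here is a rough sketch of it as a script. It is not from the original comment: it assumes the worker node manages Docker with systemd, the function names and the pod and namespace arguments are hypothetical, and it defaults to a dry run that only prints the commands. Note that the restart has to happen on the affected worker node, while the kubectl check can run from any machine with cluster access.

```shell
#!/bin/sh
# Sketch of the temporary workaround: restart Docker on the affected worker
# node, then check whether the pod is still stuck in Terminating.
# Assumptions (not from the original report): the node uses systemd, and
# kubectl is configured for the cluster.

# DRY_RUN=1 (the default) only prints the commands so they can be reviewed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

restart_docker_and_check() {
    pod="$1"        # hypothetical pod name
    namespace="$2"  # hypothetical namespace

    # On the worker node: restart the Docker daemon.
    run sudo systemctl restart docker

    # With cluster access: show the pod's current status.
    run kubectl get pod "$pod" --namespace "$namespace"
}
```

Calling `restart_docker_and_check <pod> <namespace>` with the default `DRY_RUN=1` prints the two commands; setting `DRY_RUN=0` executes them. As the comment above notes, the effect was only temporary in that cluster.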

/reopen

I am also getting this issue with:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T09:18:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.1", GitCommit:"d647ddbd755faf07169599a625faf302ffc34458", GitTreeState:"clean", BuildDate:"2019-10-02T16:51:36Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}

Running prometheus-server image: prom/prometheus:v2.13.1

Warning  FailedKillPod  90s (x1247 over 4d8h)  kubelet, ip-10-0-2-165.us-gov-west-1.compute.internal  error killing pod: failed to "KillContainer" for "prometheus-server" with KillContainerError: "rpc error: code = Unknown desc = operation timeout: context deadline exceeded"