kubernetes: Pod stuck on Terminating after node lost

What happened: A pod was running on a worker node. After the node was powered off, the pod got stuck in ‘Terminating’ status, while its status in the pod metadata still shows ‘Running’.

What you expected to happen: The pod status should change to ‘NodeLost’.

How to reproduce it (as minimally and precisely as possible):

  1. Apply pod1 on node1 (a minimal manifest sketch follows below)
  2. Power off node1
  3. After several minutes, pod1 gets stuck on ‘Terminating’
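
For reference, a minimal manifest sketch for step 1 (the pod name, node name, and image are hypothetical); it pins the pod to the node that will be powered off:

    # pod1.yaml -- hypothetical minimal repro pod, bound to node1 via nodeName
    apiVersion: v1
    kind: Pod
    metadata:
      name: pod1
    spec:
      nodeName: node1        # bypass the scheduler and bind directly to the node to be powered off
      containers:
      - name: app
        image: nginx         # any long-running image works here

Apply it with kubectl apply -f pod1.yaml, power off node1, and after several minutes kubectl get pods keeps reporting the pod as Terminating.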

Anything else we need to know?: This works as expected in k8s 1.11, but in k8s 1.13 we hit this issue.

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T10:39:04Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T10:31:33Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}

  • Cloud provider or hardware configuration:

  • OS (e.g. from /etc/os-release):
    NAME="Ubuntu"
    VERSION="16.04.5 LTS (Xenial Xerus)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 16.04.5 LTS"
    VERSION_ID="16.04"
    HOME_URL="http://www.ubuntu.com/"
    SUPPORT_URL="http://help.ubuntu.com/"
    BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
    VERSION_CODENAME=xenial
    UBUNTU_CODENAME=xenial

  • Kernel (e.g. uname -a): Linux prme-hs2-nsbu2-dhcp-001-223 4.4.0-87-generic #110-Ubuntu SMP Tue Jul 18 12:55:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

  • Install tools:

  • Others:

/kind bug

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 14
  • Comments: 47 (11 by maintainers)

Most upvoted comments

I too faced the same issue and found a solution that worked for me.

I'm running a v1.15.3 cluster; after a node is lost, the pods get stuck in the Terminating state.

I added the tolerations below so the pod is terminated 2 seconds after the node state changes to NotReady/unreachable.

    tolerations:
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 2
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 2

Also change terminationGracePeriodSeconds to 0.

All of the above are pod spec fields.
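
For clarity, a sketch of where these fields sit in a complete pod spec (the pod name and image are placeholders); note that tolerationSeconds takes an integer number of seconds, not a duration string like "2s":

    apiVersion: v1
    kind: Pod
    metadata:
      name: my-app                         # placeholder name
    spec:
      terminationGracePeriodSeconds: 0     # skip the graceful-shutdown wait on deletion
      tolerations:
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 2               # evict 2 seconds after the node becomes unreachable
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 2               # evict 2 seconds after the node becomes NotReady
      containers:
      - name: app
        image: nginx                       # placeholder image

For a Deployment or StatefulSet, the same fields go under spec.template.spec.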

@srinathkotu The expected symptom of a “lost node” is that the pod running on that node shows as “Terminating” after tolerationSeconds, and then a replacement pod is created and scheduled onto other available nodes if possible.

Do you mean that the pods should get deleted automatically once the tolerationSeconds time is reached?

Yes, but that only sets the “deletionTimestamp” of the pod; fully deleting the API object (stored in etcd) also requires confirmation from the kubelet. Unfortunately, in the “node lost” case the network is gone, so the API object can’t be deleted and the pod keeps showing as “Terminating”.

However, this “Terminating” pod doesn’t count as a valid replica of the workload. Say you have a 2-replica app before the node is lost: once tolerationSeconds have passed you will see 2 pods running and 1 pod terminating. Only when the lost node comes back will the cluster reconcile and remove the API object of the terminating pod.
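
A quick way to see that the stuck pod has only been marked for deletion (a sketch, assuming the pod is named pod1) is to read its deletionTimestamp alongside its phase:

    # prints the deletion mark and the phase (which still reads "Running") of the stuck pod
    kubectl get pod pod1 -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.status.phase}{"\n"}'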

Based on my understanding, even when deletionTimestamp is set, the pod may still be running there.

deletionTimestamp means a pod is marked for deletion. But in the “node unreachable” case the API server can’t get heartbeats from the kubelet, so the pod is only “marked” for deletion; once tolerationSeconds have passed, a new pod is created to replace the “zombie” pod. The “zombie” pod’s info is only cleaned up when the kubelet reconnects to the API server after the connectivity issue is resolved.

Delete the node from the cluster and it will remove the pods that are in the Terminating state on that dead node.
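
A sketch of that cleanup, assuming the dead node is named node1; removing the Node object lets Kubernetes garbage-collect the pods bound to it:

    # delete the dead node; its Terminating pods are then removed from the API server
    kubectl delete node node1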

In fact, this issue is actually even worse, because there is no way to detect these zombie pods unless you manually run a command.
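
One possible manual check (a sketch, assuming jq is installed) is to list pods whose deletionTimestamp is set but that have never been fully removed:

    # list pods that are marked for deletion (shown as "Terminating" by kubectl get pods)
    kubectl get pods --all-namespaces -o json \
      | jq -r '.items[] | select(.metadata.deletionTimestamp != null) | "\(.metadata.namespace)/\(.metadata.name)"'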