kubernetes: Missing taint: node.kubernetes.io/unreachable:NoExecute when nodes enter NotReady state

What happened:

When worker nodes enter a NotReady state, the taint node.kubernetes.io/unreachable:NoExecute is not added to the node; only the node.kubernetes.io/unreachable:NoSchedule taint is added. The effect is that pods running on such workers remain in the Running state and are never rescheduled.

This is similar to #97100 but occurs under different conditions. We actually discovered it while testing patch #98168: that patch works only if you shut down worker nodes; it does not help if you shut down the masters as well.

$ kubectl get nodes
NAME                STATUS     ROLES                  AGE   VERSION
k8s-caas-infra01    NotReady   infra                  72d   v1.20.6
k8s-caas-infra02    NotReady   infra                  72d   v1.20.6
k8s-caas-infra03    NotReady   infra                  72d   v1.20.6
k8s-caas-master01   Ready      control-plane,master   72d   v1.20.6
k8s-caas-master02   Ready      control-plane,master   72d   v1.20.6
k8s-caas-master03   Ready      control-plane,master   72d   v1.20.6
k8s-caas-worker01   NotReady   worker                 72d   v1.20.6
k8s-caas-worker02   NotReady   worker                 72d   v1.20.6
$ kubectl -n auth-system get pods -o wide | grep infra
keycloak-0                                    1/1     Running   0          98m    10.38.70.92     k8s-caas-infra01   <none>           <none>
keycloak-keycloak-operator-654ff77bf5-hv6c9   1/1     Running   0          107m   10.38.136.220   k8s-caas-infra03   <none>           <none>
keycloak-postgresql-7654cccbb7-8s6fr          1/1     Running   0          107m   10.38.136.232   k8s-caas-infra03   <none>           <none>
$ kubectl describe node k8s-caas-infra03
Name:               k8s-caas-infra03
Roles:              infra
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=k8s-caas-infra03
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/infra=
Annotations:        csi.volume.kubernetes.io/nodeid: {"rbd.csi.ceph.com":"k8s-caas-infra03"}
                    kubeadm.alpha.kubernetes.io/cri-socket: /run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 10.9.5.68/24
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 17 Feb 2021 12:26:12 +0100
Taints:             node.kubernetes.io/unreachable:NoSchedule
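
The describe output above shows only the NoSchedule taint. A quicker way to spot-check the taints across all nodes at once is something along these lines (just a sketch; the custom-columns expression simply lists each node's taint keys and effects):

$ kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINT-KEYS:.spec.taints[*].key,TAINT-EFFECTS:.spec.taints[*].effect'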

What you expected to happen:

We expect the taints:

node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unreachable:NoSchedule
node.kubernetes.io/not-ready:NoExecute

to be applied, as per the documentation.
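
For context on why the missing NoExecute taint matters: with the DefaultTolerationSeconds admission plugin enabled (the default), pods carry NoExecute tolerations for node.kubernetes.io/not-ready and node.kubernetes.io/unreachable with tolerationSeconds: 300, so the taint manager evicts them 300s after the unreachable:NoExecute taint is applied. If that taint never appears, the eviction timer never starts. This can be checked on any running pod, for example (the pod name is just one from our cluster):

$ kubectl -n auth-system get pod keycloak-0 -o jsonpath='{.spec.tolerations}{"\n"}'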

How to reproduce it (as minimally and precisely as possible):

This issue was observed during an operational test on an on-prem Kubernetes v1.20.6 cluster. We have a high-availability control plane with 3 masters and a few workers distributed across two stretched data centers. In order to exercise the DR capability, we switched off all the VMs, both masters and workers.

  1. shut down all the masters and workers
  2. restart only the masters
  3. after the masters become Ready, restart a couple of workers.
  4. at this point we expect the system (after the eviction timeout of 300s) to “move” pods from the workers that are NotReady to the workers that are Ready
  5. check the taints on the remaining workers that are still NotReady (see the commands sketched after this list)
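
The checks we run for steps 4 and 5 are roughly the following (a sketch; node and namespace names are from our environment):

$ kubectl get nodes                                               # which workers are Ready / NotReady
$ kubectl -n auth-system get pods -o wide                         # pods should have been rescheduled off the NotReady workers
$ kubectl describe node k8s-caas-worker01 | grep -A2 'Taints:'    # expect both unreachable:NoSchedule and unreachable:NoExecute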

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): v1.20.6
  • Cloud provider or hardware configuration: on-premises running on RHEV
  • OS (e.g: cat /etc/os-release): Ubuntu 20.04.2 LTS (Focal Fossa)
  • Kernel (e.g. uname -a): 5.4.0-72-generic
  • Install tools: kubeadm
  • Network plugin and version (if this is a network-related bug): calico
  • Others:

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 3
  • Comments: 25 (10 by maintainers)

Most upvoted comments

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

I think you have to elaborate on this scenario more for us to reproduce it.

I simply switched off (brutally powered off the VMs) all the nodes in step 1, not a graceful shutdown. This is because the initial intent of the test was a DR simulation. Hope this helps.