kubernetes: Scheduler leaves pod stuck in pending after worker node reload

What happened:

The Kubernetes scheduler may leave a single pod stuck in the Pending state after a worker node is reloaded/reinstalled. The problem appears to have been introduced in Kubernetes v1.12.4 and v1.13.1; we have not been able to reproduce it in our testing on earlier releases. The key to the failure is that the worker being reloaded/reinstalled must be the only scheduling option available. The pending pod is eventually scheduled once a new scheduler leader is elected or the node status changes. This appears to be a timing bug, since the problem does not always occur.
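
The symptom can be confirmed with standard kubectl commands. The following is a minimal diagnostic sketch, assuming default kube-scheduler leader election (which, in 1.12/1.13, records the leader in an annotation on the kube-scheduler endpoints object in kube-system); the pod name and namespace placeholders are illustrative.

    # List pods stuck in the Pending phase
    kubectl get pods --all-namespaces --field-selector=status.phase=Pending

    # Show scheduling events for a stuck pod (placeholder names)
    kubectl describe pod <pod-name> -n <namespace>

    # Confirm the reloaded node is Ready again
    kubectl get nodes -o wide

    # Inspect which scheduler instance currently holds the leader lock
    kubectl -n kube-system get endpoints kube-scheduler -o yaml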

What you expected to happen:

No pods are left stuck in the Pending state.

How to reproduce it (as minimally and precisely as possible):

Start with a single-node cluster running a handful of pods. Cordon, drain, and then reload/reinstall the node. Since this is a timing bug, it may take several iterations to reproduce the problem.
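
A minimal sketch of the reproduction loop, assuming a node named worker-1 (hypothetical) and kubectl access with sufficient permissions; the drain flags are the usual ones needed when DaemonSet-managed or emptyDir-backed pods are present.

    # Cordon the only schedulable worker so nothing new lands on it
    kubectl cordon worker-1

    # Evict the running pods from the node
    kubectl drain worker-1 --ignore-daemonsets --delete-local-data --force

    # Reload/reinstall the node outside of Kubernetes, then make it schedulable again
    kubectl uncordon worker-1

    # Watch for a pod that stays in Pending even though the node is Ready again
    kubectl get pods --all-namespaces --field-selector=status.phase=Pending -w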

Environment:

  • Kubernetes version (use kubectl version): 1.12.4, 1.13.1 and 1.13.2
  • Cloud provider or hardware configuration: VMs running in VirtualBox
  • OS (e.g. from /etc/os-release): Ubuntu 18.04.1 LTS
  • Kernel (e.g. uname -a): Linux carrier0-master-1 4.15.0-42-generic #45-Ubuntu SMP Thu Nov 15 19:32:57 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: Custom ansible scripts
  • Others: N/A

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 30 (15 by maintainers)

Most upvoted comments

Thanks, @Huang-Wei and @rtheis! Kubernetes 1.13.3 already has another change (#72558) that should alleviate the problem significantly, and 1.13.4 will have the definitive fix. Given that the root cause is fixed, we can close this issue.

@rtheis I’ve verified that #73309 fixes your issue thoroughly, and cherry-pick PRs have been raised (1.12: #73567, 1.13: #73568).