kubernetes: AvailableConditionController doesn't implement proper backoff/retry strategy

What happened: When AvailableConditionController is unable to contact an API service, it keeps retrying at a very high rate (around 80 attempts per second), causing significant CPU usage. https://gist.github.com/mborsz/c6094ae23f84db10593f1e0a0fd5d38d contains kube-apiserver’s logs from one of the attempts.

What you expected to happen: AvailableConditionController should implement a proper backoff/retry strategy in that case.

How to reproduce it (as minimally and precisely as possible):

  1. Create a cluster
  2. Block SSH tunnel access to the nodes where the service is running
  3. Restart kube-apiserver
  4. Watch the logs and CPU usage of kube-apiserver

Anything else we need to know?:

AvailableConditionController seems to be trying to implement some backoff logic, but it either doesn’t work or still generates too much load.

Environment:

  • Kubernetes version (use kubectl version): 1.9.7
  • Cloud provider or hardware configuration: gke
  • OS (e.g. from /etc/os-release): cos
  • Kernel (e.g. uname -a): 4.4.111+
  • Install tools:
  • Others:

/kind bug

About this issue

  • State: open
  • Created 6 years ago
  • Comments: 18 (16 by maintainers)

Most upvoted comments

/assign