kubernetes: AvailableConditionController doesn't implement proper backoff/retry strategy

What happened: When AvailableConditionController is unable to contact an API service, it keeps retrying at a very high rate (around 80 attempts per second), causing significant CPU usage. https://gist.github.com/mborsz/c6094ae23f84db10593f1e0a0fd5d38d contains the kube-apiserver logs from one of the attempts.

What you expected to happen: AvailableConditionController should implement a proper backoff/retry strategy in that case.

How to reproduce it (as minimally and precisely as possible):

Create a cluster
Block SSH tunnel access to the nodes where the service is running
Restart kube-apiserver
Watch the logs and CPU usage of kube-apiserver

Anything else we need to know?:

AvailableConditionController seems to attempt some backoff logic, but it either doesn’t work or still generates too much load.

Environment:

Kubernetes version (use kubectl version): 1.9.7
Cloud provider or hardware configuration: gke
OS (e.g. from /etc/os-release): cos
Kernel (e.g. uname -a): 4.4.111+
Install tools:
Others:

/kind bug

About this issue

State: open
Created 6 years ago
Comments: 18 (16 by maintainers)

Most upvoted comments

/assign

cheftako on Oct 26, 2018