kubernetes: AvailableConditionController doesn't implement proper backoff/retry strategy
What happened: When AvailableConditionController is unable to contact an API service, it keeps retrying at a very high rate (around 80 attempts per second), which causes significant CPU usage. https://gist.github.com/mborsz/c6094ae23f84db10593f1e0a0fd5d38d contains kube-apiserver’s logs from one of the attempts.
What you expected to happen: AvailableConditionController should implement a proper backoff/retry strategy in that case.
How to reproduce it (as minimally and precisely as possible):
- Create a cluster
- Block SSH tunnel access to the nodes where the service is running
- Restart kube-apiserver
- Watch the logs and CPU usage of kube-apiserver
Anything else we need to know?:
AvailableConditionController seems to attempt some backoff logic, but it either doesn’t work or still generates too much load.
Environment:
- Kubernetes version (use kubectl version): 1.9.7
- Cloud provider or hardware configuration: gke
- OS (e.g. from /etc/os-release): cos
- Kernel (e.g. uname -a): 4.4.111+
- Install tools:
- Others:
/kind bug
About this issue
- Original URL
- State: open
- Created 6 years ago
- Comments: 18 (16 by maintainers)
/assign