kubernetes: kube-controller-manager v1.26 high cpu usage

What happened?

Upgraded the cluster from v1.24.6 to v1.26.4 (via v1.25.9), and since then kube-controller-manager has been consuming all available CPU: (image)

I also see a massive increase in the rate of the workqueue_retries_total{name="cronjob"} metric, from 2-3 per second to 20-30k per second: (image)

I dumped a pprof profile from kube-controller-manager, and it also shows a massive cronjob-related load: (image)

What did you expect to happen?

The same kube-controller-manager CPU usage as before the upgrade.

How can we reproduce it (as minimally and precisely as possible)?

I don't know. We have 7 similar clusters on the same version, and the issue is present in only one of them.

Anything else we need to know?

~170 cronjobs, ~300 jobs

Deleted some cronjobs and old failed jobs; it had no effect.

Kubernetes version

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.4", GitCommit:"872a965c6c6526caa949f0c6ac028ef7aff3fb78", GitTreeState:"clean", BuildDate:"2022-11-09T13:36:36Z", GoVersion:"go1.19.3", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.4", GitCommit:"f89670c3aa4059d6999cb42e23ccb4f0b9a03979", GitTreeState:"clean", BuildDate:"2023-04-12T12:05:35Z", GoVersion:"go1.19.8", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

none, self-hosted baremetal

OS version

# On Linux:
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.4 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
$ uname -a
Linux kube-[REDACTED] 5.15.0-53-generic #59~20.04.1-Ubuntu SMP Thu Oct 20 15:10:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Install tools

ansible

Container runtime (CRI) and version (if applicable)

containerd 1.6.8-1

Related plugins (CNI, CSI, …) and versions (if applicable)

flannel v0.14.0

About this issue

  • State: closed
  • Created a year ago
  • Comments: 25 (17 by maintainers)

Most upvoted comments

@sxllwx thanks for your effort on the reproducer

I’ll make sure to prioritize this next week on Monday to figure out possible fixes.

I noticed that our spec does not set the time zone. I recommend setting the schedule to "TZ=UTC 0 12 * * 3".

Don’t set TZ in the schedule string (that will be forbidden in future releases); set it in the cronjob spec instead.
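
For illustration, the advice above could look roughly like the manifest below. This is a minimal sketch, not taken from the affected cluster: the name, image, and command are hypothetical placeholders, and it assumes the batch/v1 CronJob `spec.timeZone` field is available on the cluster. Only the schedule value "0 12 * * 3" comes from this thread.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: weekly-example            # hypothetical name
spec:
  schedule: "0 12 * * 3"          # plain five-field cron expression, no "TZ=" prefix
  timeZone: "Etc/UTC"             # time zone set in the spec instead of the schedule string
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: example       # hypothetical container
              image: busybox:1.36 # placeholder image
              command: ["sh", "-c", "date"]
```

Keeping the time zone in its own field leaves the schedule string as an ordinary cron expression, which is the form the comment above says future releases will require.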