cert-manager: High memory and CPU consumption in cert-manager-cainjector

Describe the bug: After upgrading cert-manager from version 1.11.0 to version 1.12.2, we noticed very high memory and CPU consumption in the cert-manager-cainjector pod that keeps rising slowly until it reaches the node limit. After restarting the deployment, memory and CPU drop, but they slowly start to rise again. You can see the details in the graph below:

Historic data of the deployment: (resource-usage graph attached)

Current Memory and CPU:

cert-manager-xxxxx-xxxxx              1356m   1398Mi
cert-manager-cainjector-xxxxx-xxxxx   9767m   31189Mi
cert-manager-webhook-xxxxx-xxxxx      1m      13Mi
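
For reference, a per-pod snapshot like the one above can be collected with kubectl top (a minimal sketch; it assumes metrics-server is available and that the release runs in the cert-manager namespace):

      # CPU/memory usage per pod in the cert-manager namespace
      kubectl top pods -n cert-manager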

Expected behaviour: Memory consumption of the cert-manager-cainjector pod should be no more than ~350MB and CPU no more than 0.002 cores.

Steps to reproduce the bug: Upgrade from version 1.11.0 to version 1.12.2.

Anything else we need to know?: We use the default values from the Helm chart found here: https://artifacthub.io/packages/helm/cert-manager/cert-manager
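
For completeness, a minimal upgrade sketch with chart defaults (assuming the standard jetstack Helm repository and a release named cert-manager in the cert-manager namespace; adjust for a helmfile-driven install):

      # Add/refresh the chart repository
      helm repo add jetstack https://charts.jetstack.io
      helm repo update
      # Upgrade the existing release from v1.11.0 to v1.12.2 using default values
      helm upgrade cert-manager jetstack/cert-manager \
        --namespace cert-manager \
        --version v1.12.2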

Environment details (Production):

  • Kubernetes version: 1.26.5-gke.1200
  • Cloud-provider/provisioner: GKE
  • cert-manager version: v1.12.2
  • Install method: helm/helmfile

/kind bug

About this issue

  • State: closed
  • Created a year ago
  • Comments: 24 (10 by maintainers)

Most upvoted comments

Same here, no sharp ramps with v1.12.3.

We deployed v1.12.3 and it seems to be much better, thanks for the fix!

(Screenshots from 2023-07-26 attached.)

Good results here too on AWS EKS v1.24.16 with a few thousand certificates. We re-upgraded from 1.11.4 (after having downgraded from 1.12.2 back to 1.11.4).

Thanks for the fix!

Updating from v1.12.0 to v1.12.3 fixed the CPU/Memory usage spike for me on Digital Ocean Kubernetes v1.27.4-do.0

@zeeZ do you have memory profiles too?

I let it run for about an hour, until it ramped up to consuming about one CPU core, with the following args; here's the result:

      --leader-election-namespace=cert-manager
      --enable-profiling=true
      --profiler-address=:8081
      --leader-elect=false
      --enable-certificates-data-source=false
      --enable-customresourcedefinitions-injectable=false
      --enable-apiservices-injectable=false

Attached profiles and logs:

  • cainjector.log
  • pprof.cainjector.goroutine.001.pb.gz
  • pprof.cainjector.samples.cpu.001.pb.gz
  • pprof.cainjector.threadcreate.001.pb.gz
  • pprof.cainjector.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz

The secret total in the logs seems a bit wild. There are only 294 secrets with a total of 577 keys in the cluster.
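
(For anyone wanting to reproduce the capture above: with --profiler-address=:8081 set as in those args, the standard Go pprof endpoints should be reachable on that port. A rough sketch, assuming the cert-manager namespace and the default deployment name:)

      # Forward the profiler port from the cainjector deployment
      kubectl port-forward -n cert-manager deploy/cert-manager-cainjector 8081:8081

      # Grab a 30-second CPU profile and a heap snapshot via the standard pprof paths
      go tool pprof "http://localhost:8081/debug/pprof/profile?seconds=30"
      go tool pprof "http://localhost:8081/debug/pprof/heap"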

Based on @zeeZ's pprof dumps, this might be related to the logging issue that was reported here: #6104

No JSON logging, but klog is definitely up there. I'll see if I can run the linked fix in the cluster tomorrow.

I have four different clusters, and only one of them shows this behavior. They’re all running version 1.12.2 of the helm chart with identical configuration.

Luckily the one with the CPU ramp is a dev cluster, so I have access to the profiler; I just don't know how to use it 🙃

This is a 30 minute old cainjector consuming one cpu core: pprof.cainjector.samples.cpu.pb.gz

Edit: after 3 hours, up to 500% cpu usage: pprof.cainjector.samples.cpu_3h.pb.gz
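
(Side note for anyone who wants to inspect the attached dumps: they can be opened locally with go tool pprof, for example in the interactive web UI. A sketch, using the file names as attached above:)

      # Interactive flame-graph / top view of the attached CPU profile
      go tool pprof -http=:8080 pprof.cainjector.samples.cpu.pb.gz

      # Or a quick text summary of the hottest functions
      go tool pprof -top pprof.cainjector.samples.cpu.pb.gz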

@lboix For starters we kept it as is without rolling back, and we rollout-restarted the cert-manager-cainjector deployment periodically. However, depending on the size of the cluster, the leak sometimes grew quickly. So we have now rolled back to v1.11.4, and it seems that there is no leak in that version.
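
(For reference, the periodic restart workaround mentioned above is just the standard rollout restart; a sketch, assuming the cert-manager namespace:)

      # Restart the cainjector deployment to reclaim memory/CPU until a fixed version is deployed
      kubectl rollout restart deployment cert-manager-cainjector -n cert-manager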