kyverno: [Bug] admission reports piled up causing etcd turned into read-only mode
Kyverno Version
1.10.3
Description
Follow-up issue report from the Slack discussion.
1.24
EKS cluster
# HELP apiserver_storage_objects [STABLE] Number of stored objects at the time of last check split by kind.
# TYPE apiserver_storage_objects gauge
apiserver_storage_objects{resource="admissionreports.kyverno.io"} 1.601408e+06
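To see which resource kinds dominate the object count, the same metric can be parsed and sorted. The following is a minimal Python sketch, assuming the metrics text was saved from something like `kubectl get --raw /metrics`. The helper name `top_stored_objects` and the `pods` sample line are made up for illustration; only the `admissionreports.kyverno.io` line comes from this report.

```python
import re

# Matches lines such as:
# apiserver_storage_objects{resource="admissionreports.kyverno.io"} 1.601408e+06
_LINE = re.compile(r'^apiserver_storage_objects\{resource="([^"]+)"\}\s+(\S+)\s*$')


def top_stored_objects(metrics_text, limit=5):
    """Return (resource, count) pairs sorted by descending object count."""
    counts = []
    for line in metrics_text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            # Values are floats in exposition format, e.g. 1.601408e+06.
            counts.append((match.group(1), int(float(match.group(2)))))
    counts.sort(key=lambda pair: pair[1], reverse=True)
    return counts[:limit]


# The "pods" line is hypothetical sample data, added only for contrast.
sample = """# TYPE apiserver_storage_objects gauge
apiserver_storage_objects{resource="pods"} 1200
apiserver_storage_objects{resource="admissionreports.kyverno.io"} 1.601408e+06
"""

for resource, count in top_stored_objects(sample):
    print(f"{resource}: {count}")
```

Sorting by count is only a first hint: a kind with few but very large objects can still dominate the etcd db size.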
About 1.6 million Kyverno admission reports have piled up since June 2023, and they occupy most of the space in the etcd db. The db exceeded the upstream recommended maximum size quota (8G), which switched etcd into read-only mode.
Entries by 'Kind' (total 9.5 GB):
+--------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+--------+
| KEY GROUP | KIND | SIZE |
+--------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+--------+
| /registry/kyverno.io/admissionreports/monitoring,/registry/kyverno.io/admissionreports/monitoring,/registry/kyverno.io/admissionreports/monitoring,/re | AdmissionReport | 9.4 GB |
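As a rough sanity check, the object count and the size breakdown above imply an average stored size per admission report. The sketch below is back-of-the-envelope arithmetic only. It assumes "9.4 GB" means 9.4 * 10**9 bytes and that the 8G quota means 8 GiB, and it ignores all other keys and etcd's own overhead.

```python
REPORT_COUNT = 1_601_408      # from apiserver_storage_objects
REPORTS_BYTES = 9.4e9         # "9.4 GB" from the breakdown; assumes GB = 10**9 bytes
QUOTA_BYTES = 8 * 1024**3     # assumes the 8G quota means 8 GiB

# Average size of one stored AdmissionReport, in bytes.
avg_bytes = REPORTS_BYTES / REPORT_COUNT

# Number of reports of that average size that would fill the quota by themselves.
reports_to_fill_quota = int(QUOTA_BYTES // avg_bytes)

print(f"average report size: {avg_bytes / 1000:.2f} kB")
print(f"reports needed to fill the quota: {reports_to_fill_quota}")
```

Under these assumptions each report takes a few kilobytes, so the quota is reached through sheer object count rather than through a few oversized objects.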
kyverno-app-controller-pod-spec.yaml is the pod spec that was in use when the db filled up. I am not sure whether the user has upgraded the controller version at any point since June 2023. The Kyverno version 1.10.3 is taken from the image ghcr.io/kyverno/kyverno:v1.10.3 in this spec.
kyverno-admission-report-sample.json is one example of the admission report custom resources.
Please let me know if the Kyverno community wants more information, such as the apiserver audit log or other admission report samples.
Slack discussion
https://kubernetes.slack.com/archives/CLGR9BJU9/p1700252421515759
Troubleshooting
- I have read and followed the documentation AND the troubleshooting guide.
- I have searched other issues in this repository and mine is not recorded.
About this issue
- State: closed
- Created 7 months ago
- Comments: 17 (10 by maintainers)
Great, now we know why the admission reports piled up.
@KhaledEmaraDev - can we run load tests against Kyverno 1.10.x and capture the cronjob's resource usage under various loads?
Thanks for the pointer. In Kyverno 1.10.x, there are “aggregate” and “non-aggregate” admission reports. Stale non-aggregate admission reports are cleaned up by selecting on a label, as you can see here. With 1.11.x, admission reports have become short-lived resources that are garbage collected right after they are aggregated.
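The label-based selection described above can be illustrated with a short Python sketch. It is not the actual cleanup job. It assumes the aggregate reports carry the label key `audit.kyverno.io/report.aggregate` (treat that key as an assumption and verify it against the deployed chart), and the function name and sample objects are hypothetical.

```python
# Assumption: label key that marks aggregate reports in Kyverno 1.10.x.
AGGREGATE_LABEL = "audit.kyverno.io/report.aggregate"


def non_aggregate_reports(reports):
    """Return (namespace, name) for every report that lacks the aggregate label."""
    selected = []
    for report in reports:
        meta = report.get("metadata", {})
        labels = meta.get("labels") or {}  # labels may be missing or null
        if AGGREGATE_LABEL not in labels:
            selected.append((meta.get("namespace"), meta.get("name")))
    return selected


# Hypothetical objects shaped like the items of
# `kubectl get admissionreports -A -o json`.
items = [
    {"metadata": {"namespace": "monitoring", "name": "report-a",
                  "labels": {AGGREGATE_LABEL: ""}}},
    {"metadata": {"namespace": "monitoring", "name": "report-b", "labels": {}}},
    {"metadata": {"namespace": "default", "name": "report-c"}},
]

print(non_aggregate_reports(items))
```

The sketch only selects by label presence. Whether a selected report is actually stale is a separate question that this code does not answer.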
We are continuously working on optimizing the reporting system. As Jim mentioned above, we are working towards leveraging API aggregation to support alternate storage backends for reports in the Kyverno 1.12 release, see https://github.com/kyverno/KDP/pull/51.
@chaochn47 - you can look for the cronjob “kyverno-cleanup-admission-reports” in the namespace where Kyverno is deployed.
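Once the cronjob is located, it is worth checking whether it is suspended and when it last ran successfully. Here is a minimal Python sketch, assuming its input is the parsed output of `kubectl get cronjobs -A -o json`. The helper name and the sample object (including its namespace and timestamps) are hypothetical.

```python
CRONJOB_NAME = "kyverno-cleanup-admission-reports"


def find_cleanup_cronjob(cronjob_list):
    """Return a small status summary for the cleanup CronJob, or None if absent."""
    for item in cronjob_list.get("items", []):
        meta = item.get("metadata", {})
        if meta.get("name") != CRONJOB_NAME:
            continue
        status = item.get("status", {})
        return {
            "namespace": meta.get("namespace"),
            "suspended": bool(item.get("spec", {}).get("suspend", False)),
            "last_schedule": status.get("lastScheduleTime"),
            "last_success": status.get("lastSuccessfulTime"),
        }
    return None


# Hypothetical sample; in practice load it with json.load() from the kubectl output.
sample_list = {
    "items": [
        {
            "metadata": {"namespace": "kyverno", "name": CRONJOB_NAME},
            "spec": {"schedule": "*/10 * * * *", "suspend": False},
            "status": {"lastScheduleTime": "2023-11-17T20:10:00Z"},
        }
    ]
}

print(find_cleanup_cronjob(sample_list))
```

A result of `None`, a suspended job, or a missing `last_success` value would each be consistent with reports accumulating, though none of them proves the cause on its own.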