kyverno: [Bug] Webhook Controller Endless Loop
Kyverno Version
1.9.0
Kubernetes Version
1.25.x
Kubernetes Platform
AKS
Kyverno Rule Type
Other
Description
We currently have the problem that the webhook controller recreates all mutating and validating webhooks in an endless loop. We have Kyverno 1.9.1 installed (also tested 1.9.0 with the same results).
It looks like the same problem as the following issues:
- https://github.com/kyverno/kyverno/issues/3610
- https://github.com/kyverno/kyverno/issues/3286
- https://github.com/kyverno/kyverno/issues/2104
which are all closed. After further investigation with the log level raised to --v=4, we have seen that the webhook-controller is the misbehaving part. Interestingly, the logs contain no errors about updating or creating the webhooks, so after the first update the controller should normally stop updating them, if I understand the functionality correctly.
As an example, I have filtered the logs for webhook-controller events:
kubectl logs -n kyverno aks-infra-kyverno-7f8bd6764b-6rgkt -f | grep webhook-controller fact-aks-infra
Defaulted container "kyverno" out of: kyverno, kyverno-pre (init)
I0310 15:34:04.892045 1 controller.go:28] setup/leader/controllers "msg"="starting controller" "name"="webhook-controller" "workers"=2
I0310 15:34:04.892061 1 run.go:19] webhook-controller "msg"="starting ..."
I0310 15:34:04.892089 1 run.go:39] webhook-controller/routine "msg"="starting routine" "id"=0
I0310 15:34:04.892477 1 run.go:30] webhook-controller/worker "msg"="starting worker" "id"=0
I0310 15:34:04.892512 1 run.go:94] webhook-controller/worker "msg"="reconciling ..." "id"=0 "key"="kyverno-policy-mutating-webhook-cfg" "name"="kyverno-policy-mutating-webhook-cfg" "namespace"=""
I0310 15:34:04.892696 1 run.go:30] webhook-controller/worker "msg"="starting worker" "id"=1
I0310 15:34:04.892727 1 run.go:94] webhook-controller/worker "msg"="reconciling ..." "id"=1 "key"="kyverno-resource-mutating-webhook-cfg" "name"="kyverno-resource-mutating-webhook-cfg" "namespace"=""
I0310 15:34:04.912706 1 run.go:96] webhook-controller/worker "msg"="done" "duration"="23.402µs" "id"=1 "key"="kyverno-resource-mutating-webhook-cfg" "name"="kyverno-resource-mutating-webhook-cfg" "namespace"=""
I0310 15:34:04.912737 1 run.go:94] webhook-controller/worker "msg"="reconciling ..." "id"=1 "key"="kyverno-verify-mutating-webhook-cfg" "name"="kyverno-verify-mutating-webhook-cfg" "namespace"=""
I0310 15:34:04.914684 1 run.go:96] webhook-controller/worker "msg"="done" "duration"="35.003µs" "id"=0 "key"="kyverno-policy-mutating-webhook-cfg" "name"="kyverno-policy-mutating-webhook-cfg" "namespace"=""
I0310 15:34:04.914711 1 run.go:94] webhook-controller/worker "msg"="reconciling ..." "id"=0 "key"="aks-node-mutating-webhook" "name"="aks-node-mutating-webhook" "namespace"=""
I0310 15:34:04.914725 1 run.go:96] webhook-controller/worker "msg"="done" "duration"="13.801µs" "id"=0 "key"="aks-node-mutating-webhook" "name"="aks-node-mutating-webhook" "namespace"=""
I0310 15:34:04.914742 1 run.go:94] webhook-controller/worker "msg"="reconciling ..." "id"=0 "key"="aks-webhook-admission-controller" "name"="aks-webhook-admission-controller" "namespace"=""
I0310 15:34:04.914761 1 run.go:96] webhook-controller/worker "msg"="done" "duration"="15.902µs" "id"=0 "key"="aks-webhook-admission-controller" "name"="aks-webhook-admission-controller" "namespace"=""
I0310 15:34:04.914781 1 run.go:94] webhook-controller/worker "msg"="reconciling ..." "id"=0 "key"="cert-manager-webhook" "name"="cert-manager-webhook" "namespace"=""
I0310 15:34:04.914806 1 run.go:96] webhook-controller/worker "msg"="done" "duration"="21.701µs" "id"=0 "key"="cert-manager-webhook" "name"="cert-manager-webhook" "namespace"=""
I0310 15:34:04.914826 1 run.go:94] webhook-controller/worker "msg"="reconciling ..." "id"=0 "key"="ingress-nginx-admission" "name"="ingress-nginx-admission" "namespace"=""
I0310 15:34:04.914850 1 run.go:96] webhook-controller/worker "msg"="done" "duration"="22.402µs" "id"=0 "key"="ingress-nginx-admission" "name"="ingress-nginx-admission" "namespace"=""
I0310 15:34:04.914871 1 run.go:94] webhook-controller/worker "msg"="reconciling ..." "id"=0 "key"="ingress-nginx-public-admission" "name"="ingress-nginx-public-admission" "namespace"=""
I0310 15:34:04.914889 1 run.go:96] webhook-controller/worker "msg"="done" "duration"="22.802µs" "id"=0 "key"="ingress-nginx-public-admission" "name"="ingress-nginx-public-admission" "namespace"=""
I0310 15:34:04.914904 1 run.go:94] webhook-controller/worker "msg"="reconciling ..." "id"=0 "key"="kyverno-policy-validating-webhook-cfg" "name"="kyverno-policy-validating-webhook-cfg" "namespace"=""
I0310 15:34:04.925285 1 run.go:96] webhook-controller/worker "msg"="done" "duration"="16.101µs" "id"=1 "key"="kyverno-verify-mutating-webhook-cfg" "name"="kyverno-verify-mutating-webhook-cfg" "namespace"=""
I0310 15:34:04.925315 1 run.go:94] webhook-controller/worker "msg"="reconciling ..." "id"=1 "key"="kyverno-resource-validating-webhook-cfg" "name"="kyverno-resource-validating-webhook-cfg" "namespace"=""
I0310 15:34:04.931332 1 run.go:96] webhook-controller/worker "msg"="done" "duration"="17.502µs" "id"=0 "key"="kyverno-policy-validating-webhook-cfg" "name"="kyverno-policy-validating-webhook-cfg" "namespace"=""
I0310 15:34:04.931373 1 run.go:94] webhook-controller/worker "msg"="reconciling ..." "id"=0 "key"="aks-node-validating-webhook" "name"="aks-node-validating-webhook" "namespace"=""
I0310 15:34:04.931396 1 run.go:96] webhook-controller/worker "msg"="done" "duration"="24.202µs" "id"=0
[ . . . ]
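The filtered output above can be condensed into a per-webhook count to see which configurations are reconciled most often. The following is a minimal sketch, assuming the klog line format shown in the excerpt; `count_reconciles` is a hypothetical helper name, not part of Kyverno or kubectl.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "reconciling ..." log lines per webhook key.
# Reads webhook-controller log lines on stdin and prints "<count> <key>",
# most frequently reconciled key first.
count_reconciles() {
  grep -F '"msg"="reconciling ..."' \
    | sed -n 's/.*"key"="\([^"]*\)".*/\1/p' \
    | sort \
    | uniq -c \
    | sort -rn
}
```

It could be fed from the same pipeline as above, for example by piping the `kubectl logs ... | grep webhook-controller` output into `count_reconciles`. Keys that only appear a handful of times were probably reconciled once at startup, while keys with steadily growing counts are the ones caught in the loop.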
As a follow-up symptom, we got client-side throttling messages for requests to the API server:
I0310 15:37:58.508759 1 request.go:614] Waited for 73.498116ms due to client-side throttling, not priority and fairness, request: PUT:https://192.168.0.1:443/apis/admissionregistration.k8s.io/v1/mutatingwebhookconfigurations/kyverno-policy-mutating-webhook-cfg
To get the number of updates performed by the webhook-controller, we watched the generation of one of the webhook configurations. After around 5 minutes, it had already reached 2846 updates:
$ kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io kyverno-cleanup-validating-webhook-cfg
NAME WEBHOOKS AGE
kyverno-cleanup-validating-webhook-cfg 1 5m4s
$ kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io kyverno-cleanup-validating-webhook-cfg -o yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  creationTimestamp: "2023-03-10T14:49:19Z"
  generation: 2846
  labels:
    webhook.kyverno.io/managed-by: kyverno
  name: kyverno-cleanup-validating-webhook-cfg
  resourceVersion: "245823835"
  uid: 06eb8596-4d13-4264-a651-c8fabef64cdd
[ . . . ]
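The same measurement can be scripted by sampling `metadata.generation` twice and dividing the difference by the elapsed time. This is only a sketch: it uses the generation as a proxy for the number of updates, as the report does, and the function names are hypothetical. Only `kubectl get ... -o jsonpath` is an existing kubectl feature.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for estimating how fast a webhook configuration is rewritten.

# Print metadata.generation of a webhook configuration.
# $1: resource kind (e.g. validatingwebhookconfigurations), $2: object name
webhook_generation() {
  kubectl get "$1" "$2" -o jsonpath='{.metadata.generation}'
}

# Print the average number of updates per second between two samples.
# $1: first generation, $2: second generation, $3: seconds between the samples
updates_per_second() {
  awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.2f\n", (b - a) / s }'
}

# Sample the generation twice, $3 seconds apart (default 30), and print the rate.
measure_update_rate() {
  local kind="$1" name="$2" interval="${3:-30}"
  local start end
  start="$(webhook_generation "$kind" "$name")"
  sleep "$interval"
  end="$(webhook_generation "$kind" "$name")"
  updates_per_second "$start" "$end" "$interval"
}
```

With the numbers from the output above, and assuming the generation started at 1 when the object was created 5m4s (304 seconds) earlier, `updates_per_second 1 2846 304` gives roughly 9.36 updates per second.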
We also tried deleting all ClusterPolicies and Policies, but that did not change the endless loop behavior.
As a last hint, we also see this behavior on other AKS clusters with different Kubernetes versions (1.24 and 1.25).
Maybe someone has an idea what we can check or try in order to get the correct behavior again.
Thanks in advance, Dennis
PS: The full logs can be found on pastebin: https://pastebin.com/MYpxAKe4
Steps to reproduce
- Install Kyverno 1.9.x in an AKS-managed Kubernetes cluster with 3 replicas (Helm chart version 2.7.1)
- Watch the webhooks being updated in an endless loop
Expected behavior
After the first correct creation of the webhook configs, the webhook-controller should stop updating the webhooks.
Screenshots
No response
Kyverno logs
No response
Slack discussion
No response
Troubleshooting
- I have read and followed the documentation AND the troubleshooting guide.
- I have searched other issues in this repository and mine is not recorded.
About this issue
- State: closed
- Created a year ago
- Comments: 43 (25 by maintainers)
Glad I found this issue, as we at @swisspost also face it on our AKS clusters.
As @dhemeier posted in some comments above, I also found it in the AKS FAQ: https://learn.microsoft.com/en-us/azure/aks/faq#can-admission-controller-webhooks-impact-kube-system-and-internal-aks-namespaces
-> We would also appreciate a backported fix in 1.9.x
1.9.3 is available with this backport: https://github.com/kyverno/kyverno/releases/tag/v1.9.3
Fun fact: I found an admission webhook at generation 45.678.532 😄