kyverno: [BUG] Validation webhook fails and blocks resource interactions whenever the Kyverno HelmRelease is down
Software version numbers
- Kubernetes version: 1.21.2
- Kubernetes platform (if applicable; ex., EKS, GKE, OpenShift): AKS
- Kyverno version: v1.5.1
Describe the bug
If, for whatever reason, the Kyverno release goes down (in my case it was an OOM error in only one of the environments) while its Deployment and HelmRelease still exist, the validating webhook kyverno-resource-validating-webhook-cfg rejects all interactions with Kubernetes resources.
This should not happen, especially when the kyverno-policies validationFailureAction is set to audit.
To Reproduce
Steps to reproduce the behavior:
- Install the Kyverno and kyverno-policies Helm releases:
helm install kyverno kyverno/kyverno --namespace kyverno --create-namespace
helm install kyverno-policies kyverno/kyverno-policies --namespace kyverno
- Make the Kyverno pod(s) unavailable without deleting the ValidatingWebhookConfigurations, to simulate an error (e.g. edit the pod and change its image URL):
kubectl edit pod <kyverno-pod> -n kyverno
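Step 2 can also be done non-interactively. The sketch below is an illustration, not part of the original report: it assumes the chart labels its pods with app.kubernetes.io/name=kyverno and that the Kyverno container is the first container in the pod, and the image name is a deliberately unpullable placeholder. KUBECTL can be overridden, which makes the function easy to dry-run.

```shell
#!/bin/sh
# Sketch of a non-interactive step 2: break the Kyverno pods' image without
# touching the ValidatingWebhookConfigurations. Assumptions (not from the
# issue): the pod label selector below and that Kyverno is containers[0].
set -eu

KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-kyverno}"
SELECTOR="${SELECTOR:-app.kubernetes.io/name=kyverno}"                  # assumed pod label
BROKEN_IMAGE="${BROKEN_IMAGE:-invalid.example/kyverno:does-not-exist}"  # hypothetical, unpullable

# Replaces the first container's image in every matching pod via a JSON patch.
# The container is restarted with an image that cannot be pulled, so the pod
# stays NOT-READY while the webhook configuration remains registered.
break_kyverno_pods() {
  patch="[{\"op\":\"replace\",\"path\":\"/spec/containers/0/image\",\"value\":\"${BROKEN_IMAGE}\"}]"
  for pod in $("$KUBECTL" get pods -n "$NAMESPACE" -l "$SELECTOR" -o name); do
    "$KUBECTL" patch "$pod" -n "$NAMESPACE" --type json -p "$patch"
  done
}
```

Source the file and call break_kyverno_pods, then retry a pod deletion as in the additional context below.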
Expected behavior
The webhook shouldn’t block creating pods, deleting pods, etc., especially for policies that are only in audit mode. Audit-only policies should not be able to affect other namespaces when Kyverno is failing.
Additional context
After steps 1 and 2, even deleting a pod fails:
kubectl delete pod <pod-1>
About this issue
- State: closed
- Created 2 years ago
- Reactions: 3
- Comments: 22 (10 by maintainers)
With the default policies from the kyverno-policies chart, running multiple Kyverno replicas doesn’t prevent API server requests from being rejected when, for example, the Deployment is deleted. All Kyverno pods are set to TERMINATING and go into the NOT-READY state, so every endpoint of the kyverno-svc service is removed and the API server has no endpoint left to validate requests. This blocks further updates to any pod, including the Kyverno pods themselves, which can’t actually terminate. I’m not even sure the processes inside the containers are sent a shutdown signal. Because those processes never shut down, the ValidatingWebhookConfiguration object is not auto-deleted, and everything stays locked until someone deletes the ValidatingWebhookConfiguration manually.
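The manual recovery described above can be scripted so the webhook is only removed once the lock-out is confirmed. This is a minimal sketch, assuming the names mentioned in this issue (namespace kyverno, service kyverno-svc, webhook kyverno-resource-validating-webhook-cfg); the function names are hypothetical. It relies on the fact that the Endpoints object lists ready pods under addresses and not-ready pods under notReadyAddresses.

```shell
#!/bin/sh
# Sketch: confirm the lock-out (no ready endpoints behind the Kyverno service)
# and only then delete the validating webhook configuration.
# Names are taken from this issue; override them for other installs.
set -eu

KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-kyverno}"
SERVICE="${SERVICE:-kyverno-svc}"
WEBHOOK="${WEBHOOK:-kyverno-resource-validating-webhook-cfg}"

# Prints the IPs of ready endpoints behind the service. Not-ready pods are
# listed under notReadyAddresses, so they are not included here.
ready_endpoints() {
  "$KUBECTL" get endpoints "$SERVICE" -n "$NAMESPACE" \
    -o jsonpath='{.subsets[*].addresses[*].ip}'
}

# Deletes the webhook configuration only if Kyverno has no ready endpoint;
# aborts instead of deleting if the endpoints cannot be read at all.
unlock_if_down() {
  ips="$(ready_endpoints)" || { echo "cannot read endpoints of $SERVICE" >&2; return 1; }
  if [ -z "$ips" ]; then
    echo "no ready endpoints behind $SERVICE; deleting $WEBHOOK"
    "$KUBECTL" delete validatingwebhookconfiguration "$WEBHOOK"
  else
    echo "$SERVICE has ready endpoints ($ips); keeping $WEBHOOK"
  fi
}
```

Checking the endpoints first avoids deleting the webhook while Kyverno is merely restarting one of several healthy replicas.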
If you’ve specified --autoUpdateWebhooks=false, you can additionally configure a namespace selector that excludes Kyverno’s own namespace. That should prevent the situation described by @demikl from happening. Example values for helm:
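The values referred to above are not preserved here. As a rough, unofficial sketch of what such values could look like, the function below writes a values file that passes --autoUpdateWebhooks=false and excludes the kyverno namespace. The value keys extraArgs and config.webhooks are assumptions about the Kyverno chart of that era and should be checked against the chart's values.yaml. The selector uses the kubernetes.io/metadata.name label, which Kubernetes sets on namespaces by default from 1.21.

```shell
#!/bin/sh
# Sketch of Helm values: disable webhook auto-updates and exclude Kyverno's own
# namespace from the webhooks. The keys extraArgs and config.webhooks are
# ASSUMED chart keys; verify them against the chart's values.yaml.
set -eu

# Writes the values file to the path given as $1 (default: kyverno-values.yaml)
# and prints the path.
write_kyverno_values() {
  out_file="${1:-kyverno-values.yaml}"
  cat > "$out_file" <<'EOF'
extraArgs:
  - --autoUpdateWebhooks=false
config:
  webhooks:
    - namespaceSelector:
        matchExpressions:
          - key: kubernetes.io/metadata.name
            operator: NotIn
            values:
              - kyverno
EOF
  echo "$out_file"
}
```

Usage, once the keys are confirmed: write_kyverno_values && helm upgrade kyverno kyverno/kyverno --namespace kyverno -f kyverno-values.yaml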
This is currently not possible without setting --autoUpdateWebhooks=false, but there’s a ticket for that: #2320
@admincasper - for the original issue described at the top, you need to set spec.failurePolicy to Ignore to let admission requests pass when Kyverno is not responding. In my case, I installed the same sample policies with the Ignore failure policy and edited the container to use an invalid image, and I was still able to create and delete pods. Tested against v1.5.4.
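For policies that are already installed, spec.failurePolicy can be switched with a merge patch. This is a minimal sketch, assuming the sample policies are cluster-scoped ClusterPolicies (namespaced Policy objects are not touched) and using a hypothetical function name. It is best run while Kyverno is still healthy, since Kyverno may itself validate changes to its policies through a webhook.

```shell
#!/bin/sh
# Sketch: set spec.failurePolicy to Ignore on every Kyverno ClusterPolicy so
# admission requests pass when Kyverno is not responding. Assumes the policies
# are ClusterPolicies; namespaced Policy objects are left unchanged.
set -eu

KUBECTL="${KUBECTL:-kubectl}"

set_failure_policy_ignore() {
  # "-o name" yields entries such as clusterpolicy.kyverno.io/<name>.
  for policy in $("$KUBECTL" get clusterpolicies -o name); do
    "$KUBECTL" patch "$policy" --type merge -p '{"spec":{"failurePolicy":"Ignore"}}'
  done
}
```

Setting the field in the policy manifests themselves avoids having to patch after every reinstall.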