kyverno: OOMKilled when starting Kyverno on existing clusters
Software version numbers
- Kubernetes version: v1.19.6 and v1.19.4
- Kyverno version: 1.3.2-rc1
Describe the bug
OOMKilled on startup of the main Kyverno pod. We tried increasing the memory limit to 2GiB but it made no difference. The only workaround for us was to set the environment variable GOGC=25.
To Reproduce
Steps to reproduce the behavior:
- Run kubectl apply -k https://raw.githubusercontent.com/kyverno/kyverno/v1.3.2-rc1/definitions/release/install.yaml
- See the kyverno Deployment/Pod being killed with OOMKilled; it then enters a CrashLoopBackOff as expected but never starts successfully. This is the main container, not the kyverno-pre init container (the commands just below show one way to confirm the termination reason).
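To confirm that the termination really is an OOM kill rather than a crash for some other reason, the pod's last state can be checked with standard kubectl commands; the namespace name kyverno is assumed to match the default install:
kubectl -n kyverno get pods
kubectl -n kyverno describe pod <kyverno-pod-name> | grep -A 5 "Last State"
The describe output should show Reason: OOMKilled and exit code 137 for the main kyverno container.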
We’ve tried this on two of our internal clusters, dev and uat. Both have plenty of nodes with spare resources; dev runs 175 pods and uat runs 215 pods, so they are similar in size.
The only way we can get Kyverno to start is to add an environment variable GOGC="25" (the default is "100"), which triggers garbage collection more often inside the pod.
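For reference, one quick way to apply this workaround without re-editing the manifest is kubectl set env against the Deployment; the deployment and namespace names below are assumed to match the default install:
kubectl -n kyverno set env deployment/kyverno GOGC=25
# confirm the variable was added to the main container
kubectl -n kyverno get deployment kyverno -o jsonpath='{.spec.template.spec.containers[0].env}'
The Deployment rollout then restarts the pod with the new environment variable in place.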
Expected behavior
Kyverno starts successfully and is not OOMKilled.
Additional context
Logs from pod
Thu, Feb 4 2021 1:03:28 pm | I0204 13:03:28.299569 1 version.go:17] "msg"="Kyverno" "Version"="v1.3.2-rc1"
Thu, Feb 4 2021 1:03:28 pm | I0204 13:03:28.299634 1 version.go:18] "msg"="Kyverno" "BuildHash"="(HEAD/7d8c404922854790f5b5d2dce9a7736c896e7bf7"
Thu, Feb 4 2021 1:03:28 pm | I0204 13:03:28.299643 1 version.go:19] "msg"="Kyverno" "BuildTime"="2021-01-25_05:09:39AM"
Thu, Feb 4 2021 1:03:28 pm | I0204 13:03:28.300728 1 config.go:92] CreateClientConfig "msg"="Using in-cluster configuration"
Thu, Feb 4 2021 1:03:28 pm | I0204 13:03:28.310422 1 reflector.go:175] Starting reflector *unstructured.Unstructured (0s) from pkg/mod/k8s.io/client-go@v0.18.12/tools/cache/reflector.go:125
Thu, Feb 4 2021 1:03:28 pm | E0204 13:03:28.340333 1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:30 pm | E0204 13:03:30.401743 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:30 pm | I0204 13:03:30.402158 1 util.go:82] "msg"="CRD found" "gvr"="kyverno.io/v1, Resource=clusterpolicies"
Thu, Feb 4 2021 1:03:30 pm | E0204 13:03:30.404770 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:30 pm | I0204 13:03:30.405203 1 util.go:82] "msg"="CRD found" "gvr"="wgpolicyk8s.io/v1alpha1, Resource=clusterpolicyreports"
Thu, Feb 4 2021 1:03:30 pm | E0204 13:03:30.407828 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:30 pm | I0204 13:03:30.408146 1 util.go:82] "msg"="CRD found" "gvr"="wgpolicyk8s.io/v1alpha1, Resource=policyreports"
Thu, Feb 4 2021 1:03:30 pm | E0204 13:03:30.410834 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:30 pm | I0204 13:03:30.411149 1 util.go:82] "msg"="CRD found" "gvr"="kyverno.io/v1alpha1, Resource=clusterreportchangerequests"
Thu, Feb 4 2021 1:03:30 pm | E0204 13:03:30.413860 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:30 pm | I0204 13:03:30.415334 1 util.go:82] "msg"="CRD found" "gvr"="kyverno.io/v1alpha1, Resource=reportchangerequests"
Thu, Feb 4 2021 1:03:30 pm | I0204 13:03:30.417237 1 dynamicconfig.go:117] ConfigData "msg"="init configuration from commandline arguments for filterK8sResources"
Thu, Feb 4 2021 1:03:30 pm | I0204 13:03:30.418311 1 dynamicconfig.go:257] ConfigData "msg"="Init resource filters" "filters"=[{"Kind":"Event","Namespace":"*","Name":"*"},{"Kind":"*","Namespace":"kube-system","Name":"*"},{"Kind":"*","Namespace":"kube-public","Name":"*"},{"Kind":"*","Namespace":"kube-node-lease","Name":"*"},{"Kind":"Node","Namespace":"*","Name":"*"},{"Kind":"APIService","Namespace":"*","Name":"*"},{"Kind":"TokenReview","Namespace":"*","Name":"*"},{"Kind":"SubjectAccessReview","Namespace":"*","Name":"*"},{"Kind":"*","Namespace":"kyverno","Name":"*"},{"Kind":"Binding","Namespace":"*","Name":"*"},{"Kind":"ReplicaSet","Namespace":"*","Name":"*"},{"Kind":"ReportChangeRequest","Namespace":"*","Name":"*"},{"Kind":"ClusterReportChangeRequest","Namespace":"*","Name":"*"},{"Kind":"PolicyReport","Namespace":"*","Name":"*"},{"Kind":"ClusterPolicyReport","Namespace":"*","Name":"*"}]
Thu, Feb 4 2021 1:03:30 pm | I0204 13:03:30.418328 1 dynamicconfig.go:268] ConfigData "msg"="Init resource " "excludeRoles"=""
Thu, Feb 4 2021 1:03:30 pm | E0204 13:03:30.428442 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:30 pm | E0204 13:03:30.432418 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:30 pm | I0204 13:03:30.434018 1 certificates.go:26] dclient "msg"="Building key/certificate pair for TLS"
Thu, Feb 4 2021 1:03:30 pm | E0204 13:03:30.825099 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:30 pm | E0204 13:03:30.856117 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:32 pm | I0204 13:03:32.930308 1 certificates.go:171] dclient/CAcert "msg"="secret updated" "name"="kyverno-svc.kyverno.svc.kyverno-tls-ca" "namespace"="kyverno"
Thu, Feb 4 2021 1:03:33 pm | E0204 13:03:33.115251 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:33 pm | E0204 13:03:33.124867 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:35 pm | I0204 13:03:35.169461 1 certificates.go:219] dclient/WriteTLSPair "msg"="secret updated" "name"="kyverno-svc.kyverno.svc.kyverno-tls-pair" "namespace"="kyverno"
Thu, Feb 4 2021 1:03:35 pm | I0204 13:03:35.169488 1 registration.go:272] Register "msg"="deleting all webhook configurations"
Thu, Feb 4 2021 1:03:35 pm | E0204 13:03:35.172939 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:35 pm | E0204 13:03:35.176956 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:35 pm | E0204 13:03:35.179308 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:35 pm | E0204 13:03:35.182135 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:35 pm | E0204 13:03:35.185069 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:35 pm | I0204 13:03:35.199255 1 registration.go:337] Register "msg"="webhook configuration deleted" "kind"="ValidatingWebhookConfiguration" "name"="kyverno-policy-validating-webhook-cfg"
Thu, Feb 4 2021 1:03:35 pm | I0204 13:03:35.199862 1 resource.go:85] Register "msg"="webhook configuration deleted" "kind"="MutatingWebhookConfiguration" "name"="kyverno-resource-mutating-webhook-cfg"
Thu, Feb 4 2021 1:03:35 pm | I0204 13:03:35.199993 1 registration.go:306] Register "msg"="webhook configuration deleted" "kind"="MutatingWebhookConfiguration" "name"="kyverno-policy-mutating-webhook-cfg"
Thu, Feb 4 2021 1:03:35 pm | I0204 13:03:35.203319 1 registration.go:416] Register "msg"="webhook configuration deleted" "kind"="MutatingWebhookConfiguration" "name"="kyverno-verify-mutating-webhook-cfg"
Thu, Feb 4 2021 1:03:35 pm | I0204 13:03:35.205087 1 resource.go:160] Register "msg"="webhook configuration deleted" "kind"="ValidatingWebhookConfiguration" "name"="kyverno-resource-validating-webhook-cfg"
Thu, Feb 4 2021 1:03:35 pm | E0204 13:03:35.207453 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:35 pm | E0204 13:03:35.216088 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:35 pm | E0204 13:03:35.231271 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:35 pm | I0204 13:03:35.243634 1 registration.go:266] Register "msg"="created webhook" "kind"="MutatingWebhookConfiguration" "name"="kyverno-verify-mutating-webhook-cfg"
Thu, Feb 4 2021 1:03:35 pm | E0204 13:03:35.246236 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:35 pm | E0204 13:03:35.255549 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:35 pm | E0204 13:03:35.267649 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:35 pm | I0204 13:03:35.392722 1 registration.go:211] Register "msg"="created webhook" "kind"="ValidatingWebhookConfiguration" "name"="kyverno-policy-validating-webhook-cfg"
Thu, Feb 4 2021 1:03:35 pm | E0204 13:03:35.396993 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:35 pm | E0204 13:03:35.589066 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:35 pm | E0204 13:03:35.793523 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:35 pm | I0204 13:03:35.989890 1 registration.go:239] Register "msg"="created webhook" "kind"="MutatingWebhookConfiguration" "name"="kyverno-policy-mutating-webhook-cfg"
Thu, Feb 4 2021 1:03:35 pm | E0204 13:03:35.994106 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:36 pm | E0204 13:03:36.188511 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:36 pm | E0204 13:03:36.390931 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:36 pm | I0204 13:03:36.590675 1 registration.go:182] Register "msg"="created webhook" "kind"="ValidatingWebhookConfiguration" "name"="kyverno-resource-validating-webhook-cfg"
Thu, Feb 4 2021 1:03:36 pm | E0204 13:03:36.593262 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:36 pm | E0204 13:03:36.788838 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:36 pm | E0204 13:03:36.997361 1 memcache.go:111] couldn't get resource list for custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Thu, Feb 4 2021 1:03:37 pm | I0204 13:03:37.188816 1 registration.go:152] Register "msg"="created webhook" "kind"="MutatingWebhookConfiguration" "name"="kyverno-resource-mutating-webhook-cfg"
Thu, Feb 4 2021 1:03:37 pm | I0204 13:03:37.574592 1 reflector.go:175] Starting reflector *v1alpha1.ReportChangeRequest (15m0s) from pkg/mod/k8s.io/client-go@v0.18.12/tools/cache/reflector.go:125
Thu, Feb 4 2021 1:03:37 pm | I0204 13:03:37.574628 1 reflector.go:175] Starting reflector *v1.ClusterRoleBinding (15m0s) from pkg/mod/k8s.io/client-go@v0.18.12/tools/cache/reflector.go:125
Thu, Feb 4 2021 1:03:37 pm | I0204 13:03:37.574636 1 reflector.go:175] Starting reflector *v1alpha1.ClusterReportChangeRequest (15m0s) from pkg/mod/k8s.io/client-go@v0.18.12/tools/cache/reflector.go:125
Thu, Feb 4 2021 1:03:37 pm | I0204 13:03:37.574644 1 reportrequest.go:183] ReportChangeRequestGenerator "msg"="start"
Thu, Feb 4 2021 1:03:37 pm | I0204 13:03:37.574669 1 reportcontroller.go:175] PolicyReportGenerator "msg"="start"
Thu, Feb 4 2021 1:03:37 pm | I0204 13:03:37.574596 1 validate_controller.go:344] PolicyController "msg"="starting"
Thu, Feb 4 2021 1:03:37 pm | I0204 13:03:37.574781 1 reflector.go:175] Starting reflector *v1.RoleBinding (15m0s) from pkg/mod/k8s.io/client-go@v0.18.12/tools/cache/reflector.go:125
Thu, Feb 4 2021 1:03:37 pm | I0204 13:03:37.574809 1 controller.go:221] GenerateCleanUpController "msg"="starting"
Thu, Feb 4 2021 1:03:37 pm | I0204 13:03:37.574826 1 controller.go:108] EventGenerator "msg"="start"
Thu, Feb 4 2021 1:03:37 pm | I0204 13:03:37.574838 1 generate_controller.go:259] GenerateController "msg"="starting"
Thu, Feb 4 2021 1:03:37 pm | I0204 13:03:37.574866 1 informer.go:109] PolicyCacheController "msg"="starting"
Thu, Feb 4 2021 1:03:37 pm | I0204 13:03:37.574887 1 reflector.go:175] Starting reflector *v1alpha1.ClusterPolicyReport (15m0s) from pkg/mod/k8s.io/client-go@v0.18.12/tools/cache/reflector.go:125
Thu, Feb 4 2021 1:03:37 pm | I0204 13:03:37.574896 1 reflector.go:175] Starting reflector *v1.Namespace (15m0s) from pkg/mod/k8s.io/client-go@v0.18.12/tools/cache/reflector.go:125
Thu, Feb 4 2021 1:03:37 pm | I0204 13:03:37.575108 1 reflector.go:175] Starting reflector *v1alpha1.PolicyReport (15m0s) from pkg/mod/k8s.io/client-go@v0.18.12/tools/cache/reflector.go:125
Thu, Feb 4 2021 1:03:37 pm | I0204 13:03:37.575202 1 reflector.go:175] Starting reflector *v1.ConfigMap (15m0s) from pkg/mod/k8s.io/client-go@v0.18.12/tools/cache/reflector.go:125
Thu, Feb 4 2021 1:03:37 pm | I0204 13:03:37.575239 1 reflector.go:175] Starting reflector *v1.GenerateRequest (15m0s) from pkg/mod/k8s.io/client-go@v0.18.12/tools/cache/reflector.go:125
Thu, Feb 4 2021 1:03:37 pm | I0204 13:03:37.575357 1 reflector.go:175] Starting reflector *v1.ClusterPolicy (15m0s) from pkg/mod/k8s.io/client-go@v0.18.12/tools/cache/reflector.go:125
Thu, Feb 4 2021 1:03:37 pm | I0204 13:03:37.575454 1 reflector.go:175] Starting reflector *v1.Policy (15m0s) from pkg/mod/k8s.io/client-go@v0.18.12/tools/cache/reflector.go:125
Thu, Feb 4 2021 1:03:37 pm | I0204 13:03:37.574632 1 reflector.go:175] Starting reflector *unstructured.Unstructured (15m0s) from pkg/mod/k8s.io/client-go@v0.18.12/tools/cache/reflector.go:125
Thu, Feb 4 2021 1:03:37 pm | I0204 13:03:37.576083 1 reflector.go:175] Starting reflector *v1.Role (15m0s) from pkg/mod/k8s.io/client-go@v0.18.12/tools/cache/reflector.go:125
Thu, Feb 4 2021 1:03:37 pm | I0204 13:03:37.576305 1 reflector.go:175] Starting reflector *v1.ClusterRole (15m0s) from pkg/mod/k8s.io/client-go@v0.18.12/tools/cache/reflector.go:125
Thu, Feb 4 2021 1:03:38 pm | I0204 13:03:38.397556 1 server.go:495] WebhookServer "msg"="starting service"
About this issue
- State: closed
- Created 3 years ago
- Comments: 45 (31 by maintainers)
Happy to report I’ve revisited this OOM issue and am now running 1.3.6 on my cluster. The pod fired up quickly and appears to be stable. I’ll let it burn in for a bit but wanted to share the news.
@snir911 - thanks for checking. @vyankd is working on the fix to add matchedList.
Currently the cache is not configurable, but it looks like we have found the reason why Kyverno gets OOM killed.
These caches were introduced to reduce API request throttling so that policies are processed faster. I’ll need to investigate a bit more and see how we can optimize this process.
cc @JimBugwadia
Actually, to my surprise, the pod has been running for about 2 hours after 17 restarts. I’ll report back when I have a better understanding of whether or not GOGC has a hand in this success.
@snir911 - you can add GOGC to the env list of the Kyverno container:
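A minimal sketch of the relevant Deployment fragment, assuming the default container name kyverno (only the env entry is new; the surrounding fields are shown for context):
spec:
  template:
    spec:
      containers:
        - name: kyverno
          env:
            - name: GOGC
              value: "25"
Because GOGC is read by the Go runtime at startup, the pod has to be restarted (which the Deployment rollout does automatically) for the change to take effect.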
If the pod comes up successfully, can you share heap and goroutine dumps?
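For anyone collecting those, the standard Go net/http/pprof endpoints can be used once profiling is enabled in the container; the port 6060 below is an assumption, so adjust it to whatever your Kyverno deployment exposes:
kubectl -n kyverno port-forward deployment/kyverno 6060:6060
# heap profile
curl -s http://localhost:6060/debug/pprof/heap > heap.out
# full goroutine dump
curl -s "http://localhost:6060/debug/pprof/goroutine?debug=2" > goroutines.txt
The heap profile can then be inspected locally with go tool pprof heap.out.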
I found a similar issue in fluxcd, where the user has Dex installed along with a massive number of other resources. Do you have Dex installed by any chance?