kyverno: [BUG] Kyverno panics when upgrade is running

Software version numbers

  • Kubernetes version: 1.20.5
  • Kubernetes platform (if applicable; ex., EKS, GKE, OpenShift): Bare-Metal on Linux
  • Kyverno version: 1.4.2 to 1.4.3

Describe the bug I pushed out Kyverno 1.4.3 via Helm chart 2.0.3 to upgrade from Kyverno 1.4.2 (Helm chart 2.0.2) and noticed that the two HA pods that had been running 1.4.2 were crashing frequently.

To Reproduce I think the issue is that I have hundreds of namespaces with policy reports, and those reports are cleaned up by the kyverno-pre container during the Kyverno upgrade. While this long-running cleanup is in progress, the soon-to-be-upgraded pods keep crashing. (A rough sketch of what that cleanup amounts to is included after the kyverno-pre log excerpt below.)

Additional context Here is the panic I managed to capture:

I1005 19:22:09.821277       1 certmanager.go:129] CertManager "msg"="start managing certificate"  
E1005 19:22:09.821625       1 runtime.go:78] Observed a panic: &runtime.TypeAssertionError{_interface:(*runtime._type)(0x1c10aa0), concrete:(*runtime._type)(nil), asserted:(*runtime._type)(0x1b88880), missingMethod:""} (interface conversion: interface {} is nil, not string)
goroutine 2830 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1c78320, 0xc004830de0)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/runtime/runtime.go:74 +0x95
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/runtime/runtime.go:48 +0x86
panic(0x1c78320, 0xc004830de0)
        /usr/local/go/src/runtime/panic.go:965 +0x1b9
github.com/kyverno/kyverno/pkg/policyreport.updateSummary(0xc004318200, 0x1b, 0x20, 0xc000c29a58)
        /kyverno/pkg/policyreport/policyreport.go:184 +0x79f
github.com/kyverno/kyverno/pkg/policyreport.updateResults(0xc00458a2a0, 0xc00458a2a0, 0x0, 0x0, 0xc0045a2960, 0xc000a2c210, 0xc0039deb10, 0xc00458a2a0)
        /kyverno/pkg/policyreport/policyreport.go:101 +0x217
github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).createReportIfNotPresent(0xc00013a2d0, 0xc003083290, 0xd, 0xc000a2c210, 0x1b30160, 0xc004465db8, 0x0, 0x0, 0x1b88880, 0xc000498ed0)
        /kyverno/pkg/policyreport/reportcontroller.go:314 +0xbf
github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).syncHandler(0xc00013a2d0, 0xc003083290, 0xd, 0x415100, 0xc000b3aa18, 0xc001935f20, 0x20)
        /kyverno/pkg/policyreport/reportcontroller.go:294 +0x27f
github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).processNextWorkItem(0xc00013a2d0, 0x203000)
        /kyverno/pkg/policyreport/reportcontroller.go:250 +0x1ee
github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).runWorker(...)
        /kyverno/pkg/policyreport/reportcontroller.go:232
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc0023870b0)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0023870b0, 0x26e9be0, 0xc00458a270, 0x1, 0xc000194300)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/wait/wait.go:156 +0x9b
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0023870b0, 0x3b9aca00, 0x0, 0xc000055a01, 0xc000194300)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(0xc0023870b0, 0x3b9aca00, 0xc000194300)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/wait/wait.go:90 +0x4d
created by github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).Run
        /kyverno/pkg/policyreport/reportcontroller.go:225 +0x5ca
panic: interface conversion: interface {} is nil, not string [recovered]
        panic: interface conversion: interface {} is nil, not string

goroutine 2830 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/runtime/runtime.go:55 +0x109
panic(0x1c78320, 0xc004830de0)
        /usr/local/go/src/runtime/panic.go:965 +0x1b9
github.com/kyverno/kyverno/pkg/policyreport.updateSummary(0xc004318200, 0x1b, 0x20, 0xc000c29a58)
        /kyverno/pkg/policyreport/policyreport.go:184 +0x79f
github.com/kyverno/kyverno/pkg/policyreport.updateResults(0xc00458a2a0, 0xc00458a2a0, 0x0, 0x0, 0xc0045a2960, 0xc000a2c210, 0xc0039deb10, 0xc00458a2a0)
        /kyverno/pkg/policyreport/policyreport.go:101 +0x217
github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).createReportIfNotPresent(0xc00013a2d0, 0xc003083290, 0xd, 0xc000a2c210, 0x1b30160, 0xc004465db8, 0x0, 0x0, 0x1b88880, 0xc000498ed0)
        /kyverno/pkg/policyreport/reportcontroller.go:314 +0xbf
github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).syncHandler(0xc00013a2d0, 0xc003083290, 0xd, 0x415100, 0xc000b3aa18, 0xc001935f20, 0x20)
        /kyverno/pkg/policyreport/reportcontroller.go:294 +0x27f
github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).processNextWorkItem(0xc00013a2d0, 0x203000)
        /kyverno/pkg/policyreport/reportcontroller.go:250 +0x1ee
github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).runWorker(...)
        /kyverno/pkg/policyreport/reportcontroller.go:232
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc0023870b0)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0023870b0, 0x26e9be0, 0xc00458a270, 0x1, 0xc000194300)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/wait/wait.go:156 +0x9b
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0023870b0, 0x3b9aca00, 0x0, 0xc000055a01, 0xc000194300)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(0xc0023870b0, 0x3b9aca00, 0xc000194300)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/wait/wait.go:90 +0x4d
created by github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).Run
        /kyverno/pkg/policyreport/reportcontroller.go:225 +0x5ca
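
For context on the panic message itself: "interface conversion: interface {} is nil, not string" is what Go reports when an unchecked type assertion is performed on an interface value that is nil. I have not traced the actual Kyverno code at policyreport.go:184, but the failure mode can be shown in isolation with a minimal sketch; the map and field names below are invented purely for illustration.

package main

import "fmt"

func main() {
	// An untyped map whose value may legitimately be nil, e.g. a field
	// pulled out of an unstructured Kubernetes object.
	result := map[string]interface{}{"status": nil}

	// Defensive form: the comma-ok assertion reports failure instead of panicking.
	if status, ok := result["status"].(string); ok {
		fmt.Println("status:", status)
	} else {
		fmt.Println("status missing or not a string; skipping")
	}

	// Unchecked form: this is the pattern that produces
	// "interface conversion: interface {} is nil, not string".
	status := result["status"].(string) // panics because the stored value is nil
	fmt.Println(status)
}

Running this prints the "missing" branch first and then panics on the unchecked assertion, which matches the recovered-and-rethrown panic in the trace above.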

Also, the kyverno-pre logs during the upgrade consist of entries like the following, repeated roughly 500 times:

I1005 19:26:17.673628       1 main.go:333]  "msg"="successfully cleaned up resource"  "kind"="PolicyReport" "name"="policyreport-ns-user-zyou"
I1005 19:26:17.874156       1 main.go:333]  "msg"="successfully cleaned up resource"  "kind"="PolicyReport" "name"="polr-ns-user-zyou"
I1005 19:26:18.073337       1 main.go:333]  "msg"="successfully cleaned up resource"  "kind"="PolicyReport" "name"="pr-ns-user-zyou"
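
Judging by those log lines, the init container is walking every namespace and deleting PolicyReport resources one by one. The following is only a rough sketch of what such a walk amounts to using client-go's dynamic client, not Kyverno's actual cleanup code, and the wgpolicyk8s.io/v1alpha2 GroupVersionResource is an assumption on my part.

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Assumed GVR for namespaced policy reports; adjust to whatever
	// API version your cluster actually serves.
	gvr := schema.GroupVersionResource{
		Group:    "wgpolicyk8s.io",
		Version:  "v1alpha2",
		Resource: "policyreports",
	}

	// List every PolicyReport in every namespace, then delete them one by one,
	// mirroring the "successfully cleaned up resource" log lines above.
	reports, err := client.Resource(gvr).Namespace(metav1.NamespaceAll).List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, r := range reports.Items {
		err := client.Resource(gvr).Namespace(r.GetNamespace()).Delete(context.TODO(), r.GetName(), metav1.DeleteOptions{})
		if err != nil {
			fmt.Printf("failed to clean up PolicyReport %s/%s: %v\n", r.GetNamespace(), r.GetName(), err)
			continue
		}
		fmt.Printf("successfully cleaned up resource PolicyReport %s/%s\n", r.GetNamespace(), r.GetName())
	}
}

With several hundred namespaces each carrying one or more reports, a serial delete loop like this can run for minutes, which would line up with the window in which the old pods were crash-looping.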

We run a Kubernetes cluster where users launch pods for various applications through a web app; we bootstrap a namespace per user, and hundreds of users log in. We also have tooling deployed to clean up namespaces that haven't been used recently, so that might be part of why these policy report cleanups take so long, or maybe the cleanup is normal no matter what.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 20 (20 by maintainers)

Most upvoted comments

Upgraded to 1.5.4 and saw no issues. There were only 52 user namespaces rather than the several hundred present when this issue happened, and we've since been reaping namespaces more frequently, so if there was some threshold we crossed in terms of how many namespaces Kyverno must walk during the init container, I'm not sure we'll cross it again. I'm going to go ahead and close this out and will reopen if the same issue comes back.