kubernetes: Panic in kubelet Run does not exit, creates multiple parallel Run goroutines
If the kubelet Run method panics (e.g. https://github.com/kubernetes/kubernetes/pull/88915), Run gets called multiple times.
That would start multiple goroutines calling relist() concurrently.
From a PR that doesn’t touch the PLEG (but might stress it?), panic in kubelet from pleg:
Mar 03 19:21:03 kind-worker2 kubelet[88624]: fatal error: concurrent map read and map write
Mar 03 19:21:03 kind-worker2 kubelet[88624]: goroutine 1420 [running]:
Mar 03 19:21:03 kind-worker2 kubelet[88624]: runtime.throw(0x43641ed, 0x21)
Mar 03 19:21:03 kind-worker2 kubelet[88624]: GOROOT/src/runtime/panic.go:774 +0x72 fp=0xc0011f57e8 sp=0xc0011f57b8 pc=0x432682
Mar 03 19:21:03 kind-worker2 kubelet[88624]: runtime.mapaccess2_faststr(0x3ddf9e0, 0xc000706690, 0xc00084f1a0, 0x24, 0xc0004b93c8, 0x1)
Mar 03 19:21:03 kind-worker2 kubelet[88624]: GOROOT/src/runtime/map_faststr.go:116 +0x48f fp=0xc0011f5858 sp=0xc0011f57e8 pc=0x4160cf
Mar 03 19:21:03 kind-worker2 kubelet[88624]: k8s.io/kubernetes/pkg/kubelet/pleg.podRecords.setCurrent(0xc000706690, 0xc001e44600, 0x1d, 0x20)
Mar 03 19:21:03 kind-worker2 kubelet[88624]: pkg/kubelet/pleg/generic.go:464 +0x149 fp=0xc0011f5910 sp=0xc0011f5858 pc=0x3318999
Mar 03 19:21:03 kind-worker2 kubelet[88624]: k8s.io/kubernetes/pkg/kubelet/pleg.(*GenericPLEG).relist(0xc000788960)
Mar 03 19:21:03 kind-worker2 kubelet[88624]: pkg/kubelet/pleg/generic.go:214 +0x2c9 fp=0xc0011f5e20 sp=0xc0011f5910 pc=0x3316a89
Mar 03 19:21:03 kind-worker2 kubelet[88624]: k8s.io/kubernetes/pkg/kubelet/pleg.(*GenericPLEG).relist-fm()
Mar 03 19:21:03 kind-worker2 kubelet[88624]: pkg/kubelet/pleg/generic.go:190 +0x2a fp=0xc0011f5e38 sp=0xc0011f5e20 pc=0x3318cca
Mar 03 19:21:03 kind-worker2 kubelet[88624]: k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc00016fbc0
About this issue
- State: closed
- Created 4 years ago
- Comments: 26 (26 by maintainers)
Working on a patch for option #1 now.
looks like the particular panic was an actual bug in that PR (ContainerStatuses indexed in a range of InitContainerStatuses): https://github.com/kubernetes/kubernetes/compare/782bf3341b7c9a60d151c237626d654b08976b91..5719d3138c3e3c27b3b511a8754343594fa95b46#diff-ea8c11a933d6d6c22c0c12b6a38c4b46R338
the bug was fixed before merge
Looking at the logs, there are three kinds of fatal error: "concurrent map iteration and map write", "concurrent map read and map write" (2x), and "concurrent map writes". All are caused by concurrent relist() calls.