kubernetes: VPA panics in standalone kubelets
- Start a standalone kubelet (not registered to an API server)
- Manager construction short-circuits Start() here:
- VPA state is never initialized here:
- All pod syncs then panic on this line:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x3b5f21f]
k8s.io/kubernetes/pkg/kubelet.(*Kubelet).HandlePodAdditions(0xc00073c400, {0xc001816c00?, 0x27, 0x40})
pkg/kubelet/kubelet.go:2412 +0x4df
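To make the sequence above concrete, here is a minimal sketch of the failure mode, not the kubelet's actual code. The `manager` and `podState` types, the `hasAPIClient` field, and the `Allocation` method are all hypothetical names invented for illustration. It shows how an early return in `Start()` leaves state nil, and how a guard at the call site turns the nil dereference into an ordinary error.

```go
package main

import (
	"errors"
	"fmt"
)

// podState is a hypothetical stand-in for the state Start() is expected to set up.
type podState struct {
	allocations map[string]int64
}

// manager is a hypothetical stand-in for a kubelet-side manager.
type manager struct {
	hasAPIClient bool
	state        *podState // stays nil if Start() returns early
}

// Start short-circuits when there is no API client, mimicking standalone mode.
func (m *manager) Start() {
	if !m.hasAPIClient {
		return // state is never initialized on this path
	}
	m.state = &podState{allocations: map[string]int64{}}
}

var errNotInitialized = errors.New("state not initialized")

// Allocation checks for nil state instead of dereferencing it blindly.
func (m *manager) Allocation(pod string) (int64, error) {
	if m.state == nil {
		return 0, errNotInitialized
	}
	return m.state.allocations[pod], nil
}

func main() {
	standalone := &manager{hasAPIClient: false}
	standalone.Start()
	if _, err := standalone.Allocation("static-web"); err != nil {
		fmt.Println("standalone:", err)
	}
}
```

Without the nil check in `Allocation`, the map lookup would dereference a nil `*podState` and panic in the same way as the trace above.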
_Originally posted by @liggitt in https://github.com/kubernetes/kubernetes/issues/102884#issuecomment-1454089210_
cc @thockin @vinaykul
/sig node
/kind bug
/priority important-soon
About this issue
- State: closed
- Created a year ago
- Comments: 28 (27 by maintainers)
I was able to repro the panic with standalone mode in node e2e:
I built on top of @SergeyKanzhelev’s standalone PR: https://github.com/kubernetes/kubernetes/pull/116551
Here’s my branch: https://github.com/bobbypage/kubernetes/tree/standaloneMode
I ran the node e2e under standalone mode and enabled all alpha features.
From the sig-node discussion just now, it sounds like updating the mirror pod to effect changes to a static pod is discouraged. We should probably be more explicit and block this in validation unless there is a sound reason to allow updates to static pods via the API. Given that discussion, and the current state of https://github.com/kubernetes/kubernetes/issues/116597 , my change to exclude static pods from computePodResizeAction and doPodResizeAction is not only defensive but may also be the right fix.
I’ll manually verify the standalone kubelet with the feature gate enabled and disabled later, after my meeting with Clayton.
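For readers following along, the exclusion could look roughly like the sketch below. This is not the actual patch. The `pod` struct is a simplified stand-in for the real API type, `shouldComputeResize` is a hypothetical helper, and the annotation key and `api` source value are assumptions based on how I understand the kubelet marks a pod's config source.

```go
package main

import "fmt"

// Assumed kubelet convention: pods carry their config source in this
// annotation, and pods that came from the API server have the value "api".
const (
	configSourceAnnotation = "kubernetes.io/config.source"
	apiserverSource        = "api"
)

// pod is a simplified, hypothetical stand-in for the real pod type.
type pod struct {
	Name        string
	Annotations map[string]string
}

// isStaticPod reports whether the pod has a recorded source other than the API server.
func isStaticPod(p *pod) bool {
	src, ok := p.Annotations[configSourceAnnotation]
	return ok && src != apiserverSource
}

// shouldComputeResize is a hypothetical guard: resize logic is skipped when
// the feature is disabled, the pod is nil, or the pod is static.
func shouldComputeResize(featureEnabled bool, p *pod) bool {
	return featureEnabled && p != nil && !isStaticPod(p)
}

func main() {
	static := &pod{Name: "etcd", Annotations: map[string]string{configSourceAnnotation: "file"}}
	regular := &pod{Name: "web", Annotations: map[string]string{configSourceAnnotation: apiserverSource}}
	fmt.Println(shouldComputeResize(true, static), shouldComputeResize(true, regular))
}
```

The point of putting the check in front of both the compute and apply steps is that a static pod never reaches code that assumes resize state exists.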
/cc @derekwaynecarr @dchen1107 @SergeyKanzhelev
The panic I saw wasn’t on Windows, it was on Linux. While spelunking, though, I saw another possible panic. On Windows, NewPodContainerManager() returns:
https://github.com/kubernetes/kubernetes/blob/ead7d66ee12656cfb7c633dd42a87f8d9cfaa469/pkg/kubelet/cm/container_manager_windows.go#L183-L185
which implements GetPodCgroupConfig to return nil, nil:
https://github.com/kubernetes/kubernetes/blob/ead7d66ee12656cfb7c633dd42a87f8d9cfaa469/pkg/kubelet/cm/container_manager_stub.go#L98-L100
which means currentPodMemoryConfig will be nil here:
https://github.com/kubernetes/kubernetes/blob/ead7d66ee12656cfb7c633dd42a87f8d9cfaa469/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L723-L729