kubernetes: VPA panics in standalone kubelets

  1. Start a standalone kubelet (not registered to an API server)

  2. Manager construction short-circuits Start() here:

https://github.com/kubernetes/kubernetes/blob/253ab3eda71f250ad6692bb16f035cebaf0651c9/pkg/kubelet/status/status_manager.go#L186-L192

  3. VPA state is never initialized here:

https://github.com/kubernetes/kubernetes/blob/253ab3eda71f250ad6692bb16f035cebaf0651c9/pkg/kubelet/status/status_manager.go#L195-L200

  4. All pod syncs then panic on this line (a minimal sketch of this failure mode follows the panic output below):

https://github.com/kubernetes/kubernetes/blob/a1b12e49eac237a37939642d0c3395008b9ab380/pkg/kubelet/kubelet.go#L2412

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x3b5f21f]
k8s.io/kubernetes/pkg/kubelet.(*Kubelet).HandlePodAdditions(0xc00073c400, {0xc001816c00?, 0x27, 0x40})
  pkg/kubelet/kubelet.go:2412 +0x4df
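To make the failure mode concrete, here is a minimal Go sketch; the type and method names (manager, allocationState, SetPodAllocation) are invented for illustration and are not the kubelet’s actual ones. Start() returns early when there is no API client, the state it would have initialized stays nil, and the call made during pod sync dereferences that nil pointer:

package main

// client stands in for the API server client; it is nil in standalone mode.
type client struct{}

// allocationState stands in for the state that the real status manager only
// initializes inside Start().
type allocationState struct {
	allocations map[string]string
}

type manager struct {
	kubeClient *client
	state      *allocationState
}

// Start mirrors the short-circuit at status_manager.go#L186-L192: with no
// API client it returns before the state setup at #L195-L200 is reached.
func (m *manager) Start() {
	if m.kubeClient == nil {
		return
	}
	m.state = &allocationState{allocations: map[string]string{}}
}

// SetPodAllocation stands in for the statusManager call reached from
// HandlePodAdditions (kubelet.go:2412) during pod sync.
func (m *manager) SetPodAllocation(podUID, resources string) {
	// m.state is nil in standalone mode, so this line panics with
	// "invalid memory address or nil pointer dereference".
	m.state.allocations[podUID] = resources
}

func main() {
	m := &manager{kubeClient: nil} // standalone kubelet: no API server
	m.Start()
	m.SetPodAllocation("pod-1", "cpu=500m")
}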

_Originally posted by @liggitt in https://github.com/kubernetes/kubernetes/issues/102884#issuecomment-1454089210_

cc @thockin @vinaykul /sig node /kind bug /priority important-soon

Most upvoted comments

I was able to repro the panic with standalone mode in node e2e:

E0314 22:40:45.433285    1490 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 385 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x3a3c560?, 0x688f790})
    vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x0?})
    vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75
panic({0x3a3c560, 0x688f790})
    /go/src/k8s.io/kubernetes/_output/local/.gimme/versions/go1.20.2.linux.amd64/src/runtime/panic.go:884 +0x213
k8s.io/kubernetes/pkg/kubelet/kuberuntime.(*kubeGenericRuntimeManager).doPodResizeAction(0xc000ce0480, 0xc001067200, 0xc000993fb0?, {0x0, 0x0, {0xc000cfe3c0, 0x40}, 0x0, 0x0, {0x690b7e0, ...}, ...}, ...)
    pkg/kubelet/kuberuntime/kuberuntime_manager.go:734 +0x7a5
k8s.io/kubernetes/pkg/kubelet/kuberuntime.(*kubeGenericRuntimeManager).SyncPod(0xc000ce0480, {0x48bf718, 0xc000993fb0}, 0xc001067200, 0xc0010107e0, {0x690b7e0, 0x0, 0x0}, 0xc0003dc500)
    pkg/kubelet/kuberuntime/kuberuntime_manager.go:1223 +0x2d38
k8s.io/kubernetes/pkg/kubelet.(*Kubelet).SyncPod(0xc00053b400, {0xc001001580?, 0x6?}, 0x0, 0xc001067200, 0x0, 0xc0010107e0)
    pkg/kubelet/kubelet.go:1865 +0x25de
k8s.io/kubernetes/pkg/kubelet.(*podWorkers).podWorkerLoop.func1({0x0, {0x0, {0xc0fc5bdb191136ad, 0x27796a748, 0x68dac00}, 0xc001067200, 0x0, 0x0, 0x0}}, 0xc00052e5a0, ...)
    pkg/kubelet/pod_workers.go:1241 +0x1d1
k8s.io/kubernetes/pkg/kubelet.(*podWorkers).podWorkerLoop(0xc001001580?, {0xc001001580, 0x20}, 0x82e71986708e9901?)
    pkg/kubelet/pod_workers.go:1246 +0x48d
k8s.io/kubernetes/pkg/kubelet.(*podWorkers).UpdatePod.func1()
    pkg/kubelet/pod_workers.go:920 +0x125
created by k8s.io/kubernetes/pkg/kubelet.(*podWorkers).UpdatePod
    pkg/kubelet/pod_workers.go:915 +0x2393
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x3303365]

I built on top of @SergeyKanzhelev’s standalone PR - https://github.com/kubernetes/kubernetes/pull/116551

Here’s my branch: https://github.com/bobbypage/kubernetes/tree/standaloneMode

I ran the node e2e under standalone mode and enabled all alpha features.

If you mean: when VPA points at a mirror pod, can it change the spec? Generally, no.

From what we can tell, something in the VPA path (computePodResizeAction) decided that a static pod’s resources needed to go through the resize code path.

I don’t know enough about static pod handling to know whether that is a category error and VPA should never expect to resize resources used by static pods. That’s part of what the proposed fix in #116504 does - it excludes static pods from being considered by VPA.

For a kubelet running against an API server, would you expect manifest-driven pods to be able to make use of resize functionality? If so, a blanket exclusion of static pods from that code path doesn’t seem correct.

From the sig-node discussion just now, it sounds like updating the mirror pod to effect changes to the pod is discouraged. We should probably be more explicit and block it in validation if there are no sound reasons to allow updates to static pods via the API. Given that discussion, and the current state of https://github.com/kubernetes/kubernetes/issues/116597, my change to exclude static pods from computePodResizeAction and doPodResizeAction is not only defensive but may also be the right fix.
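For what it’s worth, here is a rough Go sketch of the shape of that exclusion; the names and simplified signatures (pod, isStaticPod, computePodResizeAction returning a bool) are illustrative, not the actual kuberuntime code in #116504:

package main

import "fmt"

// pod is a simplified stand-in; source distinguishes API pods from static
// (file/http manifest) pods, which only have a read-only mirror pod on the
// API server.
type pod struct {
	name       string
	source     string // "api", "file", or "http"
	desiredCPU string
	actualCPU  string
}

func isStaticPod(p pod) bool {
	return p.source == "file" || p.source == "http"
}

// computePodResizeAction sketches the proposed guard: static pods never
// enter the in-place resize code path, so doPodResizeAction is never
// reached for them.
func computePodResizeAction(p pod) bool {
	if isStaticPod(p) {
		return false
	}
	return p.desiredCPU != p.actualCPU
}

func main() {
	staticPod := pod{name: "etcd", source: "file", desiredCPU: "1", actualCPU: "500m"}
	apiPod := pod{name: "web", source: "api", desiredCPU: "1", actualCPU: "500m"}
	fmt.Println(computePodResizeAction(staticPod)) // false: excluded from resize
	fmt.Println(computePodResizeAction(apiPod))    // true: resize considered
}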

I’ll verify the standalone kubelet manually for the feature gate enabled/disabled cases later, after my meeting with Clayton.

/cc @derekwaynecarr @dchen1107 @SergeyKanzhelev