kubernetes: Some static pods fail to start on K8S 1.22 and 1.23
What happened?
In Kubernetes 1.23 we are seeing some static pods fail to start. The failures occur when a static pod is removed and re-added in quick succession, leaving the static pod not started.
We need to revert kubernetes/kubernetes#104743.
cc @gjkim42
What did you expect to happen?
All static pods should restart.
How can we reproduce it (as minimally and precisely as possible)?
diff --git a/pkg/kubelet/pod_workers_test.go b/pkg/kubelet/pod_workers_test.go
index 4028c06c292..0e852594a40 100644
--- a/pkg/kubelet/pod_workers_test.go
+++ b/pkg/kubelet/pod_workers_test.go
@@ -880,3 +880,34 @@ func Test_allowPodStart(t *testing.T) {
 		})
 	}
 }
+
+func TestUpdatePodWithQuickAddRemoveStaticPod(t *testing.T) {
+	podWorkers, _ := createPodWorkers()
+	staticPodA := newStaticPod("0000-0000-0000", "running-static-pod")
+	staticPodB := newStaticPod("0000-0000-0000", "running-static-pod")
+
+	podWorkers.UpdatePod(UpdatePodOptions{
+		Pod:        staticPodA,
+		UpdateType: kubetypes.SyncPodCreate,
+	})
+
+	podWorkers.UpdatePod(UpdatePodOptions{
+		Pod:        staticPodA,
+		UpdateType: kubetypes.SyncPodKill,
+	})
+
+	drainAllWorkers(podWorkers)
+
+	podWorkers.UpdatePod(UpdatePodOptions{
+		Pod:        staticPodB,
+		UpdateType: kubetypes.SyncPodCreate,
+	})
+
+	drainAllWorkers(podWorkers)
+
+	t.Logf("rphillips waitingToStartStaticPodsByFullName=%v", podWorkers.waitingToStartStaticPodsByFullname)
+	if status, ok := podWorkers.podSyncStatuses["0000-0000-0000"]; ok {
+		t.Logf("rphillips podSyncStatuses=%+v", status)
+	}
+	t.Logf("rphillips startedStaticPodsByFullname=%v", podWorkers.startedStaticPodsByFullname)
+}
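Note that the test as written only logs the pod worker's internal state (waitingToStartStaticPodsByFullname, podSyncStatuses, startedStaticPodsByFullname) rather than asserting on it, so the behavior has to be read out of the verbose test output. Assuming the diff above is applied to pkg/kubelet/pod_workers_test.go in a kubernetes/kubernetes checkout, something like the following should run it:
$ # from the root of the kubernetes/kubernetes checkout, with the diff applied
$ go test ./pkg/kubelet/ -run TestUpdatePodWithQuickAddRemoveStaticPod -v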
Anything else we need to know?
No response
Kubernetes version
1.23
Cloud provider
All
OS version
# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here
# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, …) and versions (if applicable)
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 25 (22 by maintainers)
@rphillips
Let me align all PRs to be merged to fix this issue.
Thanks to @TeddyAndrieux. I can reproduce this bug quite easily.
Update the static pod manifest twice in quick succession (e.g. change .spec.resources.requests.cpu). Then, the static pod with the same name never starts again.
It was a bit tricky, but I managed to reproduce it by deploying a single node, editing the kube-apiserver static pod manifest once, and waiting for the pod to be deleted; once it was deleted and not yet re-created, I edited the manifest a second time, and if you are (un)lucky the pod never restarts (a rough sketch of these steps follows below).
Not a super easy way to reproduce, I agree 😄
Using this “method” I managed to reproduce it several times with kubelet 1.22.5, and when I downgraded to 1.22.4 I did not.
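For reference, a rough sketch of that manual reproduction on a kubeadm-style single node; the manifest path and the use of crictl are assumptions based on a default kubeadm layout, and hitting the window depends on timing:
$ # first edit, e.g. bump a CPU request on the kube-apiserver static pod
$ vi /etc/kubernetes/manifests/kube-apiserver.yaml
$ # wait until the old kube-apiserver container has actually been torn down
$ crictl ps | grep kube-apiserver
$ # second edit, made before the kubelet re-creates the pod
$ vi /etc/kubernetes/manifests/kube-apiserver.yaml
$ # with unlucky timing the kube-apiserver static pod never comes back
$ crictl ps | grep kube-apiserver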
@TeddyAndrieux my mistake, I just saw the 1.22 backport at https://github.com/kubernetes/kubernetes/pull/106394 - you are correct. Was going to drop a line here but you beat me to it 😃
@gjkim42 We’ll merge the one-liner in our CI and see if it fixes the issue.