kubernetes: Some static pods fail to start

What happened?

If the kubelet starts while the kube-apiserver is not ready, and during this period someone removes a static pod's manifest file, then after the kube-apiserver recovers someone updates that static pod's manifest and moves it back into the manifest directory, the kubelet does not restart the static pod.

What did you expect to happen?

All static pods should restart.

How can we reproduce it (as minimally and precisely as possible)?

  1. Add a static pod YAML file to the kubelet manifest directory and wait for the pod to start.
  2. Make the kubelet's apiserver config source not ready (apiserver down or a network problem).
  3. Restart the kubelet.
  4. Remove the static pod YAML file and wait for the kubelet to log "skipping delete because sources aren't ready yet".
  5. Recover the kube-apiserver.
  6. Update the pod YAML (e.g. change the image version) and move the updated YAML back into the kubelet manifest directory.

The kubelet does not restart the pod from the updated manifest.

Anything else we need to know?

We found that the pod worker's managePodLoop checks whether the pod is allowed to start:

// allowPodStart tries to start the pod and returns true if allowed, otherwise
// it requeues the pod and returns false. If the pod will never be able to start
// because data is missing, or the pod was terminated before start, canEverStart
// is false.
func (p *podWorkers) allowPodStart(pod *v1.Pod) (canStart bool, canEverStart bool) {
	if !kubetypes.IsStaticPod(pod) {
		// TODO: Do we want to allow non-static pods with the same full name?
		// Note that it may disable the force deletion of pods.
		return true, true
	}
	p.podLock.Lock()
	defer p.podLock.Unlock()
	status, ok := p.podSyncStatuses[pod.UID]
	if !ok {
		klog.ErrorS(nil, "Pod sync status does not exist, the worker should not be running", "pod", klog.KObj(pod), "podUID", pod.UID)
		return false, false
	}
	if status.IsTerminationRequested() {
		return false, false
	}
	if !p.allowStaticPodStart(status.fullname, pod.UID) {
		p.workQueue.Enqueue(pod.UID, wait.Jitter(p.backOffPeriod, workerBackOffPeriodJitterFactor))
		status.working = false
		return false, true
	}
	return true, true
}

// allowStaticPodStart tries to start the static pod and returns true if
// 1. there are no other started static pods with the same fullname
// 2. the uid matches that of the first valid static pod waiting to start
func (p *podWorkers) allowStaticPodStart(fullname string, uid types.UID) bool {
	startedUID, started := p.startedStaticPodsByFullname[fullname]
	if started {
		return startedUID == uid
	}

	waitingPods := p.waitingToStartStaticPodsByFullname[fullname]
	// TODO: This is O(N) with respect to the number of updates to static pods
	// with overlapping full names, and ideally would be O(1).
	for i, waitingUID := range waitingPods {
		// has pod already terminated or been deleted?
		status, ok := p.podSyncStatuses[waitingUID]
		if !ok || status.IsTerminationRequested() || status.IsTerminated() {
			continue
		}
		// another pod is next in line
		if waitingUID != uid {
			p.waitingToStartStaticPodsByFullname[fullname] = waitingPods[i:]
			return false
		}
		// we are up next, remove ourselves
		waitingPods = waitingPods[i+1:]
		break
	}
	if len(waitingPods) != 0 {
		p.waitingToStartStaticPodsByFullname[fullname] = waitingPods
	} else {
		delete(p.waitingToStartStaticPodsByFullname, fullname)
	}
	p.startedStaticPodsByFullname[fullname] = uid
	return true
}

The kubelet skips handling the pod remove event while the apiserver config source is not ready, so the pod worker never marks the pod as termination-requested and never removes its UID from startedStaticPodsByFullname.
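
For context, the delete path bails out early while the config sources are not ready. The following is a minimal, self-contained sketch of that behavior, not the upstream code: the real check lives in the kubelet's delete handling and uses its sourcesReady tracker, and the types and function names below are simplified stand-ins.

package main

import (
	"errors"
	"fmt"
)

// sourcesReady is a simplified stand-in for the kubelet's config source
// readiness tracker: it only reports ready once the apiserver source has
// been seen.
type sourcesReady struct{ apiserverSeen bool }

func (s sourcesReady) AllReady() bool { return s.apiserverSeen }

// deletePod sketches the behavior described above: while the sources are
// not ready the removal is skipped, so the pod worker never receives a
// termination request and the static pod's UID is never removed from
// startedStaticPodsByFullname.
func deletePod(src sourcesReady, podFullName string) error {
	if !src.AllReady() {
		// Matches the log message observed in step 4 of the reproduction.
		return errors.New("skipping delete because sources aren't ready yet")
	}
	fmt.Println("requesting termination of", podFullName)
	return nil
}

func main() {
	// The apiserver is unreachable, so the source never becomes ready and
	// the remove event is dropped.
	fmt.Println(deletePod(sourcesReady{apiserverSeen: false}, "nginx_default"))
}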

When the kube-apiserver recovers and the updated manifest is moved back into the manifest directory, the pod's UID changes, so allowStaticPodStart returns false and the pod worker never performs any sync work for the new pod.
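
To make the failure concrete, here is a small standalone simulation of the resulting state, using simplified types and made-up UID strings (the static pod UID is derived from a hash of the manifest, so editing the file yields a new UID). It reuses only the first branch of the allowStaticPodStart logic quoted above.

package main

import "fmt"

// podWorkersState is a simplified stand-in for the podWorkers fields involved.
type podWorkersState struct {
	startedStaticPodsByFullname map[string]string
}

// allowStaticPodStart mirrors the check quoted above: once a static pod with
// this full name is recorded as started, only that exact UID may start.
func (p *podWorkersState) allowStaticPodStart(fullname, uid string) bool {
	if startedUID, started := p.startedStaticPodsByFullname[fullname]; started {
		return startedUID == uid
	}
	// Waiting-list handling omitted; it is never reached in this scenario.
	p.startedStaticPodsByFullname[fullname] = uid
	return true
}

func main() {
	// The remove event was skipped, so the old UID was never cleared.
	p := &podWorkersState{
		startedStaticPodsByFullname: map[string]string{"nginx_default": "old-manifest-uid"},
	}
	// The updated manifest hashes to a new UID, so the new pod worker is
	// requeued forever and the pod is never synced.
	fmt.Println(p.allowStaticPodStart("nginx_default", "new-manifest-uid")) // false
}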

/sig node @rphillips @gjkim42 @smarterclayton

Kubernetes version

1.25

Cloud provider

none

OS version

CentOS 7

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, …) and versions (if applicable)

About this issue

  • State: closed
  • Created a year ago
  • Comments: 17 (17 by maintainers)

Most upvoted comments

I tried the same steps on v1.26.2 and was able to repro (nginx was stuck and not terminated).

This is likely fixed by https://github.com/kubernetes/kubernetes/pull/113145, which was merged as part of the v1.27 cycle and properly handles terminating orphaned pods.

  1. mv the nginx.yaml file to the manifest dir and wait for the pod to be ready:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: default
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: nginx
    resources: {}
  dnsPolicy: ClusterFirst
  priority: 0
  restartPolicy: Always
  hostNetwork: true
  terminationGracePeriodSeconds: 30
[root@test-nodepool-30105-f3gxz ~]# kubectl get pods
NAME                  READY   STATUS    RESTARTS   AGE
nginx-192.168.0.233   1/1     Running   0          2m27s
  2. add a REJECT iptables rule (192.168.0.146 is the master address):
iptables -A OUTPUT -d 192.168.0.146 -j REJECT
  3. restart the kubelet

  4. remove nginx.yaml from the manifest dir

  5. delete the reject rule:

iptables -D OUTPUT 2

Now run crictl ps; the container still exists:

[root@test-nodepool-30105-f3gxz ~]# crictl ps
CONTAINER           IMAGE               CREATED              STATE               NAME                 ATTEMPT             POD ID
0870acc42f8cc       992e3b7be0465       4 minutes ago        Running             nginx                2                   f58e4cc4486a0
  6. change the image to tomcat and move the updated nginx.yaml file to the manifest dir:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: default
spec:
  containers:
  - image: tomcat
    imagePullPolicy: Always
    name: nginx
    resources: {}
  dnsPolicy: ClusterFirst
  priority: 0
  restartPolicy: Always
  hostNetwork: true
  terminationGracePeriodSeconds: 30

Watching the container status, the pod never restarts and the image never changes to tomcat.

Master commit f0791b50143856177878e21bb44beb5e3e36cc78 was used to reproduce this issue.