kubernetes: Static pods stopped first when priority is not explicitly set
What happened?
etcd and kube-apiserver stop before most other containers, even though shutdownGracePeriodByPodPriority is configured.
On a single-node server with multus (and likely any other CNI plugin that talks to the API server), this spams the logs and makes the shutdown slower.
What did you expect to happen?
shutdownGracePeriodByPodPriority should stop one priority group at a time.
How can we reproduce it (as minimally and precisely as possible)?
On a single-node cluster, run a pod that is slow to shut down (for example, sleep inf), then check the kubelet logs to see the order in which containers are killed.
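To make the kill order easier to read, the relevant kubelet lines can be filtered down to time, pod, and grace period. The following is only a sketch: the regular expression is written against the log format shown in this report and may need adjusting for other kubelet versions, and `kill_order` is a hypothetical helper name.

```python
import re

# Pattern written against the "Killing container with a grace period" lines
# in this report; other kubelet versions may format them differently.
KILL_RE = re.compile(
    r'(?P<time>\d{2}:\d{2}:\d{2})\.\d+ \d+ kuberuntime_container\.go:\d+\] '
    r'"Killing container with a grace period" '
    r'pod="(?P<pod>[^"]+)".*gracePeriod=(?P<grace>\d+)'
)


def kill_order(lines):
    """Return (time, pod, grace_seconds) tuples in the order they were logged."""
    events = []
    for line in lines:
        match = KILL_RE.search(line)
        if match:
            events.append((match["time"], match["pod"], int(match["grace"])))
    return events


# Two lines copied from the log excerpt in this report.
SAMPLE = [
    'Nov 15 21:06:09 atsc2 kubelet[9573]: I1115 21:06:09.373100 9573 kuberuntime_container.go:722] "Killing container with a grace period" pod="kube-system/kube-apiserver-atemeappliance" podUID=56ed88c0ff350d8516fd285311afe657 containerName="kube-apiserver" containerID="containerd://c93447aa665d04e1e66784333b2abe2231c96ef465f8c6fec51d8537cbec75fa" gracePeriod=30',
    'Nov 15 21:07:10 atsc2 kubelet[9573]: I1115 21:07:10.138212 9573 kuberuntime_container.go:722] "Killing container with a grace period" pod="kube-system/coredns-5dcd989fd8-drjsv" podUID=1132caa1-9550-407c-8b75-d5c02e750269 containerName="coredns" containerID="containerd://6f9cfc43cfb0162d5258c16c4ee7bc1801135257b1a4a580687d7af26488f5b3" gracePeriod=10',
]

for time, pod, grace in kill_order(SAMPLE):
    print(f"{time}  grace={grace:>3}s  {pod}")
```

In practice the same function could be fed the full kubelet journal instead of `SAMPLE`; lines that do not match the pattern are skipped.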
Anything else we need to know?
shutdownGracePeriodByPodPriority:
  - priority: 2000000001
    shutdownGracePeriodSeconds: 10
  - priority: 2000000000
    shutdownGracePeriodSeconds: 10
  - priority: 0
    shutdownGracePeriodSeconds: 60
Reading the KEP (https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2712-pod-priority-based-graceful-node-shutdown), the summary says: "Kubelet graceful shutdown should take the pod priority values into account to determine the order in which the pods are stopped." So I would expect all pods with a priority between 0 and 2000000000 to be stopped first, then coredns (2000000000), then the rest.
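The expected grouping can be sketched as follows. This is not the kubelet's code; it is a small model under two assumptions: each config entry covers priorities from its own value up to, but not including, the next higher entry, and a pod whose priority is not set is treated as priority 0. The names `group_for` and `shutdown_plan` and the pod list are hypothetical.

```python
from bisect import bisect_right

# (priority, shutdownGracePeriodSeconds) from the config above, sorted ascending.
CONFIG = [(0, 60), (2000000000, 10), (2000000001, 10)]


def group_for(priority, config=CONFIG):
    """Return the index of the config entry assumed to cover `priority`.

    Assumptions: an entry covers [its priority, next higher entry's priority),
    priorities below the lowest entry fall into the lowest entry, and an
    unset priority (None) counts as 0.
    """
    if priority is None:
        priority = 0
    thresholds = [prio for prio, _ in config]
    return max(bisect_right(thresholds, priority) - 1, 0)


def shutdown_plan(pods, config=CONFIG):
    """Group pod names by config entry, lowest-priority group first."""
    groups = [[] for _ in config]
    for name, priority in pods.items():
        groups[group_for(priority, config)].append(name)
    return [(prio, grace, names) for (prio, grace), names in zip(config, groups)]


# Hypothetical pods for illustration only.
PODS = {
    "default/app": 0,                      # ordinary workload
    "kube-system/coredns": 2000000000,     # value cited in this report
    "kube-system/kube-proxy": 2000001000,  # assumed: system-node-critical
    "kube-system/etcd": None,              # static pod with no priority set
}

for prio, grace, names in shutdown_plan(PODS):
    print(f"priority>={prio} grace={grace}s pods={names}")
```

Under these assumptions, the pod with no priority set lands in the lowest group alongside ordinary workloads, which would match the symptom in the title.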
This is definitely not what’s happening:
Nov 15 21:02:02 atsc2 kubelet[9573]: I1115 21:02:02.083004 9573 nodeshutdown_manager_linux.go:134] "Creating node shutdown manager" shutdownGracePeriodRequested="0s" shutdownGracePeriodCriticalPods="0s" shutdownGracePeriodByPodPriority=[{Priority:0 ShutdownGracePeriodSeconds:60} {Priority:2000000000 ShutdownGracePeriodSeconds:10} {Priority:2000000001 ShutdownGracePeriodSeconds:10}]
...
Nov 15 21:06:09 atsc2 kubelet[9573]: I1115 21:06:09.372579 9573 kuberuntime_container.go:722] "Killing container with a grace period" pod="kube-system/kube-scheduler-atemeappliance" podUID=81872379106beaec249553e0efae9ec6 containerName="kube-scheduler" containerID="containerd://282605472eed19f2d131d9805b6616682c677af2ab281be0763d55bdb7a7bad8" gracePeriod=30
Nov 15 21:06:09 atsc2 kubelet[9573]: I1115 21:06:09.372949 9573 kuberuntime_container.go:722] "Killing container with a grace period" pod="kube-system/kube-controller-manager-atemeappliance" podUID=ec037a7f739a4afa72ccb6799fceb193 containerName="kube-controller-manager" containerID="containerd://77d0c10f17b4a95aa30b11615e838868fbab6a9c53f650bc75d8ed94ff6f8173" gracePeriod=30
Nov 15 21:06:09 atsc2 kubelet[9573]: I1115 21:06:09.373028 9573 kuberuntime_container.go:722] "Killing container with a grace period" pod="kube-system/etcd-atemeappliance" podUID=a42b112129999807250e3e1cb281cd6c containerName="etcd" containerID="containerd://9774566e9a53918fb4648cdd7413b7ebc2eee4ac2cbdc09d33c2c5bea9874761" gracePeriod=30
Nov 15 21:06:09 atsc2 kubelet[9573]: I1115 21:06:09.373100 9573 kuberuntime_container.go:722] "Killing container with a grace period" pod="kube-system/kube-apiserver-atemeappliance" podUID=56ed88c0ff350d8516fd285311afe657 containerName="kube-apiserver" containerID="containerd://c93447aa665d04e1e66784333b2abe2231c96ef465f8c6fec51d8537cbec75fa" gracePeriod=30
Nov 15 21:06:09 atsc2 kubelet[9573]: I1115 21:06:09.373388 9573 kuberuntime_container.go:722] "Killing container with a grace period" pod="kube-system/kube-sriov-device-plugin-dds5b" podUID=310293cf-ee11-4fa4-acad-0b87559e3836 containerName="kube-sriovdp" containerID="containerd://9ec25c43fb5c1c017f1d5576fb6feb42a703d872813d84b700d613b99fa6c419" gracePeriod=30
Nov 15 21:06:09 atsc2 kubelet[9573]: I1115 21:06:09.373509 9573 kuberuntime_container.go:722] "Killing container with a grace period" pod="default/REDACTED" podUID=1d1880ee-6b18-42e0-810a-b12100763fd5 containerName="REDACTED" containerID="containerd://141e1363c2e3f7db4e6b5dcbc51ac6d8975cb244435659f20c0d8309107a1b8f" gracePeriod=30
Nov 15 21:06:09 atsc2 kubelet[9573]: I1115 21:06:09.560894 9573 kuberuntime_container.go:722] "Killing container with a grace period" pod="default/REDACTED" podUID=3597a4c8-32a8-4089-bbd3-77c024f45dbe containerName="REDACTED" containerID="containerd://1d29a04cd823e549b6d1bbb58dc6afc2d8ceb99f0b8631796378d8576bb44c96" gracePeriod=45
Nov 15 21:06:09 atsc2 kubelet[9573]: I1115 21:06:09.565408 9573 kuberuntime_container.go:722] "Killing container with a grace period" pod="default/REDACTED" podUID=0c3d38c6-12f7-4253-9f54-a31b5c2d01d5 containerName="REDACTED" containerID="containerd://40fec73c8db538ed30e366853a1b83c3dfde2e7ced25d8bb681271d3950a1a8b" gracePeriod=30
Nov 15 21:06:09 atsc2 kubelet[9573]: I1115 21:06:09.574084 9573 kuberuntime_container.go:722] "Killing container with a grace period" pod="default/REDACTED" podUID=db2e44db-16dc-419a-8799-2b4cadb08063 containerName="REDACTED" containerID="containerd://a178800ba300018b202434017efda11d84e55fc32c148304769b8ef79e291ae7" gracePeriod=30
Nov 15 21:06:12 atsc2 kubelet[9573]: I1115 21:06:12.611940 9573 kuberuntime_container.go:722] "Killing container with a grace period" pod="default/REDACTED" podUID=a70a245b-fffc-4b5f-b25e-ecb71ea09c35 containerName="REDACTED" containerID="containerd://a21c392ec48551915b807380d3084e252057e30fef501b21a65778202f4149d4" gracePeriod=30
Nov 15 21:06:20 atsc2 kubelet[9573]: I1115 21:06:20.549014 9573 kuberuntime_container.go:722] "Killing container with a grace period" pod="ingress-nginx/ingress-nginx-controller-85cbcdf4dd-kbpr7" podUID=8b408d5a-d723-4869-93e5-2436a4ce891a containerName="controller" containerID="containerd://bd3b70c0db023eddadc65cdf24eb18202414751a4996fdd0fbdc9b6ef9639665" gracePeriod=20
Nov 15 21:07:10 atsc2 kubelet[9573]: I1115 21:07:10.138212 9573 kuberuntime_container.go:722] "Killing container with a grace period" pod="kube-system/coredns-5dcd989fd8-drjsv" podUID=1132caa1-9550-407c-8b75-d5c02e750269 containerName="coredns" containerID="containerd://6f9cfc43cfb0162d5258c16c4ee7bc1801135257b1a4a580687d7af26488f5b3" gracePeriod=10
Nov 15 21:07:19 atsc2 kubelet[9573]: I1115 21:07:19.373437 9573 kuberuntime_container.go:722] "Killing container with a grace period" pod="kube-system/kube-proxy-h2q9q" podUID=b17d6ff3-eefa-4012-8479-a0c06c47b1c6 containerName="kube-proxy" containerID="containerd://cc0d5fd4d420bc3292296d4505fbcccbf48388172c3158bfb69053eaacee9ee1" gracePeriod=10
...
Nov 15 21:07:18 atsc2 kubelet[9573]: E1115 21:07:18.191640 9573 kuberuntime_manager.go:999] "Failed to stop sandbox" podSandboxID={Type:containerd ID:791fda883bf2aa8b0ebc6381e8b7d7182e280bc2a83df112fbeb19c32062e61a}
Nov 15 21:07:18 atsc2 kubelet[9573]: E1115 21:07:18.191669 9573 kubelet.go:1784] failed to "KillPodSandbox" for "0c3d38c6-12f7-4253-9f54-a31b5c2d01d5" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"791fda883bf2aa8b0ebc6381e8b7d7182e280bc2a83df112fbeb19c32062e61a\": plugin type=\"multus-cni\" name=\"multus-cni-network\" failed (delete): Multus: [default/REDACTED]: error getting pod with error: Get \"https://198.19.254.254:6443/api/v1/namespaces/default/pods/REDACTED?timeout=1m0s\": dial tcp 198.19.254.254:6443: connect: connection refused"
Nov 15 21:07:18 atsc2 kubelet[9573]: E1115 21:07:18.191692 9573 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"KillPodSandbox\" for \"0c3d38c6-12f7-4253-9f54-a31b5c2d01d5\" with KillPodSandboxError: \"rpc error: code = Unknown desc = failed to destroy network for sandbox \\\"791fda883bf2aa8b0ebc6381e8b7d7182e280bc2a83df112fbeb19c32062e61a\\\": plugin type=\\\"multus-cni\\\" name=\\\"multus-cni-network\\\" failed (delete): Multus: [default/REDACTED]: error getting pod with error: Get \\\"https://198.19.254.254:6443/api/v1/namespaces/default/pods/REDACTED?timeout=1m0s\\\": dial tcp 198.19.254.254:6443: connect: connection refused\"" pod="default/REDACTED" podUID=0c3d38c6-12f7-4253-9f54-a31b5c2d01d5
Kubernetes version
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.6", GitCommit:"b39bf148cd654599a52e867485c02c4f9d28b312", GitTreeState:"clean", BuildDate:"2022-09-21T13:19:24Z", GoVersion:"go1.18.6", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.6", GitCommit:"b39bf148cd654599a52e867485c02c4f9d28b312", GitTreeState:"clean", BuildDate:"2022-09-21T13:12:04Z", GoVersion:"go1.18.6", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider
N/A
OS version
# On Linux:
$ cat /etc/os-release
Appliance based on Alma Linux 8.6
$ uname -a
... 4.18.0-372.32.1.el8_6.x86_64 ...
Install tools
kubeadm
Container runtime (CRI) and version (if applicable)
containerd
Related plugins (CNI, CSI, …) and versions (if applicable)
multus
About this issue
- State: closed
- Created 2 years ago
- Comments: 18 (17 by maintainers)
https://github.com/kubernetes/kubernetes/blob/4db6bde859d912e11ab081a06e8ea17b2e044f5d/pkg/apis/core/types.go#L2969-L2973
Each shutdownGracePeriodByPodPriority entry will match all Pods whose priority is less than or equal to it. This may suit your needs:
https://github.com/kubernetes/kubernetes/blob/3f823c0daa002158b12bfb2d53bcfe433516659d/pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux_test.go#L646-L716