kubernetes: Cluster join fails due to insufficient ephemeral-storage
What happened:
kubeadm join on the new node sometimes fails. The following error appears in the kubelet log:
Feb 22 14:04:33 NEWHOST kubelet[7331]: W0222 14:04:33.866713 7331 predicate.go:113] Failed to admit pod etcd-NEWHOST_kube-system(13c2d55e88e08af2d946555ab67e9eab) - Unexpected error while attempting to recover from admission failure: preemption: error finding a set of pods to preempt: no set of running pods found to reclaim resources: [(res: ephemeral-storage, q: 104857600), ]
What you expected to happen:
kubeadm join on the new node succeeds every time
How to reproduce it (as minimally and precisely as possible):
Join a control plane node
Anything else we need to know?:
The node has over 40 GB of free space on its root partition.
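For reference, the 104857600 bytes requested in the error above corresponds to the 100Mi ephemeral-storage request that kubeadm puts on the etcd static pod. The relevant part of /etc/kubernetes/manifests/etcd.yaml looks roughly like this (exact values may differ between kubeadm versions):

    resources:
      requests:
        cpu: 100m
        ephemeral-storage: 100Mi
        memory: 100Mi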
I added some more logging to the following functions:
- pkg/kubelet/cm/container_manager_linux.go, Start
- pkg/kubelet/lifecycle/predicate.go, GeneralPredicates
- pkg/kubelet/nodestatus/setters.go, MachineInfo
- pkg/kubelet/nodestatus/setters.go, ReadyCondition
- pkg/kubelet/preemption/preemption.go, HandleAdmissionFailure
- pkg/scheduler/framework/plugins/noderesources/fit.go, Filter
It appears that the ephemeral storage is detected a bit too late:
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.599153 7331 setters.go:284] TRACE MachineInfo start
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.599202 7331 setters.go:334] TRACE MachineInfo got initialCapacity map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.599333 7331 setters.go:408] TRACE MachineInfo done, Capacity: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.599357 7331 setters.go:409] TRACE MachineInfo done, Allocatable: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42086916096 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.666348 7331 setters.go:527] TRACE capacity cpu: {i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.666413 7331 setters.go:527] TRACE capacity memory: {i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.666425 7331 setters.go:527] TRACE capacity pods: {i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.666435 7331 setters.go:527] TRACE capacity ephemeral-storage: {i:{value:0 scale:0} d:{Dec:<nil>} s: Format:}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.673445 7331 setters.go:284] TRACE MachineInfo start
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.673490 7331 setters.go:334] TRACE MachineInfo got initialCapacity map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.673540 7331 setters.go:408] TRACE MachineInfo done, Capacity: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.673563 7331 setters.go:409] TRACE MachineInfo done, Allocatable: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42086916096 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.692186 7331 setters.go:284] TRACE MachineInfo start
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.692250 7331 setters.go:334] TRACE MachineInfo got initialCapacity map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.692306 7331 setters.go:408] TRACE MachineInfo done, Capacity: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.692325 7331 setters.go:409] TRACE MachineInfo done, Allocatable: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42086916096 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.743289 7331 setters.go:527] TRACE capacity cpu: {i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.743339 7331 setters.go:527] TRACE capacity memory: {i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.743349 7331 setters.go:527] TRACE capacity pods: {i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.743362 7331 setters.go:527] TRACE capacity ephemeral-storage: {i:{value:0 scale:0} d:{Dec:<nil>} s: Format:}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.752328 7331 container_manager_linux.go:613] TRACE containerManagerImpl.Start
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.752381 7331 container_manager_linux.go:625] TRACE containerManagerImpl.Start, map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} ephemeral-storage:{i:{value:51384074240 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.772730 7331 setters.go:527] TRACE capacity cpu: {i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.772773 7331 setters.go:527] TRACE capacity memory: {i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.772780 7331 setters.go:527] TRACE capacity pods: {i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.772786 7331 setters.go:527] TRACE capacity ephemeral-storage: {i:{value:0 scale:0} d:{Dec:<nil>} s: Format:}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.826100 7331 setters.go:284] TRACE MachineInfo start
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.826146 7331 setters.go:334] TRACE MachineInfo got initialCapacity map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} ephemeral-storage:{i:{value:51384074240 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.826353 7331 setters.go:408] TRACE MachineInfo done, Capacity: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} ephemeral-storage:{i:{value:51384074240 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.826414 7331 setters.go:409] TRACE MachineInfo done, Allocatable: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} ephemeral-storage:{i:{value:46245666740 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42086916096 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.846712 7331 predicate.go:225] TRACE nodeInfo.Allocatable &{MilliCPU:12000 Memory:42086916096 EphemeralStorage:0 AllowedPodNumber:210 ScalarResources:map[hugepages-1Gi:0 hugepages-2Mi:0]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.846749 7331 predicate.go:226] TRACE nodeInfo.Requested &{MilliCPU:0 Memory:0 EphemeralStorage:0 AllowedPodNumber:0 ScalarResources:map[]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.846845 7331 predicate.go:225] TRACE nodeInfo.Allocatable &{MilliCPU:12000 Memory:42086916096 EphemeralStorage:0 AllowedPodNumber:210 ScalarResources:map[hugepages-1Gi:0 hugepages-2Mi:0]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.846854 7331 predicate.go:226] TRACE nodeInfo.Requested &{MilliCPU:0 Memory:0 EphemeralStorage:0 AllowedPodNumber:0 ScalarResources:map[]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.846892 7331 predicate.go:225] TRACE nodeInfo.Allocatable &{MilliCPU:12000 Memory:42086916096 EphemeralStorage:0 AllowedPodNumber:210 ScalarResources:map[hugepages-1Gi:0 hugepages-2Mi:0]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.846940 7331 predicate.go:226] TRACE nodeInfo.Requested &{MilliCPU:0 Memory:0 EphemeralStorage:0 AllowedPodNumber:0 ScalarResources:map[]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.847005 7331 predicate.go:225] TRACE nodeInfo.Allocatable &{MilliCPU:12000 Memory:42086916096 EphemeralStorage:0 AllowedPodNumber:210 ScalarResources:map[hugepages-1Gi:0 hugepages-2Mi:0]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.847014 7331 predicate.go:226] TRACE nodeInfo.Requested &{MilliCPU:0 Memory:0 EphemeralStorage:0 AllowedPodNumber:0 ScalarResources:map[]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.847072 7331 predicate.go:225] TRACE nodeInfo.Allocatable &{MilliCPU:12000 Memory:42086916096 EphemeralStorage:0 AllowedPodNumber:210 ScalarResources:map[hugepages-1Gi:0 hugepages-2Mi:0]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.847080 7331 predicate.go:226] TRACE nodeInfo.Requested &{MilliCPU:250 Memory:0 EphemeralStorage:0 AllowedPodNumber:0 ScalarResources:map[]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.847160 7331 predicate.go:225] TRACE nodeInfo.Allocatable &{MilliCPU:12000 Memory:42086916096 EphemeralStorage:0 AllowedPodNumber:210 ScalarResources:map[hugepages-1Gi:0 hugepages-2Mi:0]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.847177 7331 predicate.go:226] TRACE nodeInfo.Requested &{MilliCPU:450 Memory:0 EphemeralStorage:0 AllowedPodNumber:0 ScalarResources:map[]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.866562 7331 predicate.go:225] TRACE nodeInfo.Allocatable &{MilliCPU:12000 Memory:42086916096 EphemeralStorage:0 AllowedPodNumber:210 ScalarResources:map[hugepages-1Gi:0 hugepages-2Mi:0]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.866598 7331 predicate.go:226] TRACE nodeInfo.Requested &{MilliCPU:550 Memory:0 EphemeralStorage:0 AllowedPodNumber:0 ScalarResources:map[]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.866620 7331 preemption.go:67] TRACE cannot admit etcd-NEWHOST, [Node didn't have enough resource: ephemeral-storage, requested: 104857600, used: 0, capacity: 0]
Feb 22 14:04:33 NEWHOST kubelet[7331]: W0222 14:04:33.866713 7331 predicate.go:113] Failed to admit pod etcd-NEWHOST_kube-system(13c2d55e88e08af2d946555ab67e9eab) - Unexpected error while attempting to recover from admission failure: preemption: error finding a set of pods to preempt: no set of running pods found to reclaim resources: [(res: ephemeral-storage, q: 104857600), ]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.905188 7331 setters.go:527] TRACE capacity cpu: {i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.905231 7331 setters.go:527] TRACE capacity memory: {i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.905239 7331 setters.go:527] TRACE capacity pods: {i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.905248 7331 setters.go:527] TRACE capacity ephemeral-storage: {i:{value:51384074240 scale:0} d:{Dec:<nil>} s: Format:BinarySI}
Feb 22 14:04:43 NEWHOST kubelet[7331]: E0222 14:04:43.789009 7331 setters.go:284] TRACE MachineInfo start
Feb 22 14:04:43 NEWHOST kubelet[7331]: E0222 14:04:43.789081 7331 setters.go:334] TRACE MachineInfo setting capacity to map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} ephemeral-storage:{i:{value:51384074240 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]
Feb 22 14:04:43 NEWHOST kubelet[7331]: E0222 14:04:43.789183 7331 setters.go:408] TRACE MachineInfo done, Capacity: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} ephemeral-storage:{i:{value:51384074240 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
Feb 22 14:04:43 NEWHOST kubelet[7331]: E0222 14:04:43.789234 7331 setters.go:409] TRACE MachineInfo done, Allocatable: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} ephemeral-storage:{i:{value:46245666740 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42086916096 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
/sig node storage
Environment:
- Kubernetes version (use kubectl version): 1.20.4, commit e87da0bd6e03ec3fea7933c4b5263d151aafd07c
- Kernel (e.g. uname -a): 4.9.252-1.ph2
For anyone who is experiencing this issue and still wants to use kubeadm until it is fixed: You can remove the ephemeral-storage request from the etcd static pod and the join will work. Just make sure that your nodes actually have more than 100MB free storage left. You can reinsert the storage request by patching the etcd manifests after the nodes are up.
To patch etcd during a kubeadm join, create the following file (the filename matters, the file location doesn't) and use kubeadm join --experimental-patches /path/to/any/folder --token ... to join additional control plane nodes.
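For example (just a sketch: the filename follows kubeadm's target+patchtype.extension convention, e.g. etcd+json.json, and the path assumes etcd is the first container in the static pod manifest), a JSON patch that drops the ephemeral-storage request could look like this:

[
  {
    "op": "remove",
    "path": "/spec/containers/0/resources/requests/ephemeral-storage"
  }
]

Newer kubeadm releases expose the same mechanism via the --patches flag. Once the node is up, the request can be put back by editing /etc/kubernetes/manifests/etcd.yaml on that node, as mentioned above.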
based on the investigations here (thanks all), it feels like ephemeral_storage is a feature that may or may not work (for static pods at least). it doesn't feel right to mitigate a kubelet bug on the side of the deployer, but given how many reports there are about this i think it should be done at this point.
happy to LGTM approve your PRs. note it must also be backported to all versions in support (back to 1.19).
suggesting the following release note:
we should keep this issue open, too.
I have identified where these particular symptoms (probably) originate, and cursory testing suggests something like this eliminates the issue:
https://github.com/kubernetes/kubernetes/pull/101710/files#diff-1046a664dcd001985cf5a688c28959d4bafb964343b0d9faf3ed88e1a14c801bR69
I’ll be advocating for this for a few mins in today’s sig-node discussion, if any folks following this thread want to chime in.
cc @pacoxu
Hi,
Same problem here. I’m using K8s v1.20.4 on CentOS 7 (2 cores, 4 GB RAM, and 20 GB storage).
Cluster creation with kubeadm init always works, but when I try to add several master nodes with kubeadm join, it spends a long time waiting for the etcd pod to be created. In most cases, the etcd pod never runs. The rest of the components (API server, controller manager, scheduler, and proxy) exist and work, except for the API server, which keeps restarting while trying to connect to the missing etcd.
Checking the kubelet logs on the hosts where etcd is missing, I found this:
Mar 10 14:23:27 kmaster-02 kubelet[9448]: W0310 14:23:27.586165 9448 predicate.go:113] Failed to admit pod etcd-kmaster-02_kube-system(a9bd496345f6709b747bfbcf4a8cac0d) - Unexpected error while attempting to recover from admission failure: preemption: error finding a set of pods to preempt: no set of running pods found to reclaim resources: [(res: ephemeral-storage, q: 104857600), ]
That node has 17 GB of free storage.
If the kubelet is restarted, etcd appears (both the pod and the container shown by docker ps -a).
@pacoxu If I can help with anything, please tell me. I would like to help.
I ran into this issue consistently on Ubuntu 20.04.2 LTS using cri-o, but /sys/fs/cgroup/cgroup.controllers is not present on my systems.
We need to figure out what triggers the race condition. This only happens on some machines. I have not seen this myself.