kubernetes: Cluster join fails due to insufficient ephemeral-storage

What happened:

kubeadm join on the new node sometimes fails. The following error appears in the kubelet log:

Feb 22 14:04:33 NEWHOST kubelet[7331]: W0222 14:04:33.866713    7331 predicate.go:113] Failed to admit pod etcd-NEWHOST_kube-system(13c2d55e88e08af2d946555ab67e9eab) - Unexpected error while attempting to recover from admission failure: preemption: error finding a set of pods to preempt: no set of running pods found to reclaim resources: [(res: ephemeral-storage, q: 104857600), ]

What you expected to happen:

kubeadm join on the new node succeeds every time

How to reproduce it (as minimally and precisely as possible):

Join an additional control plane node with kubeadm join

Anything else we need to know?:

The node has over 40 GB of free space on its root partition.

I added some more logging to the following functions (a rough sketch of the added trace calls follows the list):

  • pkg/kubelet/cm/container_manager_linux.go, Start
  • pkg/kubelet/lifecycle/predicate.go, GeneralPredicates
  • pkg/kubelet/nodestatus/setters.go, MachineInfo
  • pkg/kubelet/nodestatus/setters.go, ReadyCondition
  • pkg/kubelet/preemption/preemption.go, HandleAdmissionFailure
  • pkg/scheduler/framework/plugins/noderesources/fit.go, Filter
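
The patch with the extra logging is not included here. As a rough illustration only: the TRACE lines below carry klog's "E" prefix, so they were presumably plain klog.Errorf calls; the standalone program below just shows that shape and is not the kubelet code.

// rough sketch of the added instrumentation; the real calls sit inside the
// kubelet functions listed above, not in a standalone program like this
package main

import "k8s.io/klog/v2"

func main() {
	defer klog.Flush()
	initialCapacity := map[string]string{"cpu": "12", "memory": "42191773696"}
	klog.Errorf("TRACE MachineInfo start")
	klog.Errorf("TRACE MachineInfo got initialCapacity %+v", initialCapacity)
	klog.Errorf("TRACE MachineInfo done, Capacity: %+v", initialCapacity)
}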

It appears that the ephemeral-storage capacity is detected a bit too late. In the trace below, the first MachineInfo syncs report no ephemeral-storage at all, the admission predicate consequently sees EphemeralStorage:0 and rejects the etcd static pod, and only after that does the node report an ephemeral-storage capacity of about 51 GB (a small self-contained sketch of the failing check follows the trace):

Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.599153    7331 setters.go:284] TRACE MachineInfo start
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.599202    7331 setters.go:334] TRACE MachineInfo got initialCapacity map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.599333    7331 setters.go:408] TRACE MachineInfo done, Capacity: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.599357    7331 setters.go:409] TRACE MachineInfo done, Allocatable: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42086916096 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.666348    7331 setters.go:527] TRACE capacity cpu: {i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.666413    7331 setters.go:527] TRACE capacity memory: {i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.666425    7331 setters.go:527] TRACE capacity pods: {i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.666435    7331 setters.go:527] TRACE capacity ephemeral-storage: {i:{value:0 scale:0} d:{Dec:<nil>} s: Format:}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.673445    7331 setters.go:284] TRACE MachineInfo start
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.673490    7331 setters.go:334] TRACE MachineInfo got initialCapacity map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.673540    7331 setters.go:408] TRACE MachineInfo done, Capacity: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.673563    7331 setters.go:409] TRACE MachineInfo done, Allocatable: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42086916096 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.692186    7331 setters.go:284] TRACE MachineInfo start
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.692250    7331 setters.go:334] TRACE MachineInfo got initialCapacity map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.692306    7331 setters.go:408] TRACE MachineInfo done, Capacity: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.692325    7331 setters.go:409] TRACE MachineInfo done, Allocatable: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42086916096 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.743289    7331 setters.go:527] TRACE capacity cpu: {i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.743339    7331 setters.go:527] TRACE capacity memory: {i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.743349    7331 setters.go:527] TRACE capacity pods: {i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.743362    7331 setters.go:527] TRACE capacity ephemeral-storage: {i:{value:0 scale:0} d:{Dec:<nil>} s: Format:}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.752328    7331 container_manager_linux.go:613] TRACE containerManagerImpl.Start
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.752381    7331 container_manager_linux.go:625] TRACE containerManagerImpl.Start, map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} ephemeral-storage:{i:{value:51384074240 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.772730    7331 setters.go:527] TRACE capacity cpu: {i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.772773    7331 setters.go:527] TRACE capacity memory: {i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.772780    7331 setters.go:527] TRACE capacity pods: {i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.772786    7331 setters.go:527] TRACE capacity ephemeral-storage: {i:{value:0 scale:0} d:{Dec:<nil>} s: Format:}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.826100    7331 setters.go:284] TRACE MachineInfo start
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.826146    7331 setters.go:334] TRACE MachineInfo got initialCapacity map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} ephemeral-storage:{i:{value:51384074240 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.826353    7331 setters.go:408] TRACE MachineInfo done, Capacity: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} ephemeral-storage:{i:{value:51384074240 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.826414    7331 setters.go:409] TRACE MachineInfo done, Allocatable: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} ephemeral-storage:{i:{value:46245666740 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42086916096 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.846712    7331 predicate.go:225] TRACE nodeInfo.Allocatable &{MilliCPU:12000 Memory:42086916096 EphemeralStorage:0 AllowedPodNumber:210 ScalarResources:map[hugepages-1Gi:0 hugepages-2Mi:0]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.846749    7331 predicate.go:226] TRACE nodeInfo.Requested &{MilliCPU:0 Memory:0 EphemeralStorage:0 AllowedPodNumber:0 ScalarResources:map[]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.846845    7331 predicate.go:225] TRACE nodeInfo.Allocatable &{MilliCPU:12000 Memory:42086916096 EphemeralStorage:0 AllowedPodNumber:210 ScalarResources:map[hugepages-1Gi:0 hugepages-2Mi:0]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.846854    7331 predicate.go:226] TRACE nodeInfo.Requested &{MilliCPU:0 Memory:0 EphemeralStorage:0 AllowedPodNumber:0 ScalarResources:map[]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.846892    7331 predicate.go:225] TRACE nodeInfo.Allocatable &{MilliCPU:12000 Memory:42086916096 EphemeralStorage:0 AllowedPodNumber:210 ScalarResources:map[hugepages-1Gi:0 hugepages-2Mi:0]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.846940    7331 predicate.go:226] TRACE nodeInfo.Requested &{MilliCPU:0 Memory:0 EphemeralStorage:0 AllowedPodNumber:0 ScalarResources:map[]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.847005    7331 predicate.go:225] TRACE nodeInfo.Allocatable &{MilliCPU:12000 Memory:42086916096 EphemeralStorage:0 AllowedPodNumber:210 ScalarResources:map[hugepages-1Gi:0 hugepages-2Mi:0]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.847014    7331 predicate.go:226] TRACE nodeInfo.Requested &{MilliCPU:0 Memory:0 EphemeralStorage:0 AllowedPodNumber:0 ScalarResources:map[]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.847072    7331 predicate.go:225] TRACE nodeInfo.Allocatable &{MilliCPU:12000 Memory:42086916096 EphemeralStorage:0 AllowedPodNumber:210 ScalarResources:map[hugepages-1Gi:0 hugepages-2Mi:0]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.847080    7331 predicate.go:226] TRACE nodeInfo.Requested &{MilliCPU:250 Memory:0 EphemeralStorage:0 AllowedPodNumber:0 ScalarResources:map[]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.847160    7331 predicate.go:225] TRACE nodeInfo.Allocatable &{MilliCPU:12000 Memory:42086916096 EphemeralStorage:0 AllowedPodNumber:210 ScalarResources:map[hugepages-1Gi:0 hugepages-2Mi:0]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.847177    7331 predicate.go:226] TRACE nodeInfo.Requested &{MilliCPU:450 Memory:0 EphemeralStorage:0 AllowedPodNumber:0 ScalarResources:map[]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.866562    7331 predicate.go:225] TRACE nodeInfo.Allocatable &{MilliCPU:12000 Memory:42086916096 EphemeralStorage:0 AllowedPodNumber:210 ScalarResources:map[hugepages-1Gi:0 hugepages-2Mi:0]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.866598    7331 predicate.go:226] TRACE nodeInfo.Requested &{MilliCPU:550 Memory:0 EphemeralStorage:0 AllowedPodNumber:0 ScalarResources:map[]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.866620    7331 preemption.go:67] TRACE cannot admit etcd-NEWHOST, [Node didn't have enough resource: ephemeral-storage, requested: 104857600, used: 0, capacity: 0]
Feb 22 14:04:33 NEWHOST kubelet[7331]: W0222 14:04:33.866713    7331 predicate.go:113] Failed to admit pod etcd-NEWHOST_kube-system(13c2d55e88e08af2d946555ab67e9eab) - Unexpected error while attempting to recover from admission failure: preemption: error finding a set of pods to preempt: no set of running pods found to reclaim resources: [(res: ephemeral-storage, q: 104857600), ]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.905188    7331 setters.go:527] TRACE capacity cpu: {i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.905231    7331 setters.go:527] TRACE capacity memory: {i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.905239    7331 setters.go:527] TRACE capacity pods: {i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.905248    7331 setters.go:527] TRACE capacity ephemeral-storage: {i:{value:51384074240 scale:0} d:{Dec:<nil>} s: Format:BinarySI}
Feb 22 14:04:43 NEWHOST kubelet[7331]: E0222 14:04:43.789009    7331 setters.go:284] TRACE MachineInfo start
Feb 22 14:04:43 NEWHOST kubelet[7331]: E0222 14:04:43.789081    7331 setters.go:334] TRACE MachineInfo setting capacity to map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} ephemeral-storage:{i:{value:51384074240 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]
Feb 22 14:04:43 NEWHOST kubelet[7331]: E0222 14:04:43.789183    7331 setters.go:408] TRACE MachineInfo done, Capacity: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} ephemeral-storage:{i:{value:51384074240 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
Feb 22 14:04:43 NEWHOST kubelet[7331]: E0222 14:04:43.789234    7331 setters.go:409] TRACE MachineInfo done, Allocatable: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} ephemeral-storage:{i:{value:46245666740 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42086916096 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
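
To make the rejection above concrete, here is a minimal sketch of the shape of the fit check; this is illustrative code, not the kubelet source, and fitsEphemeralStorage is a made-up name. The numbers come from the trace: the etcd static pod requests 104857600 bytes (100Mi) of ephemeral-storage, the node's allocatable ephemeral-storage is still 0 when the pod is admitted, and only a later MachineInfo sync reports 46245666740 bytes allocatable.

package main

import "fmt"

// fitsEphemeralStorage mirrors the shape of the check behind
// "Node didn't have enough resource: ephemeral-storage": a request fits only
// if it does not exceed allocatable minus what is already requested.
func fitsEphemeralStorage(podRequest, alreadyRequested, allocatable int64) bool {
	return podRequest <= allocatable-alreadyRequested
}

func main() {
	const etcdRequest = 104857600 // 100Mi, the etcd static pod's ephemeral-storage request

	// At admission time the node still reports EphemeralStorage:0, so the
	// request cannot fit and there are no running pods to preempt.
	fmt.Println(fitsEphemeralStorage(etcdRequest, 0, 0)) // false -> admission fails

	// One MachineInfo sync later the allocatable value is known and the same
	// request would fit, which matches the observation further down this
	// thread that restarting the kubelet makes the etcd pod appear.
	fmt.Println(fitsEphemeralStorage(etcdRequest, 0, 46245666740)) // true
}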

/sig node storage

Environment:

  • Kubernetes version (use kubectl version): 1.20.4, commit e87da0bd6e03ec3fea7933c4b5263d151aafd07c
  • Kernel (e.g. uname -a): 4.9.252-1.ph2

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 6
  • Comments: 57 (38 by maintainers)

Most upvoted comments

For anyone who is experiencing this issue and still wants to use kubeadm until it is fixed: you can remove the ephemeral-storage request from the etcd static pod and the join will work. Just make sure that your nodes actually have more than 100 MB of free storage left. You can reinsert the storage request by patching the etcd manifests after the nodes are up.

To patch etcd during kubeadm join, create the following file (the filename matters; the file location doesn't):

#/path/to/any/folder/etcd0+json.yaml
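# filename convention: target[suffix][+patchtype].extension; "etcd" is the patch
# target and "json" selects an RFC 6902 JSON patch, hence the single op below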
- op: remove
  path: '/spec/containers/0/resources/requests/ephemeral-storage'

and use kubeadm join --experimental-patches /path/to/any/folder --token ... to join additional control plane nodes.

based on the investigations here (thanks all), it feels like ephemeral-storage is a feature that may or may not work (for static pods at least).

it doesn’t feel right to mitigate a kubelet bug on the side of the deployer, but given how many reports there are about this i think it should be done at this point.

happy to LGTM / approve your PRs. note it must also be backported to all supported versions (back to 1.19).

suggesting the following release note:

kubeadm: remove the "ephemeral-storage" request from the etcd static pod that kubeadm deploys on stacked etcd control plane nodes. This request has caused sporadic failures on some setups due to a problem in the kubelet with cAdvisor and the LocalStorageCapacityIsolation feature gate. See this issue for more details: https://github.com/kubernetes/kubernetes/issues/99305

we should keep this issue open, too.

I have identified where these particular symptoms (probably) originate, and cursory testing suggests something like this eliminates the issue:

https://github.com/kubernetes/kubernetes/pull/101710/files#diff-1046a664dcd001985cf5a688c28959d4bafb964343b0d9faf3ed88e1a14c801bR69

I'll be advocating for a few minutes in today's sig-node discussion, in case any folks following this thread want to chime in.

cc @pacoxu

Hi,

Same problem here. I'm using K8s v1.20.4 with CentOS 7 (2 cores, 4 GB RAM, and 20 GB storage).

Cluster creation with kubeadm init always works, but when I try to add several master nodes with kubeadm join, it spends a long time waiting for the etcd pod to be created. In most cases, the etcd pod never runs. The rest of the components (API server, controller manager, scheduler, and kube-proxy) exist and work, except the API server, which keeps restarting while trying to connect to the missing etcd.

Checking the kubelet logs on the hosts where etcd is missing, I found this:

Mar 10 14:23:27 kmaster-02 kubelet[9448]: W0310 14:23:27.586165 9448 predicate.go:113] Failed to admit pod etcd-kmaster-02_kube-system(a9bd496345f6709b747bfbcf4a8cac0d) - Unexpected error while attempting to recover from admission failure: preemption: error finding a set of pods to preempt: no set of running pods found to reclaim resources: [(res: ephemeral-storage, q: 104857600), ]

That node has 17 GB of free storage.

If the kubelet is restarted, the etcd pod (and its container, visible in docker ps -a) appears.

@pacoxu If there is anything I can do to help, please tell me. I would like to help.

I ran into this issue consistently on Ubuntu 20.04.2 LTS using CRI-O, but /sys/fs/cgroup/cgroup.controllers is not present on my systems.

We need to figure out what triggers the race condition. This only happens on some machines. I have not seen this myself.