kubernetes: Static pods never start with Kubelet v1.19.0-alpha.1/beta.2 on OSes with SMT disabled

What happened:

Kubelets between v1.19.0-alpha.1 and v1.19.0-beta.1 (latest at the time of writing) cannot start static pod manifests defined in --pod-manifest-path=/etc/kubernetes/manifests. Rolling back to v1.18.3 (or building v1.19.0-alpha.0) restores the ability to create static pods. Observed on Fedora CoreOS nodes, but not on Flatcar Linux nodes.

With Kubelet -v=10, this message looks suspect:

Reading config file "/etc/kubernetes/manifests/kube-apiserver.yaml"
Generated UID "3cea85470d942aa9e23a9df789f659d8" pod "kube-apiserver" from /etc/kubernetes/manifests/kube-apiserver.yaml
Generated Name "kube-apiserver-ip-10-0-14-234" for UID "3cea85470d942aa9e23a9df789f659d8" from URL /etc/kubernetes/manifests/kube-apiserver.yaml
Using namespace "kube-system" for pod "kube-apiserver-ip-10-0-14-234" from /etc/kubernetes/manifests/kube-apiserver.yaml
Receiving a new pod "kube-apiserver-ip-10-0-14-234_kube-system(3cea85470d942aa9e23a9df789f659d8)"
Write status for kube-apiserver-ip-10-0-14-234/kube-system: &container.PodStatus{ID:"3cea85470d942aa9e23a9df789f659d8", Name:"kube-apiserver-ip-10-0-14-234", Namespace:"kube-system", IPs:[]string{}, ContainerStatuses:[]*container.ContainerStatus{(*container.ContainerStatus)(0xc000ca42a0)}, SandboxStatuses:[]*v1alpha2.PodSandboxStatus{(*v1alpha2.PodSandboxStatus)(0xc00058ec60)}} (err: <nil>)
Failed to admit pod kube-apiserver-ip-10-0-14-234_kube-system(3cea85470d942aa9e23a9df789f659d8) - Unexpected error while attempting to recover from admission failure: preemption: error finding a set of pods to preempt: no set of running pods found to reclaim resources: [(res: cpu, q: 150), ]
no set of running pods found to reclaim resources: [(res: cpu, q: 150), ]

What you expected to happen:

The Kubelet should create static pods as containers with the Docker runtime (visible via sudo docker ps).

How to reproduce it (as minimally and precisely as possible):

Run the Kubelet on Fedora CoreOS with --pod-manifest-path manifests, using the default Docker runtime. Check docker ps -a to see that no containers are created.

Anything else we need to know?:

Rolling the Kubelet back to v1.18.3 immediately allows static pods to be created again (same host, no other changes), hinting this is a Kubelet regression. Fedora CoreOS nodes are consistently affected, while Flatcar Linux nodes are not, which suggests the issue relates to interactions with, or assumptions about, the host.

Binary searching and building Kubelets reveals the issue began in https://github.com/kubernetes/kubernetes/pull/86975

BAD v1.19.0-alpha.1 and beyond
BAD 7555985346c48b20d2b6662ebbce93827b513be2
BAD 54967fe39367c1ada4c9c4b5c2146263f85a41e4
BAD 3e43b0722a0812c7d333a4557a4c09c32e2d86c3
BAD 4274ea2c89dee24e4c188a71e8164b2a40d1e181
OK a6d0f8e3dc33d897f0fa6cc6ec325a2c333b5bda
OK d00f9c7c1091e31c75c6636500095c4e490b8db8
OK a1ae67d691d514d859fce68299d7bd3830686b38
OK v1.19.0-alpha.0

Environment:

  • Kubernetes version (use kubectl version): v1.19.0-alpha.1 to v1.19.0-beta.1
  • Cloud provider or hardware configuration: Any platform / NA
  • OS (e.g: cat /etc/os-release): Fedora CoreOS 31.20200517.3.0
  • Kernel (e.g. uname -a): Linux ip-10-0-14-234 5.6.11-200.fc31.x86_64
  • Install tools: Typhoon

So what actually differs between the Fedora CoreOS and Flatcar Linux hosts that's plausibly relevant here?

| Name | Kernel | Docker | Cgroup driver | Problem |
|------|--------|--------|---------------|---------|
| Fedora CoreOS 31.20200517.3 | 5.6.11-200 | 18.09.8 | systemd | yes |
| Flatcar Linux 2512.2.0 | 4.19.124-flatcar | 18.06.3-ce | cgroupfs | no |

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 3
  • Comments: 22 (17 by maintainers)

Most upvoted comments

@dghubble Thank you for information, I’ll try to provide a fix for this case today.

Ah, looks much better. @iwankgb thanks!

  "num_cores": 1,
  "num_physical_cores": 1,
  "num_sockets": 1,
lscpu -e -a
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE
  0    0      0    0 0:0:0:0          yes
  1    -      -    - :::               no
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
stepping        : 7
microcode       : 0x5002f00
cpu MHz         : 2499.998
cache size      : 36608 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips        : 4999.99
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

AWS t3.small

I have the same failure mode (for processor 10) on my corp ("gLinux" ~= Debian testing) workstation with the patch.

On Thu, Jun 4, 2020 at 9:01 PM Dalton Hubble notifications@github.com wrote:

Actually, it may be simpler. That log line shows the value of cpuinfo is ": 0", but reading over the PR, it looks like it's intended to be the entire content of /proc/cpuinfo, which is getting lost somehow.

E0605 03:32:16.699800 188005 info.go:109] Failed to get topology information: Unable to read core id for processor 1 from processor : 0

return "", fmt.Errorf("Unable to read core id for processor %d from %s", cpuID, cpuinfo)

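The failure mode above can be illustrated with a small sketch: an offline processor (e.g. an SMT sibling disabled at boot) simply has no stanza in /proc/cpuinfo, so any lookup of its core id comes up empty. This is a minimal Python illustration of that parsing situation, not cAdvisor's actual Go code.

```python
def core_ids_by_processor(cpuinfo: str) -> dict[int, int]:
    """Map each `processor` number in /proc/cpuinfo to its `core id`."""
    mapping = {}
    # Each online CPU gets one blank-line-separated stanza;
    # offline CPUs have no stanza at all.
    for stanza in cpuinfo.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            if ":" in line:
                key, value = line.split(":", 1)
                fields[key.strip()] = value.strip()
        if "processor" in fields and "core id" in fields:
            mapping[int(fields["processor"])] = int(fields["core id"])
    return mapping

# cpuinfo from a host with SMT disabled: only processor 0 is online.
cpuinfo = """\
processor\t: 0
core id\t\t: 0
cpu cores\t: 1
"""

ids = core_ids_by_processor(cpuinfo)
print(ids.get(0))  # 0
print(1 in ids)    # False: processor 1 is offline, so any core-id lookup fails
```

The point is that "processor 1" never appears in the file, so code that expects every CPU number in sysfs to also appear in /proc/cpuinfo has nothing to parse.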

Great context over there, thanks! Here, I'm not able to kubectl get nodes since this prevents nodes from ever registering with the kube-apiserver (which doesn't come up, since it's a static pod).

But it does seem cpu1 is missing the topology directory, which cAdvisor seems to now want according to this comment.

Fedora CoreOS (cpu1 missing topology)

ls /sys/devices/system/node/node0/cpu0
cache  crash_notes  crash_notes_size  driver  firmware_node  hotplug  node0  power  subsystem  topology  uevent
ls /sys/devices/system/node/node0/cpu1
crash_notes  crash_notes_size  driver  firmware_node  hotplug  node0  online  power  subsystem  uevent

Flatcar Linux (ok)

ls /sys/devices/system/node/node0/cpu0/
cache        crash_notes_size  firmware_node  node0  subsystem  uevent crash_notes  driver            hotplug        power  topology
ls /sys/devices/system/node/node0/cpu1/
cache        crash_notes_size  firmware_node  node0   power      topology crash_notes  driver            hotplug        online  subsystem  uevent
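The listings above can be automated: a short Python sketch (assuming the standard sysfs CPU layout; this is not cAdvisor code) that reports which cpuN entries lack a topology/ directory.

```python
from pathlib import Path

def cpus_missing_topology(base: str = "/sys/devices/system/cpu") -> list[int]:
    """Return CPU numbers whose sysfs entry lacks a topology/ directory.

    Offline CPUs (e.g. SMT siblings disabled at boot) keep their cpuN
    directory but lose topology/, which is what trips up cAdvisor here.
    """
    base_path = Path(base)
    if not base_path.is_dir():
        return []
    missing = []
    for cpu_dir in base_path.glob("cpu[0-9]*"):
        if cpu_dir.is_dir() and not (cpu_dir / "topology").is_dir():
            missing.append(int(cpu_dir.name[3:]))
    return sorted(missing)

# On the affected Fedora CoreOS node this would report [1].
print(cpus_missing_topology())
```

(The listings in the comment use /sys/devices/system/node/node0/cpuN; the same cpuN directories are also reachable under /sys/devices/system/cpu, which is what the sketch walks.)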

Looks like https://github.com/google/cadvisor/pull/2567 is a candidate change to cAdvisor. I can test it somewhat crudely (running it on-host, not quite how it's really used, but better than nothing) on the affected Fedora CoreOS node.

./cadvisor 
W0605 03:32:16.697778  188005 nvidia.go:55] NVidia GPU metrics will not be available: no NVIDIA devices found
E0605 03:32:16.699800  188005 info.go:109] Failed to get topology information: Unable to read core id for processor 1 from processor   : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
stepping        : 4
microcode       : 0x2000069
cpu MHz         : 2500.000
cache size      : 33792 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
....
curl 127.0.0.1:8080/api/v2.0/machine | jq .
{
  "timestamp": "2020-06-05T03:32:16.701871119Z",
  "num_cores": 0,
  "num_physical_cores": 1,
  "num_sockets": 1,
  "cpu_frequency_khz": 2500000,
  "memory_capacity": 2035654656,
  "memory_by_type": {},
   ....
   "topology": null,
}
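That num_cores: 0 ties the cAdvisor regression back to the original admission failure: the kubelet derives the node's allocatable CPU from cAdvisor's machine info, so zero detected cores means zero allocatable millicores, and a static pod requesting 150m CPU can never be admitted (and, per the log, there are no running pods to preempt to reclaim the shortfall). A toy illustration of that arithmetic, not kubelet's actual admission code:

```python
def admit(allocatable_millicpu: int, request_millicpu: int) -> bool:
    """Toy admission check: a pod fits only if its CPU request
    does not exceed the node's allocatable CPU."""
    return request_millicpu <= allocatable_millicpu

# num_cores: 0 => 0m allocatable, so the 150m kube-apiserver request fails:
print(admit(allocatable_millicpu=0, request_millicpu=150))     # False
# A correctly detected core (1000m allocatable) admits it fine:
print(admit(allocatable_millicpu=1000, request_millicpu=150))  # True
```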