kubernetes: cpumanager e2e tests contain too many assumptions about cpu allocation

What happened:

the cpumanager e2e tests contain too many assumptions about cpu allocation.

TL;DR: the cpumanager code is not broken and the tests are not broken, but we should improve the tests to avoid possible false negatives in the future.

The specific case we encountered is when the socket id doesn't match the NUMA cell id, as in this example:

$ lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   40 bits physical, 48 bits virtual
CPU(s):                          24
On-line CPU(s) list:             0-23
Thread(s) per core:              2
Core(s) per socket:              6
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           44
Model name:                      Intel(R) Xeon(R) CPU           L5640  @ 2.27GHz
Stepping:                        2
Frequency boost:                 enabled
CPU MHz:                         1595.957
CPU max MHz:                     2262.0000
CPU min MHz:                     1596.0000
BogoMIPS:                        4521.94
Virtualization:                  VT-x
L1d cache:                       384 KiB
L1i cache:                       384 KiB
L2 cache:                        3 MiB
L3 cache:                        24 MiB
NUMA node0 CPU(s):               0,2,4,6,8,10,12,14,16,18,20,22
NUMA node1 CPU(s):               1,3,5,7,9,11,13,15,17,19,21,23
Vulnerability Itlb multihit:     KVM: Mitigation: VMX disabled
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds:               Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2
                                  ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida arat flush_l1d
$ pwd
/sys/devices/system/cpu/cpu0/topology
$ cat core_siblings_list 
0,2,4,6,8,10,12,14,16,18,20,22
$ cat package_cpus_list 
0,2,4,6,8,10,12,14,16,18,20,22
$ cat physical_package_id 
1  ## <===== NOTE THIS
$ cat /sys/devices/system/node/node0/cpulist  ## <===== NOTE THIS
0,2,4,6,8,10,12,14,16,18,20,22
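
For reference, the mismatch above can be spotted programmatically by walking sysfs. Below is a minimal, standalone Go sketch (not part of any existing test code) that compares each CPU's physical_package_id with the nodeN symlink in the same sysfs directory and reports CPUs for which the two ids differ; the nodeN-symlink layout is an assumption about the usual Linux sysfs structure.

// check_topology.go: report CPUs whose physical_package_id differs from the
// id of the NUMA node they belong to. Paths are the standard sysfs locations
// shown in the listing above.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func readTrimmed(path string) (string, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(data)), nil
}

func main() {
	cpuDirs, _ := filepath.Glob("/sys/devices/system/cpu/cpu[0-9]*")
	for _, cpuDir := range cpuDirs {
		pkg, err := readTrimmed(filepath.Join(cpuDir, "topology", "physical_package_id"))
		if err != nil {
			continue // offline CPU or unreadable entry
		}
		// The NUMA node a CPU belongs to shows up as a "nodeN" symlink in its sysfs dir.
		nodes, _ := filepath.Glob(filepath.Join(cpuDir, "node[0-9]*"))
		if len(nodes) == 0 {
			continue
		}
		node := strings.TrimPrefix(filepath.Base(nodes[0]), "node")
		if pkg != node {
			fmt.Printf("%s: physical_package_id=%s but NUMA node=%s (mismatch)\n",
				filepath.Base(cpuDir), pkg, node)
		}
	}
}

On the machine above this prints a mismatch line for every CPU; on hardware where physical_package_id tracks the NUMA cell id it prints nothing.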

In this case, cpumanager comes up with this core ordering, so the reserved cpu is expected to be 1: [1 13 3 15 5 17 7 19 9 21 11 23 0 12 2 14 4 16 6 18 8 20 10 22]

The tests expect this core ordering, so the reserved cpu is expected to be 0: [0 12 2 14 4 16 6 18 8 20 10 22 1 13 3 15 5 17 7 19 9 21 11 23]
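
To make the difference concrete, here is a self-contained Go sketch (not the actual cpumanager code) that rebuilds both orderings for this box. It assumes, consistently with the lists above, that CPU N and CPU N+12 are hyperthread siblings of the same core, that even-numbered CPUs sit in physical package 1 and odd-numbered CPUs in package 0, and that cores are ordered socket-major with sibling threads kept adjacent.

// ordering.go: rebuild the "actual" and "assumed" core orderings for the
// L5640 topology described above.
package main

import (
	"fmt"
	"sort"
)

// order returns a socket-major CPU ordering: physical cores are sorted by
// socket id, and both hyperthread siblings of a core are emitted next to
// each other (sibling of CPU N is CPU N+12 on this box).
func order(packageOf func(cpu int) int) []int {
	type core struct{ socket, first int }
	var cores []core
	for cpu := 0; cpu < 12; cpu++ { // one entry per physical core
		cores = append(cores, core{socket: packageOf(cpu), first: cpu})
	}
	sort.Slice(cores, func(i, j int) bool {
		if cores[i].socket != cores[j].socket {
			return cores[i].socket < cores[j].socket
		}
		return cores[i].first < cores[j].first
	})
	var out []int
	for _, c := range cores {
		out = append(out, c.first, c.first+12)
	}
	return out
}

func main() {
	// Real sysfs values on this machine: even CPUs are package 1, odd CPUs are package 0.
	realPackage := func(cpu int) int { return 1 - cpu%2 }
	// What the test implicitly assumes: package id == NUMA node id (even CPUs -> 0).
	assumedPackage := func(cpu int) int { return cpu % 2 }

	fmt.Println("actual ordering: ", order(realPackage))
	fmt.Println("assumed ordering:", order(assumedPackage))
}

Running it prints the two lists above: the first element, and hence the single reserved CPU, is 1 with the real package ids and 0 with the assumed ones.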

Now, in this case this is very old hardware that no one really cares about anymore: the tests run fine on more modern hardware on which this holds:

physical_package_id == numa_cell_id

Still, this highlights an unnecessary assumption in the e2e cpumanager tests which is worth removing.

What you expected to happen:

Tests should not make assumptions about the cpu core ordering, which in turn drives the reserved core id; one way to avoid this is sketched below.
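
One possible direction is to derive the expected reserved CPU from the node itself at test time instead of hardcoding 0. The Go sketch below assumes the reserved CPU is the lowest-numbered CPU in the lowest-numbered physical package, which matches both orderings above; the helper name and structure are illustrative, not the existing e2e code.

// reserved.go: compute the expected reserved CPU from sysfs instead of
// assuming it is CPU 0.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// expectedReservedCPU returns the lowest-numbered CPU belonging to the
// lowest-numbered physical package, reading sysfs directly.
func expectedReservedCPU() (int, error) {
	cpuDirs, err := filepath.Glob("/sys/devices/system/cpu/cpu[0-9]*")
	if err != nil {
		return -1, err
	}
	bestPkg, bestCPU := -1, -1
	for _, dir := range cpuDirs {
		raw, err := os.ReadFile(filepath.Join(dir, "topology", "physical_package_id"))
		if err != nil {
			continue // offline CPU or unreadable entry
		}
		pkg, err := strconv.Atoi(strings.TrimSpace(string(raw)))
		if err != nil {
			continue
		}
		cpu, err := strconv.Atoi(strings.TrimPrefix(filepath.Base(dir), "cpu"))
		if err != nil {
			continue
		}
		if bestPkg == -1 || pkg < bestPkg || (pkg == bestPkg && cpu < bestCPU) {
			bestPkg, bestCPU = pkg, cpu
		}
	}
	if bestCPU == -1 {
		return -1, fmt.Errorf("no readable CPU topology entries under sysfs")
	}
	return bestCPU, nil
}

func main() {
	cpu, err := expectedReservedCPU()
	if err != nil {
		panic(err)
	}
	fmt.Println("expected reserved CPU:", cpu)
}

On the L5640 box above this returns 1; on machines where physical_package_id == numa_cell_id it returns 0, so the same test logic would work on both.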

How to reproduce it (as minimally and precisely as possible):

Run the cpumanager e2e tests on any multi-NUMA machine on which physical_package_id != numa_cell_id, for example machines with Xeon L5640 CPUs.

Anything else we need to know?:

This issue will NOT materialize on prow until we have multi-NUMA machines, or more generally machines with 2+ physical_package_ids.

Environment:

  • Kubernetes version (use kubectl version): any, including master
  • Cloud provider or hardware configuration: see above
  • OS (e.g: cat /etc/os-release): any (depends on kernel)
  • Kernel (e.g. uname -a): at the very least 4.y, probably earlier ones as well
  • Install tools: N/A
  • Network plugin and version (if this is a network-related bug): N/A
  • Others: N/A
