kubernetes: Tests on sig-node-containerd#cos-cgroupv2-containerd-node-e2e-serial are failing
Which jobs are failing?
sig-node-containerd#cos-cgroupv2-containerd-node-e2e-serial
Which tests are failing?
- E2eNode Suite.[sig-node] Density [Serial] [Slow] create a batch of pods latency/resource should be within limit when create 10 pods with 0s interval
- E2eNode Suite.[sig-node] Density [Serial] [Slow] create a sequence of pods latency/resource should be within limit when create 10 pods with 50 background pods
- E2eNode Suite.[sig-node] LocalStorageCapacityIsolationQuotaMonitoring [Slow] [Serial] [Disruptive] [Feature:LocalStorageCapacityIsolationQuota][NodeFeature:LSCIQuotaMonitoring] when we run containers that should cause use quotas for LSCI monitoring (quotas enabled: false) should eventually evict all of the correct pods
- E2eNode Suite.[sig-node] Resource-usage [Serial] [Slow] regular resource usage tracking resource tracking for 10 pods per node
- E2eNode Suite.[sig-node] Restart [Serial] [Slow] [Disruptive] Container Runtime Network should recover from ip leak
Since when has it been failing?
11/30, maybe earlier
Testgrid link
https://testgrid.k8s.io/sig-node-containerd#cos-cgroupv2-containerd-node-e2e-serial
Reason for failure (if possible)
No response
Anything else we need to know?
No response
Relevant SIG(s)
/sig node /priority important-soon
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 20 (20 by maintainers)
It’s green now. Yay!
The issue might be that the test is started with `--container-runtime-process-name=/usr/bin/containerd`, but the test installs a custom version of containerd which runs under `/home/containerd/usr/local/bin/containerd`, so `getPidsForProcess` is unable to find it:
https://github.com/kubernetes/kubernetes/blob/3cec1d1a13a7414ed5413d75898a167220c3892c/test/e2e_node/resource_collector.go#L479
which appears to scan `/proc/{all_pids}/cmdline` and match against a regex:
https://github.com/kubernetes/kubernetes/blob/3cec1d1a13a7414ed5413d75898a167220c3892c/pkg/util/procfs/procfs_linux.go#L103-L107
I believe this is due to the test running a cAdvisor pod which does not support cgroupv2. I think https://github.com/kubernetes/kubernetes/pull/106287 aims to address it
@pacoxu do you want to keep working on this? Can I assign the bug to you?