kubernetes: Tests on sig-node-containerd#cos-cgroupv2-containerd-node-e2e-serial are failing

Which jobs are failing?

sig-node-containerd#cos-cgroupv2-containerd-node-e2e-serial

Which tests are failing?

E2eNode Suite.[sig-node] Density [Serial] [Slow] create a batch of pods latency/resource should be within limit when create 10 pods with 0s interval
E2eNode Suite.[sig-node] Density [Serial] [Slow] create a sequence of pods latency/resource should be within limit when create 10 pods with 50 background pods
E2eNode Suite.[sig-node] LocalStorageCapacityIsolationQuotaMonitoring [Slow] [Serial] [Disruptive] [Feature:LocalStorageCapacityIsolationQuota][NodeFeature:LSCIQuotaMonitoring] when we run containers that should cause use quotas for LSCI monitoring (quotas enabled: false) should eventually evict all of the correct pods
E2eNode Suite.[sig-node] Resource-usage [Serial] [Slow] regular resource usage tracking resource tracking for 10 pods per node
E2eNode Suite.[sig-node] Restart [Serial] [Slow] [Disruptive] Container Runtime Network should recover from ip leak

Since when has it been failing?

11/30, maybe earlier

Testgrid link

https://testgrid.k8s.io/sig-node-containerd#cos-cgroupv2-containerd-node-e2e-serial

Reason for failure (if possible)

No response

Anything else we need to know?

No response

Relevant SIG(s)

/sig node
/priority important-soon

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 20 (20 by maintainers)

Most upvoted comments

It’s green now. Yay!

The issue might be that --container-runtime-process-name is set to /usr/bin/containerd, but the test installs a custom version of containerd that runs from /home/containerd/usr/local/bin/containerd, so getPidsForProcess is unable to find it: https://github.com/kubernetes/kubernetes/blob/3cec1d1a13a7414ed5413d75898a167220c3892c/test/e2e_node/resource_collector.go#L479

which appears to scan /proc/<pid>/cmdline for every PID and match the executable against a regex:

https://github.com/kubernetes/kubernetes/blob/3cec1d1a13a7414ed5413d75898a167220c3892c/pkg/util/procfs/procfs_linux.go#L103-L107
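For illustration, here is a minimal sketch of that kind of lookup, assuming the common "(^|/)name$" matching pattern; the exact regex Kubernetes builds at the linked lines may differ. It shows why a name of /usr/bin/containerd would never match a binary running from /home/containerd/usr/local/bin/containerd:

```go
// Sketch only, not the Kubernetes implementation: walk /proc, read each
// process's cmdline, and match argv[0] against a regex derived from the
// configured process name.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"regexp"
	"strconv"
	"strings"
)

// pidsOf returns PIDs whose executable (argv[0]) matches a regex anchored
// on the given name. The "(^|/)name$" construction is an assumption here.
func pidsOf(name string) ([]int, error) {
	re, err := regexp.Compile("(^|/)" + regexp.QuoteMeta(name) + "$")
	if err != nil {
		return nil, err
	}

	entries, err := os.ReadDir("/proc")
	if err != nil {
		return nil, err
	}

	var pids []int
	for _, e := range entries {
		pid, err := strconv.Atoi(e.Name())
		if err != nil {
			continue // not a /proc/<pid> directory
		}
		data, err := os.ReadFile(filepath.Join("/proc", e.Name(), "cmdline"))
		if err != nil || len(data) == 0 {
			continue
		}
		// cmdline is NUL-separated; the first field is the executable path.
		exe := strings.Split(string(data), "\x00")[0]
		if re.MatchString(exe) {
			pids = append(pids, pid)
		}
	}
	return pids, nil
}

func main() {
	// With name = "/usr/bin/containerd" the regex becomes
	// "(^|/)/usr/bin/containerd$", which cannot match
	// "/home/containerd/usr/local/bin/containerd", so no PIDs are returned.
	pids, err := pidsOf("/usr/bin/containerd")
	fmt.Println(pids, err)
}
```

If that is what is happening, either the flag needs to point at the path the custom containerd actually runs from, or the match would have to be relaxed (e.g. to the binary's base name) to survive a relocated runtime.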

I believe this is due to the test running a cAdvisor pod that does not support cgroupv2. I think https://github.com/kubernetes/kubernetes/pull/106287 aims to address it.

@pacoxu, do you want to keep working on this? Can I assign the bug to you?