kubernetes: [sig-node] Summary API [NodeConformance] when querying /stats/summary ... networking info is nil on containerd
Failure cluster 79384df20f4e672ed9e1
Test: [sig-node] Summary API [NodeConformance] when querying /stats/summary should report resource usage through the stats api
Jobs:
- ci-cgroup-systemd-containerd-node-e2e
- ci-cos-containerd-node-e2e
- pull-kubernetes-node-e2e-containerd
Error text:
test/e2e_node/summary_test.go:53
Timed out after 180.001s.
Expected
<string>: Summary
to match fields: {
.Pods[summary-test-3122::stats-busybox-1].Network:
Expected
<string>: NetworkStats
to match fields: {
[.InterfaceStats.Name:
Expected
<string>:
to equal
<string>: eth0, .InterfaceStats.RxBytes:
Expected
<*uint64 | 0x0>: nil
not to be <nil>, .InterfaceStats.RxErrors:
Expected
<*uint64 | 0x0>: nil
not to be <nil>, .InterfaceStats.TxBytes:
Expected
<*uint64 | 0x0>: nil
not to be <nil>, .InterfaceStats.TxErrors:
Expected
<*uint64 | 0x0>: nil
not to be <nil>]
.Interfaces:
Expected
<[]v1alpha1.InterfaceStats | len:0, cap:0>: nil
not to be nil
}
}
test/e2e_node/summary_test.go:327
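The failing expectations in the matcher output above come down to a few field checks on the pod's network stats. The Go sketch below restates them. The structs are simplified, hypothetical mirrors of the Summary API types that keep only the field paths named in the log, and networkProblems is an illustrative helper, not code from summary_test.go.

```go
package main

import "fmt"

// InterfaceStats is a simplified, hypothetical mirror of the Summary API
// interface stats; pointer counters are nil when nothing was reported.
type InterfaceStats struct {
	Name     string
	RxBytes  *uint64
	RxErrors *uint64
	TxBytes  *uint64
	TxErrors *uint64
}

// NetworkStats embeds the default interface and lists all interfaces,
// mirroring the .InterfaceStats.* and .Interfaces paths in the failure.
type NetworkStats struct {
	InterfaceStats
	Interfaces []InterfaceStats
}

// networkProblems returns one message per expectation the test reports as failed.
func networkProblems(n *NetworkStats) []string {
	if n == nil {
		return []string{"Network: nil"}
	}
	var probs []string
	if n.Name != "eth0" {
		probs = append(probs, fmt.Sprintf("InterfaceStats.Name: got %q, want %q", n.Name, "eth0"))
	}
	counters := []struct {
		field string
		val   *uint64
	}{
		{"RxBytes", n.RxBytes},
		{"RxErrors", n.RxErrors},
		{"TxBytes", n.TxBytes},
		{"TxErrors", n.TxErrors},
	}
	for _, c := range counters {
		if c.val == nil {
			probs = append(probs, "InterfaceStats."+c.field+": nil")
		}
	}
	if n.Interfaces == nil {
		probs = append(probs, "Interfaces: nil")
	}
	return probs
}

func main() {
	// An empty NetworkStats triggers every complaint listed in the failure.
	for _, p := range networkProblems(&NetworkStats{}) {
		fmt.Println(p)
	}
}
```

An all-zero NetworkStats yields exactly the six complaints in the log, which is what a runtime that returns no network data would produce.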
Recent failures:
- 3/27/2022, 9:20:16 AM ci-cgroup-systemd-containerd-node-e2e
- 3/27/2022, 3:20:08 AM ci-cos-containerd-node-e2e
- 3/26/2022, 9:20:08 PM ci-cos-containerd-node-e2e
- 3/26/2022, 4:06:12 AM ci-cos-containerd-node-e2e
- 3/25/2022, 11:58:17 AM ci-cos-containerd-node-e2e
Started flaking 03/23.
/kind flake
/sig node
/priority important-soon
/milestone v1.24
About this issue
- State: closed
- Created 2 years ago
- Comments: 66 (66 by maintainers)
https://github.com/google/cadvisor/pull/3103 is ready now. The CI jobs for cAdvisor are green now as well.
@bobbypage, can you please merge it, release v0.45, and open a PR for k/k to update the cAdvisor version used in k8s?
thanks, Dims
I don't see how the metrics grabber is related to this. There is a bug in getNodeSummary, though: it doesn't work for IPv6 addresses.
Also, the
are not providing any information; I think we should compare against something different so that artifacts don't make the comparison fail.
I'll submit a patch.
BTW, we are now also trying to reproduce the bug with extra logging: https://github.com/kubernetes/kubernetes/pull/109472. So far, no luck: @mmiranda96 and @bobbypage tried locally and never managed to reproduce it. @ruiwen-zhao had some luck reproducing it earlier (https://github.com/kubernetes/kubernetes/pull/109371), but it no longer seems to reproduce there either.
@liggitt, this test is generally very flaky, in part because it tests so many (too many?) things…
https://github.com/kubernetes/kubernetes/issues/108836 had issues with CPUStats, and https://github.com/kubernetes/kubernetes/issues/104292 was swap-only?
This failure is networkstats timing out specifically.
Reproduced with a bit more logging in https://github.com/kubernetes/kubernetes/pull/109371#issuecomment-1104422428
TL;DR: my change below intermittently caused an "invalid memory address or nil pointer dereference":
So I guess p.Network is a nil pointer when the test fails.
@mikebrow we may try an approach like #101960 to fix this.
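A minimal Go sketch of guarding that dereference, assuming the panic comes from reading fields through a nil p.Network. The types and the rxBytes helper are hypothetical simplifications; this is a generic nil check, not necessarily what #101960 does.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the Summary API pod stats.
type NetworkStats struct {
	RxBytes *uint64
}

type PodStats struct {
	Name    string
	Network *NetworkStats // nil when the runtime returned no network stats
}

// rxBytes checks p.Network (and the counter) before dereferencing, returning
// ok=false instead of panicking with a nil pointer dereference.
func rxBytes(p PodStats) (uint64, bool) {
	if p.Network == nil || p.Network.RxBytes == nil {
		return 0, false
	}
	return *p.Network.RxBytes, true
}

func main() {
	n := uint64(42)
	pods := []PodStats{
		{Name: "with-network", Network: &NetworkStats{RxBytes: &n}},
		{Name: "stats-busybox-1"}, // Network left nil, as in the failing runs
	}
	for _, p := range pods {
		if v, ok := rxBytes(p); ok {
			fmt.Printf("%s: rx=%d\n", p.Name, v)
		} else {
			fmt.Printf("%s: network stats missing\n", p.Name)
		}
	}
}
```

Returning an ok flag keeps the "stats missing" case visible to the caller, so a test can report it as a failed expectation rather than crash.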
Also, comparing test durations between the latest 5 runs and the earliest 5 runs, I don't see a huge difference.
So the passing tests are pretty stable and take only ~1 min, whereas failed runs take more than 3 min. This does not look like a performance regression.
@helayoty, this is one of three issues SIG Node is tracking as a possible regression in 1.24; we need a clear answer before it can be removed from the milestone.