kubernetes: flaky pull-kubernetes-e2e-kind tests due to timeout
Which jobs are flaking:
pull-kubernetes-e2e-kind
Which test(s) are flaking:
Different tests seem to randomly fail. Some examples:
- [sig-network] KubeProxy should set TCP CLOSE_WAIT timeout [Privileged]: timed out waiting for the condition
- [sig-network] HostPort validates that there is no conflict between pods with same hostPort but different hostIP and protocol [LinuxOnly] [Conformance]: Failed to connect to exposed host ports
- [sig-node] RuntimeClass should run a Pod requesting a RuntimeClass with scheduling without taints: timed out waiting for the condition
- [sig-storage] CSI mock volume CSI Volume expansion should expand volume by restarting pod if attach=on, nodeExpansion=on: timed out waiting for the condition
- …
Reason for failure:
Timeouts
Anything else we need to know:
- https://testgrid.k8s.io/presubmits-kubernetes-blocking#pull-kubernetes-e2e-kind
- Blocking https://github.com/kubernetes/kubernetes/pull/100490 (see history at https://prow.k8s.io/pr-history/?org=kubernetes&repo=kubernetes&pr=100490)
/kind flake
/sig network
/sig node
/sig storage
/cc @dims
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 24 (24 by maintainers)
Opened https://github.com/kubernetes/kubernetes/issues/101913
Do we have an issue open about re-enabling the hostpath tests in these kind jobs (i.e. about reverting https://github.com/kubernetes/test-infra/pull/22025)?
For the record: after @aojea reverted the most recent startup probe test change, we are back to a normal ~80% pass rate, consisting of a few startup probe flakes plus the usual failures caused by bad PRs (code that doesn't build, or debug code that tanks performance; in either case all the e2e jobs fail, not just kind).
Linking the sidecars together makes sense to me. We probably still need actual before-and-after measurements, though Prow will be a little noisy.