kubernetes: Umbrella issue: slow/timing-out unit tests

When running make test KUBE_RACE=-race locally, several packages have very slow unit tests, and some time out entirely.

The default per-package timeout for make test is 120 seconds (which is already much longer than I would expect).

The following packages had tests that ran longer than 30 seconds on my workstation. Running tests on CI machines regularly takes 2-3x as long. Anything longer than 60 seconds should be prioritized.

cluster-lifecycle

apps

networking

node

storage

api-machinery
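One way to collect per-package timings like the ones described above is to post-process the output of `go test -json`. The following is a minimal sketch, not the tooling used for this issue: the helper names (`packageResult`, `slowPackages`, `slowPackage`) are hypothetical, and it assumes the test run's JSON event stream is piped to stdin. The 30 second threshold matches the cutoff mentioned above.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"os"
	"sort"
)

// event holds the fields of a `go test -json` record that this sketch uses.
type event struct {
	Action  string
	Package string
	Test    string
	Elapsed float64 // seconds
}

// slowPackage is a hypothetical result type: a package and its total run time.
type slowPackage struct {
	Name    string
	Seconds float64
}

// packageResult parses one output line and reports whether it is a
// package-level pass/fail event (those have an empty Test field).
func packageResult(line []byte) (slowPackage, bool) {
	var ev event
	if err := json.Unmarshal(line, &ev); err != nil {
		return slowPackage{}, false // not a JSON event, e.g. build output
	}
	if ev.Test != "" || (ev.Action != "pass" && ev.Action != "fail") {
		return slowPackage{}, false
	}
	return slowPackage{Name: ev.Package, Seconds: ev.Elapsed}, true
}

// slowPackages returns the packages that ran longer than threshold seconds,
// slowest first.
func slowPackages(r io.Reader, threshold float64) ([]slowPackage, error) {
	var out []slowPackage
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // allow long output lines
	for sc.Scan() {
		if p, ok := packageResult(sc.Bytes()); ok && p.Seconds > threshold {
			out = append(out, p)
		}
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Seconds > out[j].Seconds })
	return out, nil
}

func main() {
	slow, err := slowPackages(os.Stdin, 30)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, p := range slow {
		fmt.Printf("%7.1fs  %s\n", p.Seconds, p.Name)
	}
}
```

Assuming the file is saved as slowpkgs.go (a hypothetical name), it could be used as `go test -race -json ./pkg/... | go run slowpkgs.go`.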

/sig cluster-lifecycle apps network node storage api-machinery
/triage accepted
/help

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 8
  • Comments: 25 (25 by maintainers)

Most upvoted comments

wow… I… didn’t remember that was excluding itself that way. Opened https://github.com/kubernetes/kubernetes/pull/99782 to fix or remove unit tests that don’t work in race mode.

k8s.io/kubernetes/pkg/controller/endpointslice 35s -> 13s #98793

k8s.io/kubernetes/pkg/controller/volume/persistentvolume 41s -> 19s #98792

k8s.io/kubernetes/pkg/kubelet/volumemanager/reconciler 120s (timeout) -> 40s #98915

k8s.io/kubernetes/pkg/volume/csi 62s -> 30s #98762 These test cases use global variables, so changing them to run in parallel may cause panics. I shortened the waiting time instead. https://github.com/kubernetes/kubernetes/pull/98762#issuecomment-773754653

k8s.io/kubernetes/pkg/volume/util/operationexecutor 56s -> 15s #98760

I’m working on the ‘kubeadm cert’ package. @neolit123

k8s.io/kubernetes/cmd/kubeadm/app/phases/certs 120.433s (timed out) -> #98517

multi-minute-long unit tests are typically a symptom of one of the following:

we should at least look at these packages to see if those are the cause

if we’re down to 1-2 problematic packages, I think I’ll close this in favor of specific targeted issues. If you have pointers to >100 second unit test runs of packages, please open an issue for the package and tag the appropriate sig

Unit tests that ran slowly or timed out in my PR's test run:

  • //pkg/controller/volume/scheduler PASSED in 109.6s
  • //pkg/controlplane:go_default_test PASSED in 152.0s
  • //staging/src/k8s.io/apiserver/pkg/server/filters:go_default_test TIMEOUT in 300.1s (may be a flaky test; needs some investigation)

I also found another 3 slow test cases.

k8s.io/kubernetes/pkg/controller/volume/scheduling 109s -> 24s #98912

  • pkiutil: optimized in #98682 by @neolit123 (which is merged)
  • I opened #98691 for cronjob UT. (may be the smallest fix)
  • #98756 for network node-ipam (uses a 1s node poll interval in unit tests and keeps 10s in normal mode)