kubernetes: [flaky] SchedulingThroughput - SchedulingThroughput error: scheduler throughput: actual throughput 81.200000 lower than threshold 90.000000]

Which jobs are flaking:

  • ci-kubernetes-e2e-gci-gce-scalability
  • ci-kubernetes-e2e-gci-gce-scalability-networkpolicies
  • pull-kubernetes-e2e-gce-100-performance

Which test(s) are flaking:

testing/density/config.yaml

SchedulingThroughput error: scheduler throughput: actual throughput 81.800000 lower than threshold 90.000000]

Testgrid link:

Anything else we need to know:

Looking just at periodic jobs (not presubmit), the number of failures has been steadily rising since May.

/sig scalability
/sig scheduling

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 35 (35 by maintainers)

Most upvoted comments

Given that the throughput is computed by the test itself, it’s also possible that a CPU-starved test would report lower numbers than the scheduler actually achieved.
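As a toy illustration of that effect (this is not the actual clusterloader2 SchedulingThroughput code, and the numbers are made up to mirror the ~81 vs. 90 figures above): if the reported rate is derived from observations made by the test process itself, a starved process that notices the end of the scheduling burst late inflates the time window and under-reports the scheduler’s real rate.

```go
package main

import "fmt"

// Toy model only: assumes throughput is computed as pods scheduled divided by
// the window the *test process* observed, which stretches if the process is
// CPU-starved and wakes up late. Not the real measurement implementation.
func main() {
	const podsScheduled = 400.0
	const schedulerDuration = 4.4 // seconds the scheduler actually needed
	const observerDelay = 0.5     // seconds the starved test was late to notice

	actual := podsScheduled / schedulerDuration
	observed := podsScheduled / (schedulerDuration + observerDelay)

	fmt.Printf("actual throughput:   %.1f pods/s\n", actual)   // ~90.9, above the threshold
	fmt.Printf("observed throughput: %.1f pods/s\n", observed) // ~81.6, below the 90.0 threshold
}
```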

Speaking of that, I just noticed today that the job doesn’t request CPU for the test pod, so it can easily be starved. We noticed it was getting assigned the default request of 250m in https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/93307/pull-kubernetes-e2e-gce-100-performance/1285712793332355072/ (see podinfo.json in the artifacts).

Shouldn’t the perf test be requesting a specific amount of CPU for its test pod (and maybe even limiting it to that amount, to make results comparable over time)?
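For illustration, here is a minimal sketch of what an explicit CPU request (and matching limit) on the test container could look like, written with the Kubernetes Go API types; the real Prow job is configured in test-infra, and the container name, image, and the 2-CPU / 4Gi figures below are placeholders, not the job’s actual values.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// Placeholder container spec: an explicit request avoids falling back to
	// the 250m namespace default, and a matching limit keeps the CPU budget
	// identical from run to run so results stay comparable.
	c := corev1.Container{
		Name:  "e2e-perf-test",                             // assumed name
		Image: "gcr.io/k8s-testimages/kubekins-e2e:latest", // placeholder image
		Resources: corev1.ResourceRequirements{
			Requests: corev1.ResourceList{
				corev1.ResourceCPU:    resource.MustParse("2"),
				corev1.ResourceMemory: resource.MustParse("4Gi"),
			},
			Limits: corev1.ResourceList{
				corev1.ResourceCPU:    resource.MustParse("2"),
				corev1.ResourceMemory: resource.MustParse("4Gi"),
			},
		},
	}
	fmt.Printf("cpu request=%s limit=%s\n",
		c.Resources.Requests.Cpu(), c.Resources.Limits.Cpu())
}
```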

Do we know if there are any performance regressions on the API server side related to updates?

Not that I’m aware of. If I’m reading http://perf-dash.k8s.io/#/?jobname=gce-100Nodes-master&metriccategoryname=APIServer&metricname=DensityResponsiveness_PrometheusSimple&Resource=pods&Scope=namespace&Subresource=binding&Verb=POST correctly, the performance of POSTs to pods/binding looks pretty stable over time.