kubernetes: kube-scheduler's performance looks bad
What happened: I have a 5000-node cluster with 55000+ pods already running; 25000 of them are managed by 1000 deployments. I then ran some tests to measure scheduler performance and found that the qps stayed at around 15 pods/sec while the scheduler consumed, at most, 2 of 32 cores. The qps does not increase when I run rolling updates against more deployments; instead, many pods stay Pending. I also observed lots of trace logs like the following while the rolling updates were running (a rough way to measure the actual scheduling rate is sketched after the trace):
I0509 09:33:38.051328 1 trace.go:76] Trace[1260452793]: "Scheduling test-namespace/test-deploy-7d6f9d6f97-294mh" (started: 2019-05-09 09:33:37.921340778 +0000 UTC m=+100851.375269575) (total time: 129.959068ms):
Trace[1260452793]: [2.552365ms] [2.552365ms] Computing predicates
Trace[1260452793]: [46.367012ms] [43.814647ms] Prioritizing
Trace[1260452793]: [129.932795ms] [83.565783ms] Selecting host
Trace[1260452793]: [129.959068ms] [26.273µs] END
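For context on why CPU usage stays low: in v1.13 the scheduler pops one pod at a time off its queue and runs the whole predicate/priority/select cycle for it in a single serial loop; only the per-node work inside each cycle is fanned out to a fixed pool of worker goroutines (16 in this era's generic_scheduler.go). So per-pod latency, not core count, caps throughput: at the ~130ms shown in the trace above, a serial loop tops out near 7-8 pods/sec, and the observed 15 pods/sec just means the average pod is cheaper than this traced (slow) one. To measure the actual rate, one option is to sample the scheduler's Prometheus metrics; this is a rough sketch assuming the v1.13 default insecure metrics port (10251) and that your build exposes the scheduler_schedule_attempts_total counter:

```sh
# Estimate scheduling throughput by sampling the "scheduled" attempts counter
# twice and taking the delta. HOST/PORT are assumptions for a default v1.13
# scheduler serving insecure metrics on 10251; adjust for your setup.
HOST=127.0.0.1
PORT=10251

count() {
  curl -s "http://${HOST}:${PORT}/metrics" |
    awk '/^scheduler_schedule_attempts_total{result="scheduled"/ { print $2 }'
}

before=$(count)
sleep 60
after=$(count)
# awk does the arithmetic so large counters in scientific notation still work.
awk -v a="$after" -v b="$before" \
  'BEGIN { printf "scheduled %d pods in 60s (%.1f pods/sec)\n", a - b, (a - b) / 60 }'
```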
What you expected to happen:
- The scheduler should consume more CPUs.
- The qps should be much higher (I don't know exactly how high; maybe 100 pods/sec). A quick way to quantify the gap is sketched below.
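One way to put a number on the gap between expected and observed behavior is to watch the Pending backlog grow while the rolling updates run. A minimal sketch, using nothing beyond plain kubectl access to the cluster:

```sh
# Sample the cluster-wide Pending pod count every 10s while the test runs.
while true; do
  pending=$(kubectl get pods --all-namespaces \
      --field-selector=status.phase=Pending --no-headers | wc -l)
  echo "$(date +%T) pending=${pending}"
  sleep 10
done
```

If the scheduler kept up with the rollout rate, this number would stay roughly flat instead of climbing.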
How to reproduce it (as minimally and precisely as possible): Run rolling updates against many deployments concurrently (one way to do this is sketched below).
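For a concrete way to drive this on v1.13 (where `kubectl rollout restart` does not exist yet; it arrived in v1.15), patching a pod-template annotation forces a new ReplicaSet and hence a rolling update. A sketch, with test-namespace standing in for wherever your deployments live:

```sh
# Kick off a rolling update on every deployment in the namespace at once.
# The template-annotation patch forces a new ReplicaSet, with the same effect
# as a rollout restart. NS is a placeholder; with ~1000 deployments consider
# batching rather than backgrounding everything at once.
NS=test-namespace
for d in $(kubectl -n "$NS" get deployments -o name); do
  kubectl -n "$NS" patch "$d" -p \
    "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"test/restartedAt\":\"$(date +%s)\"}}}}}" &
done
wait
```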
Anything else we need to know?: All pods have 3 labels.

Environment:
- Kubernetes version (use `kubectl version`): v1.13.4
- Cloud provider or hardware configuration: 32 cores, 128 GB memory, 5000-node cluster
- OS (e.g: `cat /etc/os-release`): v3.3.11
- Kernel (e.g. `uname -a`): 4.18.8-1.el7.elrepo.x86_64
- Install tools: kubeadm & ansible
- Network plugin and version (if this is a network-related bug):
- Others: