kubernetes: QPS limits in clients seem to not work in all situations
In kubemark-500, we are bumping the QPS limits for clients in controller-manager to 100: https://github.com/kubernetes/test-infra/blob/master/jobs/ci-kubernetes-kubemark-500-gce.sh#L42
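(For context: this is the standard client-side QPS/Burst throttle. Roughly, it should end up as a token bucket that every API request has to pass through, as in the sketch below. The sketch uses current client-go names (rest.Config, util/flowcontrol) purely for illustration; it is not the actual controller-manager wiring.)

```go
package main

import (
	"fmt"

	"k8s.io/client-go/rest"
	"k8s.io/client-go/util/flowcontrol"
)

func main() {
	// Sketch: what a 100 QPS / 100 burst client setting is expected to mean.
	cfg := &rest.Config{
		Host:  "https://kubemark-master", // placeholder
		QPS:   100,
		Burst: 100,
	}

	// When no explicit RateLimiter is supplied, client-go builds a token
	// bucket from QPS/Burst and every request blocks on it.
	limiter := flowcontrol.NewTokenBucketRateLimiter(cfg.QPS, cfg.Burst)
	limiter.Accept() // consumes one token, blocking if none are available
	fmt.Printf("client limiter QPS: %v\n", limiter.QPS())
}
```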
However, it seems the ReplicationController is creating pods significantly faster than 100 per second. From the logs at https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-kubemark-500-gce/642?log#log:
I1207 00:19:24.543] I1207 00:19:24.543499 9233 runners.go:120] Created replication controller with name: density15000-0-dcd07081-bc55-11e6-9e64-0242ac110005, namespace: e2e-tests-density-100-1-c20xv, replica count: 0xc42e2ca34c
I1207 00:19:24.549] I1207 00:19:24.549245 9233 runners.go:120] Created replication controller with name: density15000-3-dcd07081-bc55-11e6-9e64-0242ac110005, namespace: e2e-tests-density-100-4-brbqf, replica count: 0xc42c885ca4
I1207 00:19:24.549] I1207 00:19:24.549350 9233 runners.go:120] Created replication controller with name: density15000-4-dcd07081-bc55-11e6-9e64-0242ac110005, namespace: e2e-tests-density-100-5-ph9b9, replica count: 0xc42d7b42ac
I1207 00:19:24.549] I1207 00:19:24.549584 9233 runners.go:120] Created replication controller with name: density15000-2-dcd07081-bc55-11e6-9e64-0242ac110005, namespace: e2e-tests-density-100-3-tvqj3, replica count: 0xc42e86dc7c
I1207 00:19:24.571] I1207 00:19:24.570795 9233 runners.go:120] Created replication controller with name: density15000-1-dcd07081-bc55-11e6-9e64-0242ac110005, namespace: e2e-tests-density-100-2-qm9px, replica count: 0xc42dc7fe7c
I1207 00:19:34.527] Dec 7 00:19:34.526: INFO: Density Pods: 1178 out of 15000 created, 883 running, 141 pending, 154 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady
I1207 00:19:44.529] Dec 7 00:19:44.528: INFO: Density Pods: 2182 out of 15000 created, 1880 running, 137 pending, 165 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady
I1207 00:19:54.530] Dec 7 00:19:54.530: INFO: Density Pods: 3489 out of 15000 created, 2876 running, 136 pending, 477 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady
I1207 00:20:04.532] Dec 7 00:20:04.532: INFO: Density Pods: 5857 out of 15000 created, 3878 running, 131 pending, 1848 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady
I1207 00:20:14.534] Dec 7 00:20:14.534: INFO: Density Pods: 8718 out of 15000 created, 4872 running, 135 pending, 3711 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady
I1207 00:20:24.537] Dec 7 00:20:24.536: INFO: Density Pods: 11044 out of 15000 created, 5876 running, 131 pending, 5037 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady
I1207 00:20:34.538] Dec 7 00:20:34.538: INFO: Density Pods: 12943 out of 15000 created, 6865 running, 141 pending, 5937 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady
I1207 00:20:44.543] Dec 7 00:20:44.542: INFO: Density Pods: 14328 out of 15000 created, 7878 running, 126 pending, 6324 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady
I1207 00:20:54.541] Dec 7 00:20:54.540: INFO: Density Pods: 14998 out of 15000 created, 8883 running, 121 pending, 5994 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady
This pretty much means we created roughly 15000 pods within ~90s, which is about 165 pods/s on average, well above the 100 QPS limit.
Seems like a bug.
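To make the expectation concrete: with QPS=100 and burst=100, 15000 pod creations should take at least (15000 - 100) / 100 ≈ 149s of wall-clock time, so finishing in ~90s means the limiter is not actually throttling the calls. A minimal sketch of that floor using the client-go token bucket (scaled down so it runs in a few seconds; illustrative only, not the controller code):

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/util/flowcontrol"
)

func main() {
	const (
		qps   = 100 // same limit as in the kubemark job
		burst = 100
		calls = 500 // stand-in for pod-create API calls, scaled down from 15000
	)

	limiter := flowcontrol.NewTokenBucketRateLimiter(qps, burst)

	start := time.Now()
	for i := 0; i < calls; i++ {
		limiter.Accept() // blocks until a token is available
	}
	elapsed := time.Since(start)

	// Expected floor: (calls - burst) / qps = (500 - 100) / 100 = 4s.
	fmt.Printf("%d calls took %v (expected floor ~%ds)\n",
		calls, elapsed.Round(time.Millisecond), (calls-burst)/qps)
}
```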
About this issue
- State: closed
- Created 8 years ago
- Comments: 17 (17 by maintainers)
Commits related to this issue
- Merge pull request #38294 from liggitt/rate-limit: "Re-use tested ratelimiter" (automatic merge from submit-queue, batch tested with PRs 38294, 37009, 36778, 38130, 37835). The ratelimiter introduced in... (committed to liggitt/kubernetes by deleted user, 8 years ago)
- Merge pull request #38306 from liggitt/rate-limit-test: "Add test for multi-threaded use of ratelimiter" (automatic merge from submit-queue, batch tested with PRs 35939, 38381, 37825, 38306, 38110). Add... (committed to kubernetes/kubernetes by deleted user, 8 years ago)
Thanks a lot for noticing this, @wojtek-t
Agree it’s a blocker. Is there a reason not to return to the tested ratelimiter that was in place prior to that PR?
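For reference, the follow-up commit (#38306) adds a test for multi-threaded use of the ratelimiter. The sketch below shows the general shape of such a check: many goroutines competing for tokens, with the total grants bounded by burst + qps * window regardless of the number of callers. It is illustrative only, not the contents of that PR.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"

	"k8s.io/client-go/util/flowcontrol"
)

func main() {
	limiter := flowcontrol.NewTokenBucketRateLimiter(100, 100)

	var granted int64
	var wg sync.WaitGroup
	done := make(chan struct{})

	// Close the measurement window after 3 seconds.
	go func() {
		time.Sleep(3 * time.Second)
		close(done)
	}()

	// 50 goroutines competing for tokens, mimicking controller workers.
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				select {
				case <-done:
					return
				default:
					if limiter.TryAccept() {
						atomic.AddInt64(&granted, 1)
					}
					time.Sleep(time.Millisecond) // avoid a hot spin
				}
			}
		}()
	}
	wg.Wait()

	// A correct 100 QPS / 100 burst limiter should grant roughly
	// 100 + 3*100 = ~400 tokens over the 3s window, no matter how
	// many goroutines are asking.
	fmt.Printf("tokens granted in 3s by 50 goroutines: %d\n", atomic.LoadInt64(&granted))
}
```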