triggers: request.go:668] Waited for *s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/apis/triggers.tekton.dev/v1alpha1/clusterinterceptors/github
Expected Behavior
Event listener is working as expected
Actual Behavior
Drastic performance degradation
1 request.go:668] Waited for 11.393899312s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/apis/triggers.tekton.dev/v1alpha1/clusterinterceptors/github
Steps to Reproduce the Problem
- Install Operator > 0.51.1
Additional Info
- Kubernetes version (output of `kubectl version`):
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"b695d79d4f967c403a96986f1750a35eb75e75f1", GitTreeState:"clean", BuildDate:"2021-11-17T15:48:33Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.2-eks-06eac09", GitCommit:"5f6d83fe4cb7febb5f4f4e39b3b2b64ebbbe3e97", GitTreeState:"clean", BuildDate:"2021-09-13T14:20:15Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
- Tekton Pipeline version: v0.29.0
As far as I can see, after tekton-triggers was moved to knative (v0.17.x+), performance degraded: all webhooks from GitHub time out.
Pipelines still run, but with a start delay of roughly 2 minutes. This is the time the EventListener spends parsing the webhook and trying to match the CEL conditions. In the logs it looks very slow:
el-github-listener-interceptor-6b86d6f674-dp5hh event-listener {"level":"info","ts":"2021-12-09T09:57:20.094Z","logger":"eventlistener","caller":"sink/sink.go:301","msg":"... type push is not allowed","eventlistener":"github-listener-interceptor"...
el-github-listener-interceptor-6b86d6f674-dp5hh event-listener {"level":"info","ts":"2021-12-09T09:57:20.293Z","logger":"eventlistener","caller":"sink/sink.go:301","msg":"... type push is not allowed","eventlistener":"github-listener-interceptor"...
el-github-listener-interceptor-6b86d6f674-dp5hh event-listener {"level":"info","ts":"2021-12-09T09:57:20.694Z","logger":"eventlistener","caller":"sink/sink.go:301","msg":"... type push is not allowed","eventlistener":"github-listener-interceptor"...
el-github-listener-interceptor-6b86d6f674-dp5hh event-listener {"level":"info","ts":"2021-12-09T09:57:21.094Z","logger":"eventlistener","caller":"sink/sink.go:301","msg":"... type push is not allowed","eventlistener":"github-listener-interceptor"...
el-github-listener-interceptor-6b86d6f674-dp5hh event-listener {"level":"info","ts":"2021-12-09T09:57:21.893Z","logger":"eventlistener","caller":"sink/sink.go:301","msg":"... type push is not allowed","eventlistener":"github-listener-interceptor"...
el-github-listener-interceptor-6b86d6f674-dp5hh event-listener {"level":"info","ts":"2021-12-09T09:57:22.294Z","logger":"eventlistener","caller":"sink/sink.go:301","msg":"... type push is not allowed","eventlistener":"github-listener-interceptor"...
el-github-listener-interceptor-6b86d6f674-dp5hh event-listener {"level":"info","ts":"2021-12-09T09:57:22.894Z","logger":"eventlistener","caller":"sink/sink.go:301","msg":"... type push is not allowed","eventlistener":"github-listener-interceptor"...
As you can see, it evaluates 2-3 conditions per second, whereas prior versions handled hundreds in the same time. I experimented today and installed different versions of tekton-operator from 0.50.0 through 0.53.0: all versions shipping triggers 0.17.x are very slow.
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 2
- Comments: 31 (15 by maintainers)
Hello @dibyom, I've been taking a look into this, and I think the issue is that we are using the DynamicInformers, which do not cache these responses. As a simple example, I've patched the latest triggers release and added the following code here: https://github.com/tektoncd/triggers/blob/v0.23.1/cmd/eventlistenersink/main.go#L68

This seems to overwrite the dynamic informers registered in `ctx`, since we call the `withInformer` factory that writes to the same `Key` in the `context`. Now our entries (in our case, mostly `clusterinterceptors`) appear to be cached. This has been running for a while in our cluster, and we have had no issues with client-side throttling so far, whereas before we hit it on pretty much every event received, because we have a lot of `triggers` associated with this `eventlistener`.

Can you confirm if this is the right approach?
Thanks
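
(For readers following along: a minimal sketch of the informer-registration approach described above, assuming `knative.dev/pkg` injection as used by the linked `main.go`. Names and structure here are illustrative, not the actual patch.)

```go
package example

import (
	"context"
	"log"

	"k8s.io/client-go/rest"
	"knative.dev/pkg/controller"
	"knative.dev/pkg/injection"
	"knative.dev/pkg/signals"
)

// setupCachedInformers registers the typed informers on the context. Per the
// comment above, they are written under the same key the dynamic informers
// use, so later lookups (e.g. for clusterinterceptors) are served from the
// shared informer cache instead of hitting the API server.
func setupCachedInformers(cfg *rest.Config) context.Context {
	ctx := signals.NewContext()
	ctx, informers := injection.Default.SetupInformers(ctx, cfg)
	if err := controller.StartInformers(ctx.Done(), informers...); err != nil {
		log.Fatalf("failed to start informers: %v", err)
	}
	return ctx
}
```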
If v0.24 with PR #1584 fixes this issue, then we can close this.
Still present on Triggers v0.23.1:
Waited for 29.020773428m due to client-side throttling, not priority and fairness, request: GET:https://10.0.0.1:443/apis/triggers.tekton.dev/v1beta1/namespaces/tekton/eventlisteners/events-listener
Hi, I was doing some load testing with my Pipeline in my cluster to see whether the cluster can handle my expected load in terms of running PipelineRuns. To get a "step response", I send 20 curl requests at once to a single EventListener, which is responsible for at least 23 Triggers. The first 2-4 PipelineRuns are created immediately, and the remaining ones pop up one after another at roughly 2-3 minute intervals. After ~20 minutes, all PipelineRuns are finally running.
Worth mentioning: while doing the load test, the cluster is already running a significant number of PipelineRuns.
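
(A rough Go equivalent of the 20-parallel-curl test described above, for anyone who wants to reproduce it; the URL and payload are placeholders.)

```go
package example

import (
	"bytes"
	"fmt"
	"net/http"
	"sync"
)

// burst fires n concurrent POSTs at an EventListener and reports non-2xx
// responses (curl -f surfaces these as exit code 22).
func burst(url string, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			resp, err := http.Post(url, "application/json",
				bytes.NewBufferString(`{"ref":"refs/heads/main"}`)) // placeholder payload
			if err != nil {
				fmt.Printf("request %d failed: %v\n", i, err)
				return
			}
			defer resp.Body.Close()
			if resp.StatusCode >= 400 {
				fmt.Printf("request %d: HTTP %d\n", i, resp.StatusCode)
			}
		}(i)
	}
	wg.Wait()
}
```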
When digging into this problem I saw that curl returns exit code 22, which means it got an HTTP status of 400 or above back from the EventListener. Looking into the EventListener logs, I found the same errors as mentioned above. I cannot get the exact error right now, but can provide it if helpful. It's something like this:
Waited for XXXs due to client-side throttling, not priority and fairness, request: GET:<SOME-URL>

@joaosilva15 yes, it seems like there are some spots where we are calling the API server directly instead of going through the lister cache.
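
(To make that distinction concrete, a hedged sketch of the two access patterns, using the generated Tekton Triggers client packages; treat the exact import paths and names as assumptions.)

```go
package example

import (
	"context"

	triggersv1alpha1 "github.com/tektoncd/triggers/pkg/apis/triggers/v1alpha1"
	triggersclient "github.com/tektoncd/triggers/pkg/client/clientset/versioned"
	listers "github.com/tektoncd/triggers/pkg/client/listers/triggers/v1alpha1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// getDirect issues a GET against the API server on every call, so a burst of
// webhook events quickly runs into client-go's rate limiter and produces the
// "Waited for ... due to client-side throttling" messages.
func getDirect(ctx context.Context, c triggersclient.Interface, name string) (*triggersv1alpha1.ClusterInterceptor, error) {
	return c.TriggersV1alpha1().ClusterInterceptors().Get(ctx, name, metav1.GetOptions{})
}

// getCached reads from the shared informer cache through a lister, so the
// hot path never touches the API server.
func getCached(l listers.ClusterInterceptorLister, name string) (*triggersv1alpha1.ClusterInterceptor, error) {
	return l.Get(name)
}
```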
This is also happening to me. I downgraded from 0.18 to 0.16 and the problem seemed to disappear, although in my case not all GitHub requests were failing, only some. Triggers: 0.18.0, Pipelines: 0.32.1.
One warning I get from the listener pods is:

Sink timeout configuration is invalid, default to -1 (no timeout)

Not sure if this might be related to the issue. I was thinking: if the EventListener calls the API server for each trigger on each request to fetch the interceptor, could that cause this kind of issue? Would this be a problem?
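
(For reference, the "client-side throttling" messages come from client-go's per-client rate limiter on `rest.Config`. A minimal sketch of loosening those limits as a stop-gap, assuming in-cluster configuration; the values are illustrative, and caching reads behind listers as discussed above remains the proper fix.)

```go
package example

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// newFastClient raises client-go's per-client rate limits (the defaults are
// roughly QPS=5, Burst=10), which is what the "client-side throttling"
// messages are hitting when one GET is issued per trigger per webhook.
func newFastClient() (kubernetes.Interface, error) {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return nil, err
	}
	cfg.QPS = 50
	cfg.Burst = 100
	return kubernetes.NewForConfig(cfg)
}
```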