triggers: request.go:668] Waited for *s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/apis/triggers.tekton.dev/v1alpha1/clusterinterceptors/github
Expected Behavior
Event listener is working as expected
Actual Behavior
Drastic performance degradation
1 request.go:668] Waited for 11.393899312s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/apis/triggers.tekton.dev/v1alpha1/clusterinterceptors/github
Steps to Reproduce the Problem
- Install Operator > 0.51.1
Additional Info
- Kubernetes version (output of `kubectl version`):
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"b695d79d4f967c403a96986f1750a35eb75e75f1", GitTreeState:"clean", BuildDate:"2021-11-17T15:48:33Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.2-eks-06eac09", GitCommit:"5f6d83fe4cb7febb5f4f4e39b3b2b64ebbbe3e97", GitTreeState:"clean", BuildDate:"2021-09-13T14:20:15Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
- Tekton Pipeline version: v0.29.0
As far as I can see, after tekton-triggers was moved to knative (v0.17.x+), performance degraded: all webhooks from GitHub time out.
Pipelines still run, but with a start delay of roughly 2 minutes. This is the time the EventListener spends parsing the webhook and trying to match the CEL conditions. In the logs it looks very slow:
el-github-listener-interceptor-6b86d6f674-dp5hh event-listener {"level":"info","ts":"2021-12-09T09:57:20.094Z","logger":"eventlistener","caller":"sink/sink.go:301","msg":"... type push is not allowed","eventlistener":"github-listener-interceptor"...
el-github-listener-interceptor-6b86d6f674-dp5hh event-listener {"level":"info","ts":"2021-12-09T09:57:20.293Z","logger":"eventlistener","caller":"sink/sink.go:301","msg":"... type push is not allowed","eventlistener":"github-listener-interceptor"...
el-github-listener-interceptor-6b86d6f674-dp5hh event-listener {"level":"info","ts":"2021-12-09T09:57:20.694Z","logger":"eventlistener","caller":"sink/sink.go:301","msg":"... type push is not allowed","eventlistener":"github-listener-interceptor"...
el-github-listener-interceptor-6b86d6f674-dp5hh event-listener {"level":"info","ts":"2021-12-09T09:57:21.094Z","logger":"eventlistener","caller":"sink/sink.go:301","msg":"... type push is not allowed","eventlistener":"github-listener-interceptor"...
el-github-listener-interceptor-6b86d6f674-dp5hh event-listener {"level":"info","ts":"2021-12-09T09:57:21.893Z","logger":"eventlistener","caller":"sink/sink.go:301","msg":"... type push is not allowed","eventlistener":"github-listener-interceptor"...
el-github-listener-interceptor-6b86d6f674-dp5hh event-listener {"level":"info","ts":"2021-12-09T09:57:22.294Z","logger":"eventlistener","caller":"sink/sink.go:301","msg":"... type push is not allowed","eventlistener":"github-listener-interceptor"...
el-github-listener-interceptor-6b86d6f674-dp5hh event-listener {"level":"info","ts":"2021-12-09T09:57:22.894Z","logger":"eventlistener","caller":"sink/sink.go:301","msg":"... type push is not allowed","eventlistener":"github-listener-interceptor"...
As you can see, it evaluates 2-3 conditions per second, whereas prior versions handled hundreds in the same time. I experimented today and installed different versions of tekton-operator from 0.50.0 through 0.53.0: all versions shipping triggers 0.17.x are very slow.
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 2
- Comments: 31 (15 by maintainers)
Hello @dibyom, I've been taking a look into this, and I think the issue is that we are using the DynamicInformers, which do not cache these responses. As a simple example, I've patched the latest triggers release and added the following code here: https://github.com/tektoncd/triggers/blob/v0.23.1/cmd/eventlistenersink/main.go#L68

This seems to overwrite the dynamic informers registered in `ctx`, since we call the `withInformer` factory that writes to the same `Key` in the `context`. Now our entries (in our case, mostly `clusterinterceptors`) appear to be cached. This has been running for a while in our cluster, and we have had no issues with client-side throttling so far, whereas before we hit it on pretty much every event received, because we have a lot of `triggers` associated with this `eventlistener`.

Can you confirm if this is the right approach?
Thanks
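
(For readers following along: a minimal sketch of the informer-registration approach described above, assuming `knative.dev/pkg` injection as used by the linked `main.go`. Names and structure here are illustrative, not the actual patch.)

```go
package example

import (
	"context"
	"log"

	"k8s.io/client-go/rest"
	"knative.dev/pkg/controller"
	"knative.dev/pkg/injection"
	"knative.dev/pkg/signals"
)

// setupCachedInformers registers the typed informers on the context. Per the
// comment above, they are written under the same key the dynamic informers
// use, so later lookups (e.g. for clusterinterceptors) are served from the
// shared informer cache instead of hitting the API server.
func setupCachedInformers(cfg *rest.Config) context.Context {
	ctx := signals.NewContext()
	ctx, informers := injection.Default.SetupInformers(ctx, cfg)
	if err := controller.StartInformers(ctx.Done(), informers...); err != nil {
		log.Fatalf("failed to start informers: %v", err)
	}
	return ctx
}
```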
If v0.24 with PR #1584 fixes this issue, then we can close this.
Still present on Triggers v0.23.1:
Waited for 29.020773428m due to client-side throttling, not priority and fairness, request: GET:https://10.0.0.1:443/apis/triggers.tekton.dev/v1beta1/namespaces/tekton/eventlisteners/events-listener
Hi, I was doing some load testing with my Pipeline in my cluster to see whether the cluster can handle my expected load in terms of running PipelineRuns. To get a "step response", I send 20 curl requests at once to a single EventListener, which is responsible for at least 23 Triggers. The first 2-4 PipelineRuns are created immediately, and the remaining ones pop up one after another at roughly 2-3 minute intervals. After ~20 minutes, all PipelineRuns are finally running.
Worth mentioning: while doing the load test, the cluster is already running a significant number of PipelineRuns.
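
(A rough Go equivalent of the 20-parallel-curl test described above, for anyone who wants to reproduce it; the URL and payload are placeholders.)

```go
package example

import (
	"bytes"
	"fmt"
	"net/http"
	"sync"
)

// burst fires n concurrent POSTs at an EventListener and reports non-2xx
// responses (curl -f surfaces these as exit code 22).
func burst(url string, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			resp, err := http.Post(url, "application/json",
				bytes.NewBufferString(`{"ref":"refs/heads/main"}`)) // placeholder payload
			if err != nil {
				fmt.Printf("request %d failed: %v\n", i, err)
				return
			}
			defer resp.Body.Close()
			if resp.StatusCode >= 400 {
				fmt.Printf("request %d: HTTP %d\n", i, resp.StatusCode)
			}
		}(i)
	}
	wg.Wait()
}
```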
When digging into this problem I saw that curl returns exit code 22, which means it got an HTTP status of 400 or above back from the EventListener. Looking into the EventListener logs, I found the same errors as mentioned above. I cannot get the exact error right now, but can provide it if helpful. It's something like this:
Waited for XXXs due to client-side throttling, not priority and fairness, request: GET:<SOME-URL>

@joaosilva15 yes, it seems like there are some spots where we are calling the API server directly instead of going through the lister cache.
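
(To make that distinction concrete, a hedged sketch of the two access patterns, using the generated Tekton Triggers client packages; treat the exact import paths and names as assumptions.)

```go
package example

import (
	"context"

	triggersv1alpha1 "github.com/tektoncd/triggers/pkg/apis/triggers/v1alpha1"
	triggersclient "github.com/tektoncd/triggers/pkg/client/clientset/versioned"
	listers "github.com/tektoncd/triggers/pkg/client/listers/triggers/v1alpha1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// getDirect issues a GET against the API server on every call, so a burst of
// webhook events quickly runs into client-go's rate limiter and produces the
// "Waited for ... due to client-side throttling" messages.
func getDirect(ctx context.Context, c triggersclient.Interface, name string) (*triggersv1alpha1.ClusterInterceptor, error) {
	return c.TriggersV1alpha1().ClusterInterceptors().Get(ctx, name, metav1.GetOptions{})
}

// getCached reads from the shared informer cache through a lister, so the
// hot path never touches the API server.
func getCached(l listers.ClusterInterceptorLister, name string) (*triggersv1alpha1.ClusterInterceptor, error) {
	return l.Get(name)
}
```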
This is also happening to me. I downgraded from 0.18 to 0.16 and the problem seemed to disappear, although in my case not all GitHub requests were failing, only some. Triggers: 0.18.0, Pipelines: 0.32.1.
One warning I get from the listener pods is:

Sink timeout configuration is invalid, default to -1 (no timeout)

Not sure if this might be related to the issue. I was thinking: if the EventListener calls the API server for each trigger on each request to fetch the interceptor, could that cause this kind of issue? Would this be a problem?
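
(For reference, the "client-side throttling" messages come from client-go's per-client rate limiter on `rest.Config`. A minimal sketch of loosening those limits as a stop-gap, assuming in-cluster configuration; the values are illustrative, and caching reads behind listers as discussed above remains the proper fix.)

```go
package example

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// newFastClient raises client-go's per-client rate limits (the defaults are
// roughly QPS=5, Burst=10), which is what the "client-side throttling"
// messages are hitting when one GET is issued per trigger per webhook.
func newFastClient() (kubernetes.Interface, error) {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return nil, err
	}
	cfg.QPS = 50
	cfg.Burst = 100
	return kubernetes.NewForConfig(cfg)
}
```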