kubernetes-ingress-controller: Kong Ingress Controller cannot scale beyond a limit
Current Behavior
Scenario:
Create 1500 Secrets in a namespace with approximately 1MB of data in each Secret.
This was done to reproduce an issue that we faced in production, where we had thousands of Secrets with a cumulative data size of 1.8 GB. The test simulates this real-world scenario with ~1.5GB of Secret data in the cluster.
Expected Behavior
Kong restarts should work fine: both the proxy and the ingress controller should come up and stay in the Running state.
Steps To Reproduce
- Create Secrets totalling ~1.5 GB of data, preferably in the same namespace where Kong is running. (I have a Go module that does this; let me know if that helps and I will try to make it available. A minimal sketch follows this list.)
- Restart Kong.
- Observe that Kong goes into an infinite CrashLoopBackOff.
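For reference, here is a minimal sketch of the kind of Secret-creation tooling mentioned above, assuming client-go; the namespace, Secret names, and payload size are placeholders (the data size is kept just under the ~1MiB per-Secret limit), so adjust them to match your cluster:

```go
package main

import (
	"context"
	"crypto/rand"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the default kubeconfig; adjust the path/context as needed.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// ~900KiB of random data per Secret, staying under the ~1MiB Secret size limit.
	payload := make([]byte, 900<<10)
	if _, err := rand.Read(payload); err != nil {
		panic(err)
	}

	const namespace = "kong" // placeholder; use the namespace Kong runs in
	for i := 0; i < 1500; i++ {
		secret := &corev1.Secret{
			ObjectMeta: metav1.ObjectMeta{
				Name:      fmt.Sprintf("bulk-secret-%04d", i),
				Namespace: namespace,
			},
			Data: map[string][]byte{"payload": payload},
		}
		if _, err := client.CoreV1().Secrets(namespace).Create(context.TODO(), secret, metav1.CreateOptions{}); err != nil {
			panic(err)
		}
	}
}
```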
Kong Ingress Controller version
2.1.1, but it should exist in main too. Going through the code makes me think that this is embedded deep inside the sigs.k8s.io module, which doesn't handle pagination effectively. Kong is running in DB-less mode.
Kubernetes version
$ kubectl --context test version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.15", GitCommit:"58178e7f7aab455bc8de88d3bdd314b64141e7ee", GitTreeState:"clean", BuildDate:"2021-09-15T19:23:02Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"19+", GitVersion:"v1.19.15-eks-9c63c4", GitCommit:"9c63c4037a56f9cad887ee76d55142abd4155179", GitTreeState:"clean", BuildDate:"2021-10-20T00:21:03Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}
Anything else?
Might be related to https://github.com/kubernetes/kubernetes/issues/108003.
About this issue
- State: closed
- Created 2 years ago
- Comments: 32 (18 by maintainers)
Thanks for the pointer @prateekgogia (from the AWS EKS team)!
I can confirm that disabling compression after this issue is encountered fixes it. This is my changeset:
This also touches on the scale aspect here, as handling this in the routing routine in the API server would be better.

We reproduced this in our live AWS cluster (please note that this wasn't done in Kind/locally). KIC started crashing. We then updated the KIC image to one built with the above changeset and the pods stabilised. We had close to ~2.1GB of Secrets data at this point in time.
EDIT:
At this point, the least KIC can do is provide a flag to disable compression of responses from the api-server, so that such scale issues can be mitigated by customers who hit them.
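For illustration, a minimal sketch of what such a toggle could look like, assuming client-go: rest.Config already exposes a DisableCompression field, while the function name and the KIC flag wiring are hypothetical here since no such flag exists yet.

```go
package main

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

// buildClient returns a clientset; when disableCompression is true the client
// stops asking the apiserver for gzip-compressed responses, which is the
// workaround described above.
func buildClient(kubeconfig string, disableCompression bool) (*kubernetes.Clientset, error) {
	var cfg *rest.Config
	var err error
	if kubeconfig != "" {
		cfg, err = clientcmd.BuildConfigFromFlags("", kubeconfig)
	} else {
		cfg, err = rest.InClusterConfig()
	}
	if err != nil {
		return nil, err
	}
	// client-go supports this natively via rest.Config; a KIC flag could
	// simply set it before the manager/clients are constructed.
	cfg.DisableCompression = disableCompression
	return kubernetes.NewForConfig(cfg)
}
```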
Here’s a summary of my analysis so far. As mentioned before, I had a hunch that this is a pagination issue and adding pagination on the client side will fix the issue. This doesn’t seem to be the case.
The workflow that Kubernetes follows, from my understanding, is:
- The client issues GET /api/v1/secrets?limit=500&resourceVersion=0
- The apiserver reads all of the Secrets from etcd and then stores them in memory.

What is actually happening here:
- The client issues GET /api/v1/secrets?limit=500&resourceVersion=0
- The apiserver cannot read all of the Secrets from etcd in the default timeout period; 60s by default in our production Kubernetes cluster (using AWS EKS), and 5s in my local tests using Kind.

The stack trace from kube-apiserver is as follows:
I also looked at the pagination proposal for Kubernetes clients and apiserver here and the documentation here.
It specifically states that:
But no matter whether we page or not on the client side, this will cause issues reading from etcd. The KEP also provides an Alternatives section at the end stating:

So, is this a kube-apiserver CPU and/or memory issue we are hitting here?
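To make the client-side option concrete, here is a minimal sketch (assuming client-go; the function name is illustrative) of an explicit paginated List using Limit/Continue. Note that informers issue their initial List with resourceVersion=0, in which case the apiserver may serve from its watch cache and ignore the limit, which is part of why paging alone does not help here.

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// listSecretsPaged lists Secrets in chunks of 500 using the Limit/Continue
// pagination mechanism described in the KEP.
func listSecretsPaged(ctx context.Context, client kubernetes.Interface, namespace string) ([]corev1.Secret, error) {
	var secrets []corev1.Secret
	opts := metav1.ListOptions{Limit: 500}
	for {
		list, err := client.CoreV1().Secrets(namespace).List(ctx, opts)
		if err != nil {
			return nil, err
		}
		secrets = append(secrets, list.Items...)
		if list.Continue == "" {
			return secrets, nil
		}
		// The continue token tells the apiserver where to resume reading.
		opts.Continue = list.Continue
	}
}
```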
I am going to try to raise the limits in our various EKS clusters and perform a scale test again to see if it improves the situation in any way.
I am already stealing time at work and on my way to:
As mentioned before, I can open source the tool to create Secrets in parallel, but (as of now) I cannot open source the KinD setup scripts, as they are too detailed and replicate our cloud deployments to a large extent. I can strip them down later if needed and open source a subset of the work. With regard to not testing with Kong: I could create a new operator, but I would rather avoid that work and just try to reproduce the issue with Kong and see if we can do something there.
Might be related: https://github.com/helm/helm/pull/10715, which we hit in the same cluster with huge amounts of Secrets, and that was solved by using proper pagination.
Logs are as in the discussion link:
It is not an OOM kill, although if I keep adding Secrets, the pod does sometimes get OOM-killed.
I have already evaluated the --watch-namespaces option, and it is unfortunately not an option for us as we need to watch all namespaces in the cluster.