kubernetes-ingress-controller: Kong Ingress Controller cannot scale beyond a limit
Current Behavior
Scenario:
Create 1500 Secrets in a namespace with approximately 1MB of data in each Secret.
This was done to reproduce an issue that we faced in production, where we had thousands of Secrets with a cumulative data size of 1.8 GB. The test simulates this real-world scenario with ~1.5GB of Secret data in the cluster.
Expected Behavior
Kong restarts should work fine: both the proxy and the ingress controller should come up and stay in the Running state.
Steps To Reproduce
- Create Secrets totalling ~1.5 GB of data, preferably in the same namespace where Kong is running. (I have a Go module that does this; let me know if that helps and I will try to make it available. A minimal sketch follows this list.)
- Restart Kong.
- Observe that Kong goes into an infinite CrashLoopBackOff.
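For reference, here is a minimal sketch of the kind of Secret-creation tooling mentioned above, assuming client-go; the namespace, Secret names, and payload size are placeholders (the data size is kept just under the ~1MiB per-Secret limit), so adjust them to match your cluster:

```go
package main

import (
	"context"
	"crypto/rand"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the default kubeconfig; adjust the path/context as needed.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// ~900KiB of random data per Secret, staying under the ~1MiB Secret size limit.
	payload := make([]byte, 900<<10)
	if _, err := rand.Read(payload); err != nil {
		panic(err)
	}

	const namespace = "kong" // placeholder; use the namespace Kong runs in
	for i := 0; i < 1500; i++ {
		secret := &corev1.Secret{
			ObjectMeta: metav1.ObjectMeta{
				Name:      fmt.Sprintf("bulk-secret-%04d", i),
				Namespace: namespace,
			},
			Data: map[string][]byte{"payload": payload},
		}
		if _, err := client.CoreV1().Secrets(namespace).Create(context.TODO(), secret, metav1.CreateOptions{}); err != nil {
			panic(err)
		}
	}
}
```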
Kong Ingress Controller version
2.1.1, but it should exist in main too. Going through the code makes me think that this is embedded deep inside the sigs.k8s.io module, which doesn't handle pagination effectively. Kong is running in DB-less mode.
Kubernetes version
$ kubectl --context test version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.15", GitCommit:"58178e7f7aab455bc8de88d3bdd314b64141e7ee", GitTreeState:"clean", BuildDate:"2021-09-15T19:23:02Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"19+", GitVersion:"v1.19.15-eks-9c63c4", GitCommit:"9c63c4037a56f9cad887ee76d55142abd4155179", GitTreeState:"clean", BuildDate:"2021-10-20T00:21:03Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}
Anything else?
Might be related to https://github.com/kubernetes/kubernetes/issues/108003.
About this issue
- State: closed
- Created 2 years ago
- Comments: 32 (18 by maintainers)
Thanks for the pointer @prateekgogia (from the AWS EKS team)!
I can confirm that disabling compression after this issue is encountered fixes it. This is my changeset:
This also touches on the scale aspect here, as handling this in the routing routine in the API server would be better.

We reproduced this in our live AWS cluster (please note that this wasn't done in Kind/locally). KIC started crashing. We then updated the KIC image to one built with the above changeset and the pods stabilised. We had close to ~2.1GB of Secrets data at this point in time.
EDIT:
At this point, the least KIC can do is provide a flag to disable compression of responses from the api-server, so that such scale issues can be mitigated by customers who hit them.
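For illustration, a minimal sketch of what such a toggle could look like, assuming client-go: rest.Config already exposes a DisableCompression field, while the function name and the KIC flag wiring are hypothetical here since no such flag exists yet.

```go
package main

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

// buildClient returns a clientset; when disableCompression is true the client
// stops asking the apiserver for gzip-compressed responses, which is the
// workaround described above.
func buildClient(kubeconfig string, disableCompression bool) (*kubernetes.Clientset, error) {
	var cfg *rest.Config
	var err error
	if kubeconfig != "" {
		cfg, err = clientcmd.BuildConfigFromFlags("", kubeconfig)
	} else {
		cfg, err = rest.InClusterConfig()
	}
	if err != nil {
		return nil, err
	}
	// client-go supports this natively via rest.Config; a KIC flag could
	// simply set it before the manager/clients are constructed.
	cfg.DisableCompression = disableCompression
	return kubernetes.NewForConfig(cfg)
}
```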
Here’s a summary of my analysis so far. As mentioned before, I had a hunch that this is a pagination issue and adding pagination on the client side will fix the issue. This doesn’t seem to be the case.
The workflow that Kubernetes follows, from my understanding, is:
- The client issues GET /api/v1/secrets?limit=500&resourceVersion=0
- The apiserver reads all of the Secrets from etcd and then stores them in memory.

What is actually happening here:
- The client issues GET /api/v1/secrets?limit=500&resourceVersion=0
- The apiserver cannot read all of the Secrets from etcd in the default timeout period; 60s by default in our production Kubernetes cluster (using AWS EKS), and 5s in my local tests using Kind.

The stack trace from kube-apiserver is as follows:
I also looked at the pagination proposal for Kubernetes clients and apiserver here and the documentation here.
It specifically states that:
But no matter whether we page or not on the client side, this will cause issues reading from etcd. The KEP also provides an Alternatives section at the end stating:

So, is this a kube-apiserver CPU and/or memory issue we are hitting here?
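To make the client-side option concrete, here is a minimal sketch (assuming client-go; the function name is illustrative) of an explicit paginated List using Limit/Continue. Note that informers issue their initial List with resourceVersion=0, in which case the apiserver may serve from its watch cache and ignore the limit, which is part of why paging alone does not help here.

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// listSecretsPaged lists Secrets in chunks of 500 using the Limit/Continue
// pagination mechanism described in the KEP.
func listSecretsPaged(ctx context.Context, client kubernetes.Interface, namespace string) ([]corev1.Secret, error) {
	var secrets []corev1.Secret
	opts := metav1.ListOptions{Limit: 500}
	for {
		list, err := client.CoreV1().Secrets(namespace).List(ctx, opts)
		if err != nil {
			return nil, err
		}
		secrets = append(secrets, list.Items...)
		if list.Continue == "" {
			return secrets, nil
		}
		// The continue token tells the apiserver where to resume reading.
		opts.Continue = list.Continue
	}
}
```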
I am going to try to raise the limits in our various EKS clusters and perform a scale test again to see if it improves the situation in any way.
I am already stealing time at work and on my way to:
As mentioned before, I can open source the tool to create Secrets in parallel, but (as of now) I cannot open source the KinD setup scripts, as they are too detailed and replicate our cloud deployments to a large extent. I can strip them down later if needed and open source a subset of the work. With regard to not testing with Kong: I could create a new operator, but I would rather avoid that work and just try to reproduce the issue with Kong and see if we can do something there.
Might be related: https://github.com/helm/helm/pull/10715, which we hit in the same cluster with huge amounts of Secrets, and that was solved by using proper pagination.
Logs are as in the discussion link:
It is not an OOM kill, although if I keep adding Secrets, the pod does sometimes get OOM-killed.
I have already evaluated the --watch-namespaces option, and it is unfortunately not an option for us as we need to watch all namespaces in the cluster.