kubernetes: High CPU Utilization in kube-apiserver when a large number of CRDs are created
What happened?
In Crossplane, we would like to register thousands of CRDs in a single cluster, representing the many cloud services potentially offered by multiple cloud providers. In our early trials with kind clusters, we observed that simultaneously posting just over 700 CRDs results in a CPU-utilization spike in kube-apiserver, due to the background OpenAPI spec publishing job. We were also made aware of the memory consumption issue during CRD creation.
We made similar observations in a GKE cluster on which we installed over 650 CRDs, bringing the total to 685. Unfortunately, we could not identify the root cause, but we observed the cluster flipping between the “repairing” and “available” states for an extended period of time. The cluster would come out of the repairing state, but on a couple of occasions it went back into repairing even though we registered no new CRDs and created no new objects on our side. The only probes we issued during this period were kubectl get commands.
In our tests on an arm64 machine with 8 CPU cores, we observed that the API server enters a high-CPU-utilization period of about 20 minutes after 750+ CRDs are created. If the API server saturates the CPU resources allocated to it, this may also disrupt its services. This leads us to a few questions:
- Can we expect the fix for the memory issue referenced above to also improve the CPU situation?
- Are there any plans to introduce the number of CRDs in a cluster as a scalability dimension in the thresholds document? If so, is it reasonable to expect our use cases to be covered?
- Is it possible/reasonable to turn off the /openapi/v2 endpoint to prevent aggregation of the OpenAPI schemas of the CRDs? Who are the clients of this endpoint?
- In the GKE case, is there a way to monitor kube-apiserver CPU/memory consumption, crashes, etc., so that we can reason about our empirical observations described above? (A small probing sketch follows this list.)
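A minimal sketch of the client-side probing we have in mind for the last question (assuming the /metrics endpoint is reachable with our credentials; the metric names below are the standard Prometheus process metrics exposed by kube-apiserver):
# Sample the API server's own process metrics periodically to approximate its CPU and memory usage:
$ kubectl get --raw /metrics | grep -E '^(process_cpu_seconds_total|process_resident_memory_bytes)'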
What did you expect to happen?
We were not expecting to observe CPU usage spikes for extended periods in our tests. For the GKE cluster experiment, we were not expecting service disruptions.
How can we reproduce it (as minimally and precisely as possible)?
Simultaneously create a large number (>650) of CRDs with OpenAPI schemas.
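A minimal reproduction sketch along these lines (the example.crossplane.io group, the Widget kinds, and the schema are placeholders rather than the exact CRDs we used):
# Apply ~750 structurally identical CRDs in quick succession:
$ for i in $(seq 1 750); do
cat <<EOF | kubectl apply -f -
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets${i}.example.crossplane.io
spec:
  group: example.crossplane.io
  scope: Namespaced
  names:
    plural: widgets${i}
    singular: widget${i}
    kind: Widget${i}
    listKind: Widget${i}List
  versions:
  - name: v1alpha1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              size:
                type: integer
EOF
done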
Anything else we need to know?
It looks like the average schema size in bytes per CRD could be a relevant parameter. If needed, we can supply those numbers assuming schemas are serialized in JSON. If you’d like to see some example CRDs we used in the above observations, please take a look at this branch of one of the Crossplane providers.
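To give a sense of that parameter, a rough sketch of how we would compute it (assuming jq is installed; this counts characters of the JSON-serialized schema as an approximation of bytes):
$ kubectl get crds -o json \
    | jq -r '.items[] | "\(.metadata.name) \([.spec.versions[].schema.openAPIV3Schema | tojson | length] | add)"'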
Kubernetes version
v1.20.10-gke.301 for GKE:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:11:29Z", GoVersion:"go1.16.3", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.10-gke.301", GitCommit:"17ad7bd6afa01033d7bd3f02ce5de56f940a915d", GitTreeState:"clean", BuildDate:"2021-08-24T05:18:54Z", GoVersion:"go1.15.15b5", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider
GKE (for the managed-cluster observations above); the local tests used kind.
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, …) and versions (if applicable)
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 15 (15 by maintainers)
Be careful about type mismatches that get covered up by serialization round-tripping (int vs. int64, empty list vs. nil, etc.) that would make DeepEqual checks fail.
Our initial tests show that https://github.com/kubernetes/kube-openapi/pull/251 is likely to fix (or greatly alleviate) this issue. See https://github.com/crossplane/crossplane/issues/2649#issuecomment-956914007 for details.
Some potential improvements we can do:
I think it helps, but merging the spec doesn’t play as big a part in the latency as the JSON (de)serialization does. There is some scalability data we collected when CRDs went GA (not sure if it still holds): https://github.com/kubernetes/enhancements/pull/1015#discussion_r305521857
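As a rough, client-side way to gauge the size and fetch latency of the aggregated spec on a given cluster (a sketch; the numbers depend on the CRD count and schema sizes):
# Size of the aggregated OpenAPI v2 spec and wall-clock time to fetch it:
$ kubectl get --raw /openapi/v2 | wc -c
$ time kubectl get --raw /openapi/v2 > /dev/null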