kubernetes: High CPU Utilization in kube-apiserver when a large number of CRDs are created

What happened?

In Crossplane, we would like to register thousands of CRDs in a single cluster, representing the many cloud services offered by potentially multiple cloud providers. In our early trials with kind clusters, we observed that simultaneously posting just over 700 CRDs results in a CPU-utilization spike in kube-apiserver due to the background OpenAPI spec publishing job. We were also made aware of the memory-consumption issue during CRD creation.
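For reference, one rough way to watch this in a kind setup (assuming the default cluster name, so the control-plane node container is named kind-control-plane; crictl ships in the kind node image):

$ docker stats --no-stream kind-control-plane   # CPU/memory of the whole node container
$ docker exec kind-control-plane crictl stats   # per-container stats inside the node, including kube-apiserver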

We made similar observations in a GKE cluster, where we installed over 650 additional CRDs, bringing the total to 685. Unfortunately, we could not identify the root cause, but we observed the cluster flipping between the “repairing” and “available” states for an extended period of time. The cluster would come out of the repairing state, but on a couple of occasions it went back into it, with no new CRD registrations or object creations from our side. The only probes we ran during this period were kubectl get commands.

In our tests, we observed that on an arm64 machine with 8 CPU cores, the API server enters a high-CPU-utilization period of about 20 minutes after 750+ CRDs are created. If the API server saturates the CPU resources allocated to it, this may also disrupt the services it provides. So we have a couple of questions:

  1. Can we expect the fix for the memory issue referenced above to also improve the CPU situation?
  2. Are there any plans to introduce the number of CRDs in a cluster as a scalability dimension in the thresholds document? If so, is it reasonable to expect our use cases to be covered?
  3. Is it possible/reasonable to turn off the /openapi/v2 endpoint to prevent aggregation of the OpenAPI schemas of the CRDs? Who are the clients of this endpoint? (A rough size probe is sketched after this list.)
  4. In the GKE case, is there a way to monitor kube-apiserver CPU/memory consumption, crashes, etc., so that we can reason about the empirical observations described above? (Also see the sketch after this list.)
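To make questions 3 and 4 more concrete, this is roughly how we could probe the aggregated spec and the API server's own metrics (both commands assume sufficient RBAC to read the raw paths and should also work against GKE):

$ kubectl get --raw /openapi/v2 | wc -c   # size of the aggregated OpenAPI v2 document in bytes
$ kubectl get --raw /metrics | grep -E 'process_cpu_seconds_total|process_resident_memory_bytes'   # basic kube-apiserver CPU/memory counters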

What did you expect to happen?

We were not expecting to observe CPU usage spikes for extended periods in our tests. For the GKE cluster experiment, we were not expecting service disruptions.

How can we reproduce it (as minimally and precisely as possible)?

Simultaneously create a large number (>650) of CRDs with OpenAPI schemas.
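For example, a minimal script along the following lines can generate and apply that many CRDs; the group, kind, and schema below are illustrative placeholders, and our real CRDs carry much larger schemas (running the applies in parallel, e.g. via xargs -P, gets closer to the simultaneous case):

#!/usr/bin/env bash
# Generate and apply N minimal CRDs, each with a small OpenAPI v3 schema.
N=${N:-700}
for i in $(seq 1 "$N"); do
  cat <<EOF | kubectl apply -f -
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets${i}.example.crossplane.io
spec:
  group: example.crossplane.io
  names:
    kind: Widget${i}
    listKind: Widget${i}List
    plural: widgets${i}
    singular: widget${i}
  scope: Namespaced
  versions:
  - name: v1alpha1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              size:
                type: integer
EOF
done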

Anything else we need to know?

It looks like the average schema size in bytes per CRD could be a relevant parameter. If needed, we can supply those numbers assuming schemas are serialized in JSON. If you’d like to see some example CRDs we used in the above observations, please take a look at this branch of one of the Crossplane providers.
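For instance, one rough way to compute that average on a cluster where the CRDs are already installed (assuming jq is available; this counts the JSON-serialized openAPIV3Schema bytes per served version):

$ kubectl get crds -o json \
    | jq '[.items[].spec.versions[].schema.openAPIV3Schema | select(. != null) | tojson | length] | add / length'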

Kubernetes version

v1.21.0 in local tests where we observed high CPU utilization.
v1.20.10-gke.301 for GKE:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:11:29Z", GoVersion:"go1.16.3", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.10-gke.301", GitCommit:"17ad7bd6afa01033d7bd3f02ce5de56f940a915d", GitTreeState:"clean", BuildDate:"2021-08-24T05:18:54Z", GoVersion:"go1.15.15b5", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

GKE

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, …) and versions (if applicable)

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 15 (15 by maintainers)

Most upvoted comments

We should probably look into building proto from the Go structures directly. With lazy marshaling, that would help us eliminate the JSON serialization completely (assuming everyone uses proto).

Be careful about type mismatches that were covered up by serialization round-tripping (int vs int64, empty list vs nil, etc.) and that would make DeepEqual checks fail.

Our initial tests show that https://github.com/kubernetes/kube-openapi/pull/251 is likely to fix (or greatly alleviate) this issue. See https://github.com/crossplane/crossplane/issues/2649#issuecomment-956914007 for details.

Some potential improvements we could make:

  • the aggregation layer should download the apiextensions-apiserver OpenAPI spec using proto instead of JSON
  • make how often the aggregation layer polls the apiextensions-apiserver configurable?

Another potential improvement: not having the OpenAPI controller keep a map of all CRD schemas all the time and merge the schemas together on every reconcile.

I think it helps, but merging the spec doesn't play as big a part in the latency as the JSON (de)serialization. There is some scalability data we collected when CRDs went GA (not sure if it still holds): https://github.com/kubernetes/enhancements/pull/1015#discussion_r305521857