crossplane: Proposal: Break up large providers by service
What problem are you facing?
Crossplane installs a lot of CRDs. More specifically, some of the most widely used Crossplane providers install a lot of CRDs. Upbound's Official AWS Provider, for example, currently installs almost 800 CRDs.

Per the CRD scaling one-pager, Kubernetes and the tools in its ecosystem struggle when so many CRDs are installed. The API server uses a lot of memory (around 3MB per CRD). Many tools like `kubectl` and libraries like client-go are designed under the assumption that only tens of CRDs will exist, and thus rely on inefficient queries. Crossplane maintainers and others (notably @jonnylangefeld and @apelisse) have made a lot of headway in improving this situation. Unfortunately, it has become evident that we can't wait for the ecosystem to catch up to our scale. Despite a handful of performance improvements landing with each new Kubernetes release, it may take years for everyone to be running versions of Kubernetes and Kubernetes tooling that handle hundreds or thousands of CRDs well.

In addition to the performance issues that come with loading a lot of CRDs, per https://github.com/crossplane/crossplane/issues/2869, some folks have expressed security concerns about having to deploy APIs and controllers that they don't plan to use. Others have stated they just plain don't like the "bloat" of knowing that a bunch of superfluous things are installed.

Per the preceding issues, some folks have started rolling their own Crossplane providers that include only the types they need. These are typically large existing providers (like `provider-aws`) forked, with the superfluous controllers and types removed.
How could Crossplane help solve your problem?
I propose that we break up large providers by service. That is, instead of `provider-aws` we would have `provider-aws-rds`, `provider-aws-eks`, etc. The goal is to lower the ratio of installed-to-used CRDs, and intuitively doing that by service seems like about the right granularity to me. Thanks to the Crossplane package manager I believe it should be possible (perhaps with a few package manager tweaks) to make `provider-aws`, for example, a "meta-package" that pulls in all of the smaller providers for backward compatibility.
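As a sketch of what such a meta-package might look like, the `crossplane.yaml` for `provider-aws` could package no CRDs or controllers of its own and only declare the service-scoped providers as dependencies. The package names and version constraints below are hypothetical, not a confirmed design:

```yaml
# Hypothetical crossplane.yaml for a provider-aws "meta-package".
# It declares only dependencies, so the package manager installs the
# smaller service-scoped providers. Names and versions are illustrative.
apiVersion: meta.pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws
spec:
  dependsOn:
    - provider: xpkg.upbound.io/upbound/provider-aws-rds
      version: ">=v1.0.0"
    - provider: xpkg.upbound.io/upbound/provider-aws-eks
      version: ">=v1.0.0"
    - provider: xpkg.upbound.io/upbound/provider-aws-s3
      version: ">=v1.0.0"
```

Installing this package would then be backward compatible: users who install `provider-aws` today would transparently get the full set of service-scoped providers instead.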
I like this approach because:

- It's opt-in, rather than opt-out. You install the things you do need, as opposed to turning off the things you don't need.
- No changes would be required to the many providers that already have a low installed-to-used CRD ratio. Providers like `provider-terraform` and `provider-helm` that only have 2-3 CRDs can stay as they are.
- It increases the granularity at which providers can be updated. For example, if you had to stick with a certain version of `provider-aws-rds` you could still update `provider-aws-eks` to the latest version.
- It will likely increase the scalability of providers, by sharding work across multiple provider processes (i.e. Pods).
Some examples for context, presuming that each provider mapped to an existing API group (e.g. `provider-aws-rds` mapped to `rds.aws.upbound.io`):
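The original examples are not preserved in this copy, but the mapping the proposal describes can be illustrated with a managed resource: under the proposal, a resource's API group would determine which service-scoped provider installs and reconciles it. The provider name below is hypothetical and the resource spec is illustrative:

```yaml
# Illustrative only: this Instance (group rds.aws.upbound.io) would be
# installed and reconciled by a hypothetical provider-aws-rds rather
# than by a monolithic provider-aws.
apiVersion: rds.aws.upbound.io/v1beta1
kind: Instance
metadata:
  name: example-db
spec:
  forProvider:
    region: us-west-1
    engine: postgres
    instanceClass: db.t3.micro
  providerConfigRef:
    name: default
```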
About this issue
- State: closed
- Created a year ago
- Reactions: 37
- Comments: 20 (12 by maintainers)
Commits related to this issue
- Proposal: Break Up Large Providers by Service Fixes #3754 This design document proposes that the 6-7 largest Crossplane providers be broken down into smaller, service-scoped ones. This would help fo... — committed to negz/crossplane by negz a year ago
- Proposal: Break Up Large Providers by Service Fixes #3754 This design document proposes that the 6-7 largest Crossplane providers be broken down into smaller, service-scoped ones. This would help fo... — committed to AndrewChubatiuk/crossplane by negz a year ago
Naively, and from a user experience perspective, I would say having a single provider with the ability to control CRDs (and so the resources the controller watches) via an allow list, defaulting to `*` to keep backward compatibility, feels the most ideal. Controlling it down to the specific CRD level would be nice, like `buckets.s3.aws.upbound.io`, or to get a whole group, say `*.s3.aws.upbound.io`. When managing providers from e.g. Flux, it would be nice to simply extend / control what we use in a given cluster this way: Crossplane adds the new CRDs and the provider deployment starts watching them.

One drawback I can think of is removing a CRD / group from the allow list that actually cannot be removed because, let's say, it is in use; should the provider deployment then stop managing it? It might not be very difficult (though possibly resource intensive) to validate the allow list from an admission controller.

Introducing a bunch of version / package management could possibly lead to more overhead, incompatibility, and hard-to-debug issues. A lot of new deployments / pods for sub-providers means it is harder to track / correlate what is going on based on the logs.
It would be nice to be able to maintain the ProviderConfigs at the cloud-provider level instead of the API grouping level. We have to maintain many different ProviderConfigs for different accounts, and duplicating those across the API-group-providers would not be pleasant. Can the API-group-providers be “children” of the larger “container” provider and inherit the ProviderConfigs associated with it?
The additional provider processes don’t concern me as they will all use (small) fractions of the resources that the larger providers do now. If we manage the same number of cloud resources across a dozen API group providers there will be some additional overhead but not enough to be concerned about.
Ideally we could use wildcards for versioning / groups, so that this is much less of a burden. Something like:
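The commenter's original example is not preserved in this copy; a hypothetical sketch of what wildcards over groups and versions might look like, reusing the invented `crds` allow-list idea from earlier in the thread:

```yaml
# Hypothetical sketch only: wildcards for groups and for version
# constraints, so nothing has to be enumerated or pinned one by one.
spec:
  crds:
    - "*.s3.aws.upbound.io"       # every CRD in the s3 group
    - "*.rds.aws.upbound.io"      # every CRD in the rds group
  dependsOn:
    - provider: xpkg.upbound.io/upbound/provider-aws-s3
      version: "v1.*"             # any v1 release
```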
Consider a company that has gone all-in on using crossplane for managing infrastructure. The micro-controller based approach would lead to a very large system footprint if that company were to use multiple cloud (and …other…) crossplane providers.
I agree that from a user perspective it would be nicer to just deal with one `provider-aws` (that is, one `ProviderConfig` to install, and ideally to stay with one pod per provider, as more idling resources would be 'wasted' if we ran one pod per cloud service).

But that made me think: can't we add the on/off switches for services via changes to the already existing APIs? I'm thinking of the `Provider` resource. When you install a provider (even via the CLI, I believe) this resource is created, and the Crossplane core controller decides which CRDs to install and which `Deployment` (further configured by the `ControllerConfig`) to create.

So in the `Provider` resource there could be a new `service` field that allows filtering the services that should be installed. This field should be optional, so as not to affect providers that don't need service filtering (like the Helm and Terraform providers) and to maintain backwards compatibility.

This `service` field could be used by the Crossplane core controller to install only a subset of the CRDs in the package. It would also need to pass this information on to the actual provider `Deployment`, as the provider binary would need to be configured to watch only a subset of service resources. This could be done via either env vars or binary flags passed to the `Deployment`.

I think overall this approach would cover all the positive aspects you mentioned initially, but also wouldn't let the number of providers explode. The marketplace would stay leaner (still one provider per cloud) and services could be switched on and off more easily by the user (just add to / remove from the list rather than installing a new 'provider'). This is probably beside the point, but the current terminology 'provider' just fits better describing the whole cloud provider. That said, I think it makes total sense to have a separation by service inside that resource.
It feels like an increase in cognitive complexity is unavoidable, but most of it can be addressed with better tooling or further changes. For example, I'm not too worried about the declared dependencies of a `Configuration` package, since we could expand that dependency declaration to include certain CRDs if we wanted to go with filtering. Similarly, proliferation of the `ProviderConfig` type in the separate-package option is reasonably avoidable with a common package dependency.

However, some increases in cognitive complexity seem unavoidable, `Configuration` being one of them. I don't love the filtering option, but it feels like we have to pay some cost, and the package-per-service option seems to bring more cognitive complexity, especially in day-2 operations, compared to filtering. I'd like to avoid introducing a new version-compatibility problem as much as possible.
(FWIW, there was a draft implementation of filtering a while back: https://github.com/crossplane/crossplane/pull/2646)
I was thinking through this today and realized this probably wouldn’t be possible. I believe the package manager wouldn’t let two providers share the same ProviderConfig CRD.
I don't have an exact count yet, but I'm estimating ~30 unique groups; if grouped by family (such as `*kafka.aws.upbound.io` or `rds.aws.upbound.io`), more like 15-20.

I greatly prefer the route of filtering installed CRDs along with enabling reconcilers within a single binary.
The ACK project went the route of 1:1 AWS-Service-to-Kubernetes-Controller-Process and IMO it has bloated the resource utilization of the system as a whole. Controller-based infrastructure systems already suffer from greater overhead when compared to point-in-time systems like terraform. This would further that divide.
IMO, following the model of the Kubernetes controller-manager, with its bundled reconcilers, leads to a simpler and more efficient system.