cluster-api: Duplicate resources are created when a template is used both as the controlPlane ref and the infrastructure ref in ClusterClass
What steps did you take and what happened:
While doing a PoC of creating an EKS cluster using ClusterClass (CAPI + CAPA), I noticed that two AWSManagedControlPlane (awsmcp) objects are created from the AWSManagedControlPlaneTemplate (awsmcpt) when there should be only one. For context, EKS uses an AWS-managed control plane, so AWSManagedControlPlane in CAPA is the counterpart of KubeadmControlPlane in CAPI.
```
$ kubectl get cluster -n eks
NAME               PHASE         AGE     VERSION
my-eks-cluster-2   Provisioned   2d20h   v1.21.2

$ kubectl get awsmcp -n eks
NAME                     CLUSTER            READY   VPC                     BASTION IP
my-eks-cluster-2-84g7h   my-eks-cluster-2   true    vpc-0fd430763e64830ee
my-eks-cluster-2-c4vqb   my-eks-cluster-2   true    vpc-06b20d0d9d5eae93d
```
Further debugging suggests that this is because AWSManagedControlPlaneTemplate is used both as `spec.controlPlane.ref` and `spec.infrastructure.ref` in the ClusterClass, and the CAPI controller clones the template twice. FYI, there is no AWSManagedCluster type in CAPA.
ClusterClass used for EKS.
```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: eks-clusterclass-v1
  namespace: eks
spec:
  controlPlane:
    ref:
      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
      kind: AWSManagedControlPlaneTemplate
      name: eks-clusterclass-v1-awsmcp
  infrastructure:
    ref:
      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
      kind: AWSManagedControlPlaneTemplate
      name: eks-clusterclass-v1-awsmcp
  workers:
    machineDeployments:
    - class: default-worker
      template:
        bootstrap:
          ref:
            apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
            kind: EKSConfigTemplate
            name: eks-clusterclass-v1-eksconfigtemplate
        infrastructure:
          ref:
            apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
            kind: AWSMachineTemplate
            name: eks-clusterclass-v1-worker-machinetemplate
```
This is creating an extra EKS control plane and related infrastructure in AWS and causing panics in the CAPA controller.
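For reference, the duplication can be traced back to the single template on the cloned objects themselves. Below is a minimal sketch of the two clones, assuming the standard `cluster.x-k8s.io/cloned-from-*` annotations added by the topology controller; the annotation values and field layout here are illustrative, not copied from a live cluster.

```yaml
# Illustrative sketch only: both cloned AWSManagedControlPlane objects point back
# to the same template, one clone created for spec.controlPlane.ref and one for
# spec.infrastructure.ref.
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: AWSManagedControlPlane
metadata:
  name: my-eks-cluster-2-84g7h   # clone created for spec.controlPlane.ref
  namespace: eks
  annotations:
    cluster.x-k8s.io/cloned-from-name: eks-clusterclass-v1-awsmcp
    cluster.x-k8s.io/cloned-from-groupkind: AWSManagedControlPlaneTemplate.controlplane.cluster.x-k8s.io
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: AWSManagedControlPlane
metadata:
  name: my-eks-cluster-2-c4vqb   # clone created for spec.infrastructure.ref
  namespace: eks
  annotations:
    cluster.x-k8s.io/cloned-from-name: eks-clusterclass-v1-awsmcp
    cluster.x-k8s.io/cloned-from-groupkind: AWSManagedControlPlaneTemplate.controlplane.cluster.x-k8s.io
```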
What did you expect to happen: Only one AWSManagedControlPlane is created.
Environment:
- Cluster-api version: v1.1.1
- Minikube/KIND version: v0.11.1
- Kubernetes version (use `kubectl version`): v1.21.2
- OS (e.g. from /etc/os-release): macOS
/kind bug
About this issue
- State: closed
- Created 2 years ago
- Comments: 25 (24 by maintainers)
With managed Kubernetes services the lines are blurred between the cluster infrastructure and the control plane. So a few solutions that have been discussed:
- Use the same kind for `infrastructureRef` and `controlPlaneRef` (like CAPA is doing and the PR for CAPZ is proposing). But this currently causes problems with ClusterClass, as documented by this issue.
- Have separate `Cluster` and `ControlPlane` kinds, even if the `Cluster` kind only acts as a passthrough to satisfy the contract with CAPI (see the sketch below). For CAPA this causes some strange interactions required for things like the control plane endpoint.
- Only have an `infrastructureRef` (no `controlPlaneRef`) and do all the control plane reconciliation in the infrastructure cluster reconciler. From office hours there was mention that this could cause an issue with ClusterClass.

Sounds like we need to get a proposal/doc together that covers the various potential solutions and then decide a consistent way forward for any provider that has a managed Kubernetes service.
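For illustration, the second option above could look roughly like this in the ClusterClass. This is only a sketch: `AWSManagedClusterTemplate` is used here as a hypothetical passthrough kind, not something CAPA provides today.

```yaml
# Hypothetical sketch of the "separate Cluster and ControlPlane kinds" option.
# AWSManagedClusterTemplate is illustrative only; it would act purely as a
# passthrough to satisfy the CAPI infrastructure cluster contract.
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: eks-clusterclass-v1
  namespace: eks
spec:
  controlPlane:
    ref:
      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
      kind: AWSManagedControlPlaneTemplate
      name: eks-clusterclass-v1-awsmcp
  infrastructure:
    ref:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: AWSManagedClusterTemplate   # hypothetical passthrough kind
      name: eks-clusterclass-v1-awsmc
```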
@fabriziopandini @pydctw - I can make a start on a doc tomorrow and we can all collaborate. How does that sound?
I’m not surprised this turned out to be an issue.
We originally had two different kinds, `AWSManagedCluster` and `AWSManagedControlPlane`. However, `AWSManagedCluster` was only acting as a pass-through to communicate values from the control plane to CAPI to satisfy the infrastructure cluster contract… which the control plane already did. So we decided to remove it. I think it's valid for managed services to have the Cluster/ControlPlane be the same resource. But I also understand the need for consistency between providers, and if we had to reinstate `AWSManagedCluster` then I'm good with that, but…
…more generally, I'm not sure that we have ever thought about what CAPI looks like for managed Kubernetes services (please correct me if I'm wrong here). As a result, providers have made their own decisions and have tried to fit it into the current resource kinds/reconciliation process/provider types. With the CAPG managed implementation starting soon, it's probably something we need to discuss, and decide what a managed service looks like in CAPI.
A meeting to agree on the responsibilities would be great 👍 I also agree that we need to be clear about the delineations, and this is the issue we are facing with managed Kubernetes services… the current delineations don't naturally fit.
This is a good example of why the current responsibilities of a control plane & infrastructure provider don’t fit well for managed services like EKS and why we have ended up where we are.
When you create an EKS control plane in AWS (which we do via a CAPI control plane provider), this automatically creates a load balancer for the API server… this is at odds with the current assumption that the infra provider creates the load balancer and reports the control plane endpoint back to CAPI.
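For context, the infrastructure cluster contract expects the InfraCluster object to report the API server endpoint and readiness back to CAPI, roughly as in the sketch below. The `AWSManagedCluster` shown is only illustrative of such a passthrough object; for EKS the endpoint value would simply be copied from the load balancer that AWS created alongside the control plane, not from infrastructure the provider created itself.

```yaml
# Sketch only: AWSManagedCluster here is an illustrative passthrough object.
# The fields CAPI reads from an InfraCluster are spec.controlPlaneEndpoint and
# status.ready; for EKS the endpoint is produced by AWS, not by the infra provider.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSManagedCluster
metadata:
  name: my-eks-cluster-2
  namespace: eks
spec:
  controlPlaneEndpoint:
    host: ABCDEF1234567890.gr7.us-west-2.eks.amazonaws.com   # example value created by EKS
    port: 443
status:
  ready: true
```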
So revisiting the provider types and responsibilities in the context of managed Kubernetes services would be great.
When shall we get together? Perhaps a doodle is needed?
~We should probably fix this bit in the future, a Cluster should always have a control plane, but it doesn’t have to have an infrastructure associated with it necessarily.~ See below
In the future we can also think that the infrastructure might go away entirely and instead become something else. Truthfully, today the InfraCluster object is a stepping stone; we do need the infrastructure to be set up and configured somehow, but most users might want to have something else manage that (like Terraform, Crossplane, etc.) and inform Cluster API where to get those values.
@pydctw How do we want to treat this core CAPI issue now that the proposal is merged? I assume we have corresponding issues in CAPA so it’s fine to close this issue here?
This issue is related to https://github.com/kubernetes-sigs/cluster-api/pull/6988 for EKS ClusterClass support, not server side apply. Will keep it open until the proposal is merged.
/reopen
Thinking about it a bit more, we should probably meet and define clear responsibilities for each reference. Infrastructure, Control Plane, and other references should all have clear delineations.
If we think about the responsibilities of an infrastructure provider and its InfraCluster object, we can assume that this object provides the infrastructure for the Kubernetes Cluster, which can include a number of things and today it also includes the load balancer.
On the other side, the control plane reference is in charge of managing the Kubernetes control plane, the infrastructure should be left to the other reference. The challenge I’m seeing with the above is that it seems that we’ve mixed some of these responsibilities when it comes to the managed cloud Kubernetes services.
Let’s reconvene and chat more about it, we should meet with at least one person from each provider interested in this discussion and push for consistency and separation of responsibilities.
cc @yastij @fabriziopandini @sedefsavas @richardcase @CecileRobertMichon
It would be great if we can discuss this during office hours tomorrow, as I am blocked from progressing on https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/3166. Will add it to the agenda.
cc @sbueringer