cluster-api-provider-metal3: Support for Failure Domains in CAPM3
User Story
As an operator who has placed their baremetal infrastructure across different failure domains (FDs), I would like CAPM3 to associate Nodes with BMHs from the desired failure domain.
Detailed Description
CAPI supports failure domains for both control-plane and worker nodes (see the CAPI provider contract for the Provider Machine as well as the Provider Cluster types). Here is the general flow:
- CAPI looks for the set of FailureDomains in `ProviderCluster.Spec`.
- The field is copied to `Cluster.Status.FailureDomains`.
- During KCP or MD scale-up events, a FD is chosen from this set and its value is placed in `Machine.Spec.FailureDomain`. Currently, CAPI tries to balance Machines equally across all FDs.
- Providers are expected to use the chosen FD in `Machine.Spec` when deciding where to place the provider-specific machine. In the case of Metal3, we want CAPM3 to associate the Metal3Machine with a corresponding BMH in the desired FD.
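The balancing step above can be sketched in Go. This is a simplified stand-in for CAPI's behavior, not its actual implementation: a hypothetical `pickFailureDomain` helper that, given how many Machines already sit in each FD, returns the least-used one (ties broken alphabetically for determinism).

```go
package main

import (
	"fmt"
	"sort"
)

// pickFailureDomain returns the failure domain with the fewest Machines,
// mimicking (in simplified form) how CAPI balances Machines across FDs.
// Ties are broken alphabetically so the choice is deterministic.
func pickFailureDomain(machineCounts map[string]int) string {
	// Sort the FD names first so map iteration order does not matter.
	fds := make([]string, 0, len(machineCounts))
	for fd := range machineCounts {
		fds = append(fds, fd)
	}
	sort.Strings(fds)

	best := ""
	for _, fd := range fds {
		if best == "" || machineCounts[fd] < machineCounts[best] {
			best = fd
		}
	}
	return best
}

func main() {
	counts := map[string]int{"fd-1": 2, "fd-2": 1, "fd-3": 2}
	fmt.Println(pickFailureDomain(counts)) // fd-2 has the fewest Machines
}
```

The chosen value would then be written to `Machine.Spec.FailureDomain` for the provider to act on.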
BMH Selection Using Labels
- The operator labels the BMH resource based on the physical location of the host. For example, the following standard label could be used on the BMH: `infrastructure.cluster.x-k8s.io/failure-domain=<my-fd-1>`
- Today, the CAPM3 `chooseHost()` func associates a Metal3Machine with a specific BMH based on labels specified in `Metal3Machine.Spec.HostSelector.MatchLabels`. We can expand this capability.
- The `HostSelector` field is used to narrow down the set of available BMHs that meet the selection criteria. When FDs are being utilized, we can simply insert the above label into `HostSelector.MatchLabels`.
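The selection steps above can be sketched as follows. This is a minimal Go sketch, not CAPM3's actual `chooseHost()` code: the label key matches the example proposed in this issue, while `selectorWithFD` and `hostMatches` are hypothetical helpers that inject the FD label into the selector and match it against a BMH's labels.

```go
package main

import "fmt"

// failureDomainLabel is the standard label proposed in this issue.
const failureDomainLabel = "infrastructure.cluster.x-k8s.io/failure-domain"

// selectorWithFD copies the Metal3Machine's HostSelector.MatchLabels and
// injects the failure-domain label taken from Machine.Spec.FailureDomain.
func selectorWithFD(matchLabels map[string]string, fd string) map[string]string {
	out := make(map[string]string, len(matchLabels)+1)
	for k, v := range matchLabels {
		out[k] = v
	}
	out[failureDomainLabel] = fd
	return out
}

// hostMatches reports whether a BMH's labels satisfy every selector entry,
// a simplified stand-in for the label matching done during host selection.
func hostMatches(hostLabels, selector map[string]string) bool {
	for k, v := range selector {
		if hostLabels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	sel := selectorWithFD(map[string]string{"size": "large"}, "my-fd-1")
	host := map[string]string{"size": "large", failureDomainLabel: "my-fd-1"}
	fmt.Println(hostMatches(host, sel)) // true: host is in the desired FD
}
```

A BMH labeled with a different FD value would fail the match, so only hosts in the desired failure domain remain candidates.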
Anything else you would like to add:
Related issues: https://github.com/kubernetes-sigs/cluster-api/issues/5666 https://github.com/kubernetes-sigs/cluster-api/issues/5667
/kind feature
About this issue
- Original URL
- State: open
- Created 3 years ago
- Reactions: 1
- Comments: 27 (23 by maintainers)
Hey @furkatgofurov7 @Arvinderpal, I’d like to give this one a shot if I may. I have a draft PR created at the moment, but still need to familiarize myself with the testing & polishing requirements of this repo. Hope I get time this week to make some more progress on it.
Hey, sorry for the delay on this one. It’s still on my todo list! It’s been busy for me lately, but I hope to get this tested sometime soon.
I’ll keep you posted! 😃