kubernetes: Support running multiple (out-of-tree) CCMs concurrently and conflict-free
What would you like to be added:
I’d like to be able to run multiple out-of-tree cloud controller managers (CCMs) concurrently without conflict. That is, each resource such as a node or a load balancer should be handled by exactly one of the running CCMs while the others ignore it.
Why is this needed:
My primary use case is end-to-end testing changes to a CCM implementation in a cluster that already comes with an existing (usually released) CCM. Such a cluster is the starting point because it is much easier (and more suitable) to spin up a cluster for the target cloud of my CCM using cloud-specific tooling (e.g., doctl in the case of DigitalOcean) than it is to create a vanilla cluster with generic tooling (e.g., kubeadm or kops). However, the CCM instance that comes pre-installed in my production cluster prevents me from testing the CCM-under-test instance because the current architecture assumes that only a single out-of-tree CCM is running.
About this issue
- State: closed
- Created 4 years ago
- Reactions: 2
- Comments: 32 (19 by maintainers)
Outcome of today’s SIG meeting: I am going to put together a PoC PR to showcase what an implementation could look like in order to guide the discussion into a proper KEP.
We discussed using the cluster name parameter that is passed into the various cloud provider interface methods today to define ownership of resources for concurrently running CCM instances. The parameter probably isn’t an ideal fit long-term: it can be confusing to use different cluster name values for essentially the same cluster just so that we can distinguish which CCM owns certain resources. However, for the sole purpose of demonstrating how multiple CCMs would either reconcile certain resources or refrain from processing them, it might be the easiest way to go. Long-term, we should look into defining an annotation or a label to express the relationship between a set of resources and CCMs.
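To make the long-term idea a bit more concrete, here is a minimal sketch of what an annotation-based ownership check could look like. Everything in it is an assumption for illustration only: the annotation key `example.com/ccm-owner`, the `Resource` and `CCM` types, and the notion of a "default" instance that picks up unannotated resources are all hypothetical and do not come from the cloud provider interface or any existing KEP.

```go
package main

import "fmt"

// ownerAnnotation is a hypothetical annotation key; nothing in Kubernetes
// defines it today.
const ownerAnnotation = "example.com/ccm-owner"

// Resource is a simplified stand-in for an object such as a Node or a
// Service of type LoadBalancer.
type Resource struct {
	Name        string
	Annotations map[string]string
}

// CCM models one running cloud controller manager instance (hypothetical).
type CCM struct {
	// ID is the identity this instance claims via the owner annotation.
	ID string
	// Default marks the instance that handles resources without an owner
	// annotation, e.g. the released CCM that ships with the cluster.
	Default bool
}

// Owns reports whether this instance should reconcile r. Resources without
// an owner annotation fall to the default instance; all others are handled
// only by the instance whose ID matches the annotation value.
func (c CCM) Owns(r Resource) bool {
	owner, ok := r.Annotations[ownerAnnotation] // reading a nil map is safe
	if !ok || owner == "" {
		return c.Default
	}
	return owner == c.ID
}

func main() {
	released := CCM{ID: "released", Default: true}
	underTest := CCM{ID: "under-test"}

	resources := []Resource{
		{Name: "node-a"},
		{Name: "lb-e2e", Annotations: map[string]string{ownerAnnotation: "under-test"}},
	}

	for _, r := range resources {
		fmt.Printf("%s: released=%t under-test=%t\n",
			r.Name, released.Owns(r), underTest.Owns(r))
	}
}
```

With a check like this at the top of each controller's sync path, the released CCM would keep handling everything that is not explicitly claimed, while the CCM under test would only touch resources annotated for it. The cluster name approach from the meeting would presumably follow the same shape, with the cluster name value taking the place of the annotation.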
Sorry, I have not been able to work on this myself. Moreover, my org is probably moving toward using Cluster API to create clusters with specific CCM versions, which we find more suitable for our case.
If anyone else wants to pick up this issue, feel free to go for it.
/remove-lifecycle rotten
Assigned myself the ticket, but @timoreimann will be working on it.