cluster-api: Problem with v1alpha2 references
First off, huge shout out to @johnharris85 for finding this one and reaching out. We worked together to trace down what the problem is exactly. Strap in!
For background reading, please take a look at #2108.
Creating a cluster with the YAML in #2108 does not work. You will see the Cluster API machine controller get stuck because it finds an empty dataSecretName. This is super weird because when we look at the bootstrap config object, the dataSecretName is populated correctly, but Cluster API reads it as an empty string.
Then we noticed the API version of the referenced objects. The machine referenced a v1alpha2 type, which the controller fetches and tries to read the dataSecretName from. You’ll see that the v1alpha2 type has no dataSecretName. The lookup we use returns an empty string together with a false "found" value, and we ignore that false value today.
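To make the failure mode concrete, here is a minimal sketch of that lookup. Assumptions: nestedString below is a simplified, stdlib-only stand-in that mirrors the (value, found, error) return shape of unstructured.NestedString so the example runs without the Kubernetes modules; the status.dataSecretName path, the object contents, and the secret name are illustrative and not copied from the controller code.

```go
package main

import "fmt"

// nestedString is a simplified stand-in for unstructured.NestedString
// (k8s.io/apimachinery). It is NOT the library code; it only mirrors the
// (value, found, error) return shape so the example is self-contained.
func nestedString(obj map[string]interface{}, fields ...string) (string, bool, error) {
	var cur interface{} = obj
	for _, f := range fields {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return "", false, fmt.Errorf("field %q: parent is not a map", f)
		}
		next, found := m[f]
		if !found {
			// A missing field is not an error: empty string, found == false.
			return "", false, nil
		}
		cur = next
	}
	s, ok := cur.(string)
	if !ok {
		return "", false, fmt.Errorf("value is %T, not a string", cur)
	}
	return s, true, nil
}

func main() {
	// Hypothetical v1alpha2-shaped bootstrap config: no dataSecretName in status.
	v1alpha2Obj := map[string]interface{}{
		"apiVersion": "bootstrap.cluster.x-k8s.io/v1alpha2",
		"status":     map[string]interface{}{"ready": true},
	}
	// Hypothetical v1alpha3-shaped object: the field is populated.
	v1alpha3Obj := map[string]interface{}{
		"apiVersion": "bootstrap.cluster.x-k8s.io/v1alpha3",
		"status": map[string]interface{}{
			"ready":          true,
			"dataSecretName": "my-machine-bootstrap",
		},
	}

	// Dropping the second return value hides the fact that the field is absent.
	name, _, _ := nestedString(v1alpha2Obj, "status", "dataSecretName")
	fmt.Printf("v1alpha2, found ignored: %q\n", name)

	name, found, err := nestedString(v1alpha3Obj, "status", "dataSecretName")
	fmt.Printf("v1alpha3: %q found=%v err=%v\n", name, found, err)
}
```

With the found value discarded, the v1alpha2 case is indistinguishable from a field that is present but empty, which matches what the machine controller appeared to be seeing.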
The problem lies in the fact that when we get the object we’ve stored on disk with kubectl or a client, we get back an upconverted hub version, v1alpha3. But when Cluster API controllers fetch the object, they fetch it at the version the reference actually uses. In this case it was v1alpha2.
We tested this hypothesis by changing the references to v1alpha3 and recreating the cluster. We got past that error and into another error where the ControlPlaneEndpoint could not be read. This was, again, the fault of a reference pointing to a v1alpha2 type.
Suggestions to fix this issue
- To mitigate the confusion, we should be checking for the false value that we are currently ignoring in most places where we call unstructured.NestedString.
- There are a few ways we could consider fixing this. One simple solution, but not great for users, would be to tell users that the references should all point to hub versions. This would allow us to fetch the hub version instead of a spoke version that may be behind the hub version.
- We could somehow fetch the hub version or perhaps run the spoke version manually through the conversion functions to create a hub version (but this seems very fragile).
- ??? I’m still mulling over other solutions since I don’t like any of these, but I suspect others here will have some thoughts 😄
cc @ncdc @vincepri @detiber @randomvariable @noamran
/kind bug /milestone v0.3.0 /priority important-soon
About this issue
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 24 (24 by maintainers)
I’m leaning towards having the controller fail in this case, because all the objects should be updated somehow, but I would like to have a sync chat so we can put our heads together on a better solution.
Ok, I think I’m overcomplicating things. I think you’re right, but it would probably be useful to chat on Zoom after office hours.
thanks, updated