cluster-operator: Status field should tell me about reconciliation failure
Is your feature request related to a problem? Please describe. In #171080093, we added a couple of conditions. As a result, we now have the infrastructure to implement all the other conditions. At the moment, our CR status does not report on reconciliation progress or on potential failures during reconciliation.
Describe the solution you’d like In this story, let’s implement the conditions related to reconciliation as we discussed previously.
Conditions:
Reconcilable:
- Bubble up errors during reconcile (e.g. an irreconcilable CR, validation failures).
- The `True` state is the happy state.
InProgress (represents whether the cluster is being reconciled): as discussed, InProgress will be addressed in a separate issue.
Add these in the status field of the Cluster CRD. Set them during reconciliation.
For each condition above:
Scenario: Reconcilable Happy path 😀
Given that I deploy a RMQ cluster
And there are no reconciliation failures for any k8s child resource
When I describe the RabbitMQCluster CR
Then I see the condition `Reconcilable` in the status field set to `True`
Scenario: Reconcilable Unhappy path 🙁
Given that I deploy a RMQ cluster
And a k8s child resource fails to reconcile
When I describe the RabbitMQCluster CR
Then I see the `Reconcilable` condition in the status field set to `False`
And the condition has a reason and a message field explaining the condition status
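The scenarios above map naturally onto a Conditions field in the CRD status. Here is a minimal sketch, assuming the standard `metav1.Condition` type from apimachinery is used; the operator may well define its own condition struct instead, and the names below are illustrative rather than the actual API:

```go
package v1beta1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// RabbitmqClusterStatus sketches a status with a Conditions field; the field
// and type names here are illustrative, not the operator's actual API.
type RabbitmqClusterStatus struct {
	// Conditions holds conditions such as Reconcilable. Each entry carries a
	// Status (True/False/Unknown), a Reason and a Message, which is what
	// `kubectl describe` surfaces in the unhappy-path scenario above.
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}
```

Using `metav1.Condition` gives us Reason and Message fields for free, which satisfies the unhappy-path expectation of a reason and message explaining the condition status.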
Describe alternatives you’ve considered See design document.
Additional context
The conditions should follow the Kubernetes API conventions.
This document has the conclusions of our team meeting and describes how we want the conditions to look.
Edit: Reconcilable condition happy state is True. It was False before.
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 15 (6 by maintainers)
Yes, we are in agreement; I was dropping context from my exploration. At some point, as you mentioned, we should do something about that scenario, such as bubbling up the error, but definitely in a separate issue or story.
We do currently have mitigations for the two scenarios that you are talking about. To update the service from NodePort to ClusterIP/LB type, we remove a leftover property from the svc object for the update to succeed. We also purposely ignore any update that is not allowed by the StatefulSet API (only updates to ‘replicas’, ‘template’, and ‘updateStrategy’ are allowed for a sts), which means skipping updates to spec.persistence.storage. I think the current strategy makes sense with our CRD spec structure at the moment, and I am not aware of any other updates that would result in similar errors with our current CRD. If you don’t mind, I suggest we pick this issue up later and implement it together with or after the CRD spec refactor, so that we are not preemptively solving a problem. What do you think? Am I missing something?
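For reference, a minimal sketch of that second mitigation, assuming the reconciler goes through `controllerutil.CreateOrUpdate` (function and variable names here are illustrative): the mutate function copies only the three StatefulSet spec fields the API allows to change after creation, so updates to anything else are skipped on purpose.

```go
package controllers

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// reconcileStatefulSet sketches the mitigation described above: on update,
// only the fields the StatefulSet API allows to change (replicas, template,
// updateStrategy) are copied from the desired object, so changes derived from
// e.g. spec.persistence.storage are intentionally skipped. The creation path
// would also need selector, serviceName, etc.; those are omitted here.
func reconcileStatefulSet(ctx context.Context, c client.Client, current, desired *appsv1.StatefulSet) error {
	_, err := controllerutil.CreateOrUpdate(ctx, c, current, func() error {
		current.Spec.Replicas = desired.Spec.Replicas
		current.Spec.Template = desired.Spec.Template
		current.Spec.UpdateStrategy = desired.Spec.UpdateStrategy
		return nil
	})
	return err
}
```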
As agreed in the sync-up, we will modify ReconcileFailure to be Reconcilable and constrain it specifically to errors generated by CRUD requests to the kube-api (i.e. errors from the CreateOrUpdate function). We will pause work on the InProgress condition and consider writing a new issue to cover monitoring the replica status of the StatefulSet to determine whether an upgrade is in progress.
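A minimal sketch of that narrowed Reconcilable condition, assuming the status exposes a `[]metav1.Condition` slice as in the earlier sketch; the helper and reason strings below are illustrative, not agreed names. Errors returned by CreateOrUpdate calls drive the condition to False with a reason and message, and a clean pass sets it back to True.

```go
package controllers

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setReconcilable sketches bubbling CreateOrUpdate errors into the
// Reconcilable condition. `conditions` is the CR's status.conditions slice;
// `createOrUpdateErr` is the first error returned by a CreateOrUpdate call
// (nil when every kube-api CRUD request succeeded).
func setReconcilable(conditions *[]metav1.Condition, createOrUpdateErr error) {
	if createOrUpdateErr != nil {
		meta.SetStatusCondition(conditions, metav1.Condition{
			Type:    "Reconcilable",
			Status:  metav1.ConditionFalse,
			Reason:  "CreateOrUpdateFailed", // illustrative reason
			Message: fmt.Sprintf("kube-api CRUD request failed: %v", createOrUpdateErr),
		})
		return
	}
	meta.SetStatusCondition(conditions, metav1.Condition{
		Type:   "Reconcilable",
		Status: metav1.ConditionTrue,
		Reason: "AllChildResourcesReconciled", // illustrative reason
	})
}
```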