cluster-operator: Status field should tell me about reconciliation failure
Is your feature request related to a problem? Please describe. In #171080093, we added a couple of conditions. As a result, we now have the infrastructure to implement all the other conditions. At the moment, our CR status does not report on reconciliation progress or on potential failures during reconciliation.
Describe the solution you’d like In this story, let’s implement the conditions related to reconciliation as we discussed previously.
Conditions:
Reconcilable:
- Bubble up errors during reconcile (e.g. an irreconcilable CR, validation failures).
- The `True` state is the happy state.
InProgress (represents whether the cluster is being reconciled): as discussed, InProgress will be addressed in a separate issue.
Add these in the status field of the Cluster CRD. Set them during reconciliation.
For each condition above:
Scenario: Reconcilable Happy path 😀
Given that I deploy a RMQ cluster
And there are no reconciliation failures for any k8s child resource
When I describe the RabbitMQCluster CR
Then I see the condition `Reconcilable` in the status field set to `True`
Scenario: Reconcilable Unhappy path 🙁
Given that I deploy a RMQ cluster
And a k8s child resource fails to reconcile
When I describe the RabbitMQCluster CR
Then I see the `Reconcilable` condition in the status field set to `False`
And the condition has a reason and a message field explaining the condition status
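The scenarios above map naturally onto a Conditions field in the CRD status. Here is a minimal sketch, assuming the standard `metav1.Condition` type from apimachinery is used; the operator may well define its own condition struct instead, and the names below are illustrative rather than the actual API:

```go
package v1beta1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// RabbitmqClusterStatus sketches a status with a Conditions field; the field
// and type names here are illustrative, not the operator's actual API.
type RabbitmqClusterStatus struct {
	// Conditions holds conditions such as Reconcilable. Each entry carries a
	// Status (True/False/Unknown), a Reason and a Message, which is what
	// `kubectl describe` surfaces in the unhappy-path scenario above.
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}
```

Using `metav1.Condition` gives us Reason and Message fields for free, which satisfies the unhappy-path expectation of a reason and message explaining the condition status.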
Describe alternatives you’ve considered See design document.
Additional context
The conditions should follow the Kubernetes API conventions.
This document has the conclusions of our team meeting and describes how we want the conditions to look.
Edit: Reconcilable condition happy state is True. It was False before.
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 15 (6 by maintainers)
Yes, we are in agreement; I was dropping context from my exploration. At some point, as you mentioned, we should do something about that scenario, such as bubbling up the error, but definitely in a separate issue or story.
We do currently have mitigations for the two scenarios that you are talking about. To update the service from NodePort to ClusterIP/LB type, we remove a leftover property from the svc object for the update to succeed. We also purposely ignore any update that is not allowed by the StatefulSet API (only updates to ‘replicas’, ‘template’, and ‘updateStrategy’ are allowed for a sts), which means skipping updates to spec.persistence.storage. I think the current strategy makes sense with our CRD spec structure at the moment, and I am not aware of any other updates that would result in similar errors with our current CRD. If you don’t mind, I suggest we pick this issue up later and implement it together with or after the CRD spec refactor, so that we are not preemptively solving a problem. What do you think? Am I missing something?
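For reference, a minimal sketch of that second mitigation, assuming the reconciler goes through `controllerutil.CreateOrUpdate` (function and variable names here are illustrative): the mutate function copies only the three StatefulSet spec fields the API allows to change after creation, so updates to anything else are skipped on purpose.

```go
package controllers

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// reconcileStatefulSet sketches the mitigation described above: on update,
// only the fields the StatefulSet API allows to change (replicas, template,
// updateStrategy) are copied from the desired object, so changes derived from
// e.g. spec.persistence.storage are intentionally skipped. The creation path
// would also need selector, serviceName, etc.; those are omitted here.
func reconcileStatefulSet(ctx context.Context, c client.Client, current, desired *appsv1.StatefulSet) error {
	_, err := controllerutil.CreateOrUpdate(ctx, c, current, func() error {
		current.Spec.Replicas = desired.Spec.Replicas
		current.Spec.Template = desired.Spec.Template
		current.Spec.UpdateStrategy = desired.Spec.UpdateStrategy
		return nil
	})
	return err
}
```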
As agreed in the sync-up, we will modify ReconcileFailure to be Reconcilable and constrain it specifically to errors generated by CRUD requests to the kube-api (i.e. errors from the CreateOrUpdate function). We will pause work on the InProgress condition and consider writing a new issue to cover monitoring the replica status of the StatefulSet to determine whether an upgrade is in progress.
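A minimal sketch of that narrowed Reconcilable condition, assuming the status exposes a `[]metav1.Condition` slice as in the earlier sketch; the helper and reason strings below are illustrative, not agreed names. Errors returned by CreateOrUpdate calls drive the condition to False with a reason and message, and a clean pass sets it back to True.

```go
package controllers

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setReconcilable sketches bubbling CreateOrUpdate errors into the
// Reconcilable condition. `conditions` is the CR's status.conditions slice;
// `createOrUpdateErr` is the first error returned by a CreateOrUpdate call
// (nil when every kube-api CRUD request succeeded).
func setReconcilable(conditions *[]metav1.Condition, createOrUpdateErr error) {
	if createOrUpdateErr != nil {
		meta.SetStatusCondition(conditions, metav1.Condition{
			Type:    "Reconcilable",
			Status:  metav1.ConditionFalse,
			Reason:  "CreateOrUpdateFailed", // illustrative reason
			Message: fmt.Sprintf("kube-api CRUD request failed: %v", createOrUpdateErr),
		})
		return
	}
	meta.SetStatusCondition(conditions, metav1.Condition{
		Type:   "Reconcilable",
		Status: metav1.ConditionTrue,
		Reason: "AllChildResourcesReconciled", // illustrative reason
	})
}
```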