dashboard: Display failed shoot constraints

What would you like to be added:

In the shoot status, Gardener already publishes the so-called “constraints”, e.g.

status:
  constraints:
    - type: HibernationPossible
      status: 'False'
      lastTransitionTime: '2021-12-08T08:59:44Z'
      lastUpdateTime: '2021-12-08T02:56:17Z'
      reason: ProblematicWebhooks
      message: >-
        ValidatingWebhookConfiguration "opa-validating-webhook" is problematic:
        webhook "validating-webhook.openpolicyagent.org" with failurePolicy
        "Ignore" and 30s timeout might prevent worker nodes from properly
        joining the shoot cluster
    - type: MaintenancePreconditionsSatisfied
      status: 'False'
      lastTransitionTime: '2021-12-08T08:59:44Z'
      lastUpdateTime: '2021-12-08T02:56:17Z'
      reason: ProblematicWebhooks
      message: >-
        ValidatingWebhookConfiguration "opa-validating-webhook" is problematic:
        webhook "validating-webhook.openpolicyagent.org" with failurePolicy
        "Ignore" and 30s timeout might prevent worker nodes from properly
        joining the shoot cluster

Also see documentation.

It would be good to prominently display failed constraints in the shoot’s details page.
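To make the request concrete, here is a minimal sketch of the selection logic, assuming the shoot is available as a plain dictionary parsed from the YAML above and that a constraint counts as failed when its status is the string 'False'. The function name `failed_constraints`, the sample data, and the second constraint's 'True' status are hypothetical and only serve the illustration; this is not the dashboard's actual implementation.

```python
from typing import Any


def failed_constraints(shoot: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the constraints whose status is the string 'False'.

    Assumption: 'False' marks a failed constraint; any other status is
    treated as not failed in this sketch.
    """
    status = shoot.get("status") or {}
    constraints = status.get("constraints") or []
    return [c for c in constraints if c.get("status") == "False"]


# Hypothetical sample, shaped like the status excerpt in this issue.
example_shoot = {
    "status": {
        "constraints": [
            {
                "type": "HibernationPossible",
                "status": "False",
                "reason": "ProblematicWebhooks",
                "message": 'ValidatingWebhookConfiguration "opa-validating-webhook" is problematic',
            },
            {
                # Illustrative only: a constraint that is currently satisfied.
                "type": "MaintenancePreconditionsSatisfied",
                "status": "True",
            },
        ]
    }
}

for constraint in failed_constraints(example_shoot):
    print(f'{constraint["type"]}: {constraint["message"]}')
```

A details page could render one warning per returned entry, showing the type, reason and message.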

Why is this needed:

Problematic webhook configurations and similar issues can often cause other problems in the cluster (e.g. worker nodes not joining the cluster) that are visible in the dashboard, for example in the health checks.

  • When operators start investigating such issues, it would be helpful to make them aware of the failed constraints early on, because this might speed up the investigation.
  • When users notice such issues, they might already be able to help themselves by looking at the failed constraints’ messages.

/kind enhancement
/area ops-productivity

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 28 (28 by maintainers)

Most upvoted comments

Can you add the link to the best practices (https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#best-practices-and-warnings) so that the end-users have a chance to check what might be wrong?
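A minimal sketch of how such a link could be attached to a failed constraint, assuming a lookup keyed by the constraint's `reason`. The names `HELP_LINKS` and `constraint_hint` are hypothetical, and only the `ProblematicWebhooks` reason from this issue is mapped.

```python
BEST_PRACTICES_URL = (
    "https://kubernetes.io/docs/reference/access-authn-authz/"
    "extensible-admission-controllers/#best-practices-and-warnings"
)

# Hypothetical mapping from a constraint reason to a help link.
HELP_LINKS = {"ProblematicWebhooks": BEST_PRACTICES_URL}


def constraint_hint(constraint: dict) -> str:
    """Build a one-line hint and append a help link when one is known."""
    text = f'{constraint.get("type", "Unknown")}: {constraint.get("message", "")}'.strip()
    link = HELP_LINKS.get(constraint.get("reason", ""))
    return f"{text} (see {link})" if link else text


print(
    constraint_hint(
        {
            "type": "HibernationPossible",
            "reason": "ProblematicWebhooks",
            "message": "webhook is problematic",
        }
    )
)
```

Reasons without an entry in the mapping fall back to the plain type and message.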

As for the error code, please open an issue at g/g.

We could do something like this

[Screenshot: Screen Shot 2022-04-05 at 18 25 58]

Is this prominent enough? IDK… users already ignored the warning, but maybe a red error with a user-error icon will help make them aware. If we want this (or something similar), we need to talk about the texts as well as the implementation. But first let’s clarify whether this is the direction we want to go.

Don’t get confused by the error message in the screenshot (Shoot cluster has been hibernated.); I had no cluster with this error, so I faked it.