rancher: Cluster monitoring cannot be restored once disabled

What kind of request is this (question/bug/enhancement/feature request): bug

Steps to reproduce (least amount of steps as possible):

  • enable cluster monitoring
  • disable cluster monitoring
  • delete cattle-prometheus namespace and leftover apps
  • re-enable cluster monitoring

Result: only the cluster-monitoring app is deployed; cluster-alerting and monitoring-operator do not get deployed, leaving cluster monitoring in a half-ready state

Other details that may be helpful:

Environment information

  • Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): v2.2.0
  • Installation option (single install/HA): single

Cluster information

  • Cluster type (Hosted/Infrastructure Provider/Custom/Imported): GKE
  • Machine type (cloud/VM/metal) and specifications (CPU/memory): cloud
  • Kubernetes version (use kubectl version):
(paste the output here)
  • Docker version (use docker version):
(paste the output here)

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 3
  • Comments: 38 (8 by maintainers)

Most upvoted comments

@doog33k @materemias @hmonadjem @vincent927

We will fix it in 2.2.2.

If you hit this issue in 2.2.0 or 2.2.1, the current workaround is not easy, but it can get your cluster monitoring working again.

Workaround for 2.2.0 and 2.2.1 if you manually deleted the cattle-prometheus namespace:

  1. Disable cluster monitoring.
  2. Go to the System project -> Apps.
  3. Remove the cluster-monitoring and monitoring-operator apps if they are still listed and cannot be deleted normally.
  4. docker exec into your Rancher server container.
  5. Run kubectl get cluster to find your cluster ID.
  6. Run kubectl edit cluster ${cluster_id} (you might need to install vim in the container first).
  7. Delete the PrometheusOperatorDeployed condition and save the change (remove the three lines shown in the screenshot).
  8. Enable cluster monitoring again.
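Step 7 amounts to removing the PrometheusOperatorDeployed entry from the cluster object's status.conditions list, so that Rancher re-detects and redeploys the operator the next time monitoring is enabled. A minimal Python sketch of that edit, assuming a simplified cluster object (field layout modeled on the Rancher cluster CRD; the other condition types shown are purely illustrative):

```python
# Hypothetical excerpt of a Rancher cluster object's status, as you would
# see it in `kubectl edit cluster ${cluster_id}`; only relevant fields shown.
cluster = {
    "status": {
        "conditions": [
            {"type": "MonitoringEnabled", "status": "True"},
            {"type": "PrometheusOperatorDeployed", "status": "True"},
            {"type": "Ready", "status": "True"},
        ]
    }
}

# Step 7 of the workaround: drop the stale PrometheusOperatorDeployed
# condition so Rancher redeploys the operator on the next enable.
cluster["status"]["conditions"] = [
    c for c in cluster["status"]["conditions"]
    if c["type"] != "PrometheusOperatorDeployed"
]

print([c["type"] for c in cluster["status"]["conditions"]])
# → ['MonitoringEnabled', 'Ready']
```

In practice you perform this edit by hand in the editor opened by kubectl edit; the sketch only illustrates which entry has to go.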

@thxCode I have disabled all monitoring and deactivated all alerting in the affected cluster and tried to start over as you suggested, but when I enable cluster monitoring I end up in the same state. The Rancher logs show this error:

[ERROR] ClusterController c-th9ct [cluster-monitoring-handler] failed with : unable to sync Cluster c-th9ct(asdasd): failed to detect the installation status of monitoring components: Prometheus StatefulSet isn't deployed

Same problem here. Is there a way to create the missing apps manually?

@loganhz Hi, I tried to solve a similar problem with cluster alerting using your method, but it did not succeed. Is there any way to solve it? There is nothing in the System project's cluster alerting app now.


Per @loganhz, the regular enable -> disable -> enable monitoring flow works. The issue happens only when the operator is removed manually. Given the workaround, moving this out of 2.2.2; a fix to address the manual-removal scenario will be done in a subsequent maintenance release.

And is there a way I can get monitoring-operator back for now? (It is not listed under apps)