rancher: Cluster monitoring cannot be restored once disabled

What kind of request is this (question/bug/enhancement/feature request): bug

Steps to reproduce (least amount of steps as possible):

  • enable cluster monitoring
  • disable cluster monitoring
  • delete cattle-prometheus namespace and leftover apps
  • re-enable cluster monitoring

Result: only the cluster-monitoring app is deployed; cluster-alerting and monitoring-operator do not get deployed, leaving cluster monitoring in a half-ready state

Other details that may be helpful:

Environment information

  • Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): v2.2.0
  • Installation option (single install/HA): single

Cluster information

  • Cluster type (Hosted/Infrastructure Provider/Custom/Imported): GKE
  • Machine type (cloud/VM/metal) and specifications (CPU/memory): cloud
  • Kubernetes version (use kubectl version):
(paste the output here)
  • Docker version (use docker version):
(paste the output here)

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 3
  • Comments: 38 (8 by maintainers)

Most upvoted comments

@doog33k @materemias @hmonadjem @vincent927

We will fix it in 2.2.2.

If you hit this issue in 2.2.0 or 2.2.1, the current workaround is not easy, but it can get your cluster monitoring working again.

Workaround for 2.2.0 and 2.2.1 if you manually deleted the cattle-prometheus namespace:

  1. Disable cluster monitoring.
  2. Go to the System project -> Apps.
  3. Remove the cluster-monitoring and monitoring-operator apps if they are still listed and cannot be deleted normally.
  4. docker exec into your Rancher server container.
  5. Run kubectl get cluster to find your cluster ID.
  6. Run kubectl edit cluster ${cluster_id} (you might need to install vim in the container first).
  7. Delete the PrometheusOperatorDeployed condition and save the change (remove the three lines shown in the screenshot).
  8. Enable cluster monitoring again.
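Step 7 amounts to removing the PrometheusOperatorDeployed entry from the cluster object's status.conditions list, so that Rancher re-detects and redeploys the operator the next time monitoring is enabled. A minimal Python sketch of that edit, assuming a simplified cluster object (field layout modeled on the Rancher cluster CRD; the other condition types shown are purely illustrative):

```python
# Hypothetical excerpt of a Rancher cluster object's status, as you would
# see it in `kubectl edit cluster ${cluster_id}`; only relevant fields shown.
cluster = {
    "status": {
        "conditions": [
            {"type": "MonitoringEnabled", "status": "True"},
            {"type": "PrometheusOperatorDeployed", "status": "True"},
            {"type": "Ready", "status": "True"},
        ]
    }
}

# Step 7 of the workaround: drop the stale PrometheusOperatorDeployed
# condition so Rancher redeploys the operator on the next enable.
cluster["status"]["conditions"] = [
    c for c in cluster["status"]["conditions"]
    if c["type"] != "PrometheusOperatorDeployed"
]

print([c["type"] for c in cluster["status"]["conditions"]])
# → ['MonitoringEnabled', 'Ready']
```

In practice you perform this edit by hand in the editor opened by kubectl edit; the sketch only illustrates which entry has to go.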

@thxCode I have disabled all monitoring and deactivated all alerting in the affected cluster and tried to start over as you suggested, but when I enable cluster monitoring I end up in the same state. The Rancher logs show this error:

[ERROR] ClusterController c-th9ct [cluster-monitoring-handler] failed with : unable to sync Cluster c-th9ct(asdasd): failed to detect the installation status of monitoring components: Prometheus StatefulSet isn't deployed

Same problem here. Is there a way to create the missing apps manually?

@loganhz Hi, I tried to solve a similar problem with cluster alerting using your method, but it did not succeed. Is there any way to solve it? There is nothing in the System project's cluster alerting app now.


Per @loganhz, the regular enable -> disable -> enable monitoring flow works. The issue happens only when the operator is removed manually. Given the workaround, moving this out of 2.2.2; a fix to address the manual-removal scenario will be done in a subsequent maintenance release.

And is there a way I can get monitoring-operator back for now? (It is not listed under apps)