rancher: Cluster monitoring cannot be restored once disabled
What kind of request is this (question/bug/enhancement/feature request): bug
Steps to reproduce (least amount of steps as possible):
- enable cluster monitoring
- disable cluster monitoring
- delete cattle-prometheus namespace and leftover apps
- re-enable cluster monitoring
Result: only the cluster-monitoring app is deployed; cluster-alerting and monitoring-operator do not get deployed, leaving cluster monitoring in a half-ready state.
Other details that may be helpful:
Environment information
- Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): v2.2.0
- Installation option (single install/HA): single
Cluster information
- Cluster type (Hosted/Infrastructure Provider/Custom/Imported): GKE
- Machine type (cloud/VM/metal) and specifications (CPU/memory): cloud
- Kubernetes version (use kubectl version): (paste the output here)
- Docker version (use docker version): (paste the output here)
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 3
- Comments: 38 (8 by maintainers)
@doog33k @materemias @hmonadjem @vincent927
We will fix it in 2.2.2.
If you hit this issue in 2.2.0 or 2.2.1, the current workaround is not easy, but it can get your cluster monitoring working again.
Workaround in 2.2.0 and 2.2.1 if you manually deleted the cattle-prometheus namespace:
- ... cluster-monitoring and monitoring-operator apps if the apps cannot be deleted.
- Run kubectl get cluster to find your cluster id.
- Run kubectl edit cluster ${cluster_id} (you might need to install vim in the container).
- Remove the PrometheusOperatorDeployed condition and save the change. (Remove the three lines in the screenshot.)
@thxCode I have disabled all monitoring and deactivated all alerting in the affected cluster and tried to start over as you suggested, but when I enable cluster monitoring I end up with the same result. Rancher logs show this error
Same problem. Is there a way to create the missing apps manually?
@loganhz Hi, I tried to solve a similar problem with cluster alerting using your method, but it did not succeed. Is there any way to solve it? There is currently nothing in the cluster-alerting app in the system project.
Per @loganhz, the regular enable->disable->enable monitoring flow works. The issue happens only when the operator is removed manually. Given the workaround, moving this out of 2.2.2. A fix for the manual removal scenario will be done in a subsequent maintenance release.
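For anyone who would rather script the last step of the workaround than hand-edit in vim, here is a minimal Python sketch of the condition removal. It assumes the cluster object has been dumped to JSON (for example with kubectl get cluster ${cluster_id} -o json) and that conditions live under status.conditions with a type field; that layout is an assumption, not confirmed in this thread. The function name remove_condition, the id c-xxxxx and the sample data are hypothetical, and the sketch only builds the edited object, it does not write anything back to the cluster.

```python
import copy
import json


def remove_condition(cluster, condition_type="PrometheusOperatorDeployed"):
    """Return a copy of the cluster object without the given status condition.

    Assumes conditions are stored as a list of dicts under status.conditions,
    each with a "type" key (an assumption about the object layout).
    """
    patched = copy.deepcopy(cluster)  # leave the caller's object untouched
    status = patched.get("status") or {}
    if status.get("conditions"):
        status["conditions"] = [
            c for c in status["conditions"] if c.get("type") != condition_type
        ]
    return patched


# Hypothetical, trimmed-down cluster object for illustration only.
example = {
    "metadata": {"name": "c-xxxxx"},
    "status": {
        "conditions": [
            {"type": "Ready", "status": "True"},
            {
                "type": "PrometheusOperatorDeployed",
                "status": "True",
                "lastUpdateTime": "2019-04-01T00:00:00Z",
            },
        ]
    },
}

patched = remove_condition(example)
print(json.dumps(patched["status"]["conditions"], indent=2))
```

The three keys of the removed entry in the sample are only a guess at what "the three lines in the screenshot" correspond to; check your own cluster object before relying on it.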
And is there a way I can get monitoring-operator back for now? (It is not listed under apps)