rancher: kubernetes 1.22 conflicts with system-library-rancher-monitoring, cluster is in error state
Rancher Server Setup
- Rancher version: v2.6.3-patch2
- Installation option (Docker install/Helm Chart): Docker
Information about the Cluster
- Kubernetes version: v1.22.6-gke.300
- Cluster Type (Local/Downstream): Hosted, GKE
User Information
- What is the role of the user logged in? Admin
Describe the bug
My 1.22.6-gke.300 cluster has been in an error state since upgrading from Rancher v2.5 to v2.6, stating:
Template system-library-rancher-monitoring incompatible with rancher version or cluster's [c-ab123] kubernetes version
Legacy monitoring is disabled on all projects and on the cluster; alerts, alert groups, and notifiers are all deleted; the cattle-prometheus namespace is deleted. The system-library chart is on branch release-v2.6.
Rancher logs show
[ERROR] error syncing 'system-library': handler system-image-upgrade-catalog-controller: upgrade cluster c-ab123 system service alerting failed: template system-library-rancher-monitoring incompatible with rancher version or cluster's [c-ab123] kubernetes version, requeuing
About this issue
- State: closed
- Created 2 years ago
- Reactions: 31
- Comments: 85 (4 by maintainers)
Commits related to this issue
- Avoid printing spammy logs when a Kubernetes version does not have legacy system tools available for install Instead of continously emitting logs with the following format: ```log 2022/12/14 21:53:5... — committed to aiyengar2/rancher by aiyengar2 2 years ago
- Added Windows Custom Cluster Tests Added Windows Custom Cluster Tests Resolving PR issues Updated variable naming Remove rc from webhook min version 0.3.0-rc5 Add extra parameter for source regis... — committed to jameson-mcghee/rancher by jameson-mcghee 2 years ago
- Avoid printing spammy logs when a Kubernetes version does not have legacy system tools available for install Instead of continously emitting logs with the following format: ```log 2022/12/14 21:53:5... — committed to vivek-shilimkar/rancher by aiyengar2 2 years ago
We had the same problem and found a workaround until someone at Rancher hopefully fixes this:
In the local cluster where rancher is running:
```sh
kubectl get catalogtemplates system-library-rancher-monitoring -n cattle-global-data -o yaml > system-library-rancher-monitoring.yaml
kubectl edit catalogtemplates system-library-rancher-monitoring -n cattle-global-data
```
In the first list item under "spec.versions", change "kubeVersion: < 1.22.0-0" to something that matches your Kubernetes version. We have set "kubeVersion: '>=1.21.0-0'".
(Before/after examples omitted.)
After editing this resource, the error messages stopped immediately.
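For orientation, a minimal sketch of the edited catalogtemplate entry; only the kubeVersion line comes from the comment above, and the other fields are placeholders assumed for illustration:

```yaml
# Sketch only -- edit via:
#   kubectl edit catalogtemplates system-library-rancher-monitoring -n cattle-global-data
# Only kubeVersion is taken from the comment above; version is hypothetical.
spec:
  versions:
    - version: "0.3.2"            # hypothetical chart version
      kubeVersion: ">=1.21.0-0"   # was "< 1.22.0-0"; widened to cover 1.22.x
```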
To fix the issue with the RKE cluster, follow these steps:
Use kubectl to edit the cluster configuration:
kubectl edit clusters.management.cattle.io <cluster_id>
Replace <cluster_id> with the ID of the cluster you want to edit.
Find the conditions section that contains the error message and replace it with a healthy condition block (a hedged sketch follows these steps).
Save the changes and exit the editor.
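A hedged sketch of that condition edit, assuming the usual shape of entries under status.conditions on a clusters.management.cattle.io object; the field names are standard Rancher condition fields, but the values here are illustrative:

```yaml
# Illustrative only -- the failing entry under status.conditions looks
# roughly like this before the edit:
#   - status: "False"
#     type: PrometheusOperatorDeployed
#     reason: Error
#     message: template system-library-rancher-monitoring incompatible with
#       rancher version or cluster's [c-ab123] kubernetes version
# Replace it with a healthy condition so the UI clears the error:
- lastUpdateTime: "2022-04-07T07:06:01Z"   # hypothetical timestamp
  status: "True"
  type: PrometheusOperatorDeployed
```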
This should not have been closed. One of the problems is that a message is being logged over and over about something that we don’t care about, precisely because it isn’t supported. It’s still happening on my Rancher 2.6.9 instance. This started its life on 2.3.x and never had monitoring V1 enabled. The other problem has a separate ticket as noted above.
I was able to fix the same issue by disabling the legacy feature and restarting Rancher. The option is located in Global settings -> Feature flags. After disabling it, the system-library-rancher-monitoring errors continue to populate the logs until Rancher is restarted.
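For anyone scripting this, a hedged CLI equivalent of that UI toggle; it assumes the flag is exposed as a features.management.cattle.io object named legacy on the local cluster, which may vary by Rancher version:

```sh
# Assumption: the "legacy" feature flag exists as a features.management.cattle.io
# object and setting spec.value to false disables it.
kubectl patch features.management.cattle.io legacy \
  --type=merge -p '{"spec":{"value":false}}'
# Per the comment above, Rancher still needs a restart before the log spam stops.
```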
Having exactly the same problem after upgrading the Docker install of Rancher from 2.5.12 to 2.6.4:
```log
2022/04/07 07:06:01 [ERROR] error syncing 'system-library': handler system-image-upgrade-catalog-controller: upgrade cluster local system service alerting failed: template system-library-rancher-monitoring incompatible with rancher version or cluster's [local] kubernetes version, requeuing
W0407 07:06:53.762722      56 transport.go:288] Unable to cancel request for *client.addQuery
2022/04/07 07:08:02 [ERROR] error syncing 'system-library': handler system-image-upgrade-catalog-controller: upgrade cluster local system service alerting failed: template system-library-rancher-monitoring incompatible with rancher version or cluster's [local] kubernetes version, requeuing
```
things are unchanged on v2.6.4
Dear @MKlimuszka, please consider reopening the case, since the original issue report clearly states that
legacy monitoring is disabled on all projects on the cluster, alerts, alert groups and notifiers are all deleted
so the issue is not that Monitoring and Logging v1 are no longer supported above 1.21, but that even though they are all turned off, an error is still logged on the dashboard.

How exactly do you clear the error in the UI after it has been triggered? All the solutions I see so far only disable the error being logged. With the error triggered, the cluster is not actually selectable, even though if you paste the ID into the URL you can still access the Cluster Explorer. The cluster in question is not even using the legacy Prometheus monitoring. Running Rancher 2.7.1, and this cluster was upgraded to 1.24.10.
UPDATE: I was able to clear the error by editing the cluster.
```sh
kubectl edit clusters.management.cattle.io c-z7kd2
```
Find the block for PrometheusOperatorDeployed. It will be in Error state. You will need to replace this block with something like this to reset the error in the UI.

Hit the same issue, and since we don't need the legacy catalog feature, I simply removed it; no more errors.
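One possible reading of "removed it", offered as a guess since the commenter does not name the resource they deleted:

```sh
# Assumption: "removed the legacy catalog" means deleting the built-in
# system-library catalog on the local cluster. Check that nothing you still
# use depends on it before deleting.
kubectl delete catalogs.management.cattle.io system-library
```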
This solved it again for me: https://github.com/rancher/rancher/issues/37039#issuecomment-1176320933
same issue in 2.7 also
@onpaws we are using a single-instance Docker image
I was just hit by this bug. When I attempted the fixes documented in the issue, I was having a hard time locating the right resource to edit, until I realized I needed to perform the
kubectl edit clusters.management.cattle.io <cluster-id>
from the "local" cluster. I hope this helps someone else who is just a newbie with Rancher as well.

Thanks @chri4774 for the tip; indeed the logs are gone with this tweak, but I still have the error state on my 1.22 cluster. Could you get rid of that as well?