rancher: kubernetes 1.22 conflicts with system-library-rancher-monitoring, cluster is in error state

Rancher Server Setup

  • Rancher version: v2.6.3-patch2
  • Installation option (Docker install/Helm Chart): Docker

Information about the Cluster

  • Kubernetes version: v1.22.6-gke.300
  • Cluster Type (Local/Downstream): Hosted, gke

User Information

  • What is the role of the user logged in? Admin

Describe the bug: My 1.22.6-gke.300 cluster has been in an error state since upgrading from Rancher v2.5 to v2.6, stating "Template system-library-rancher-monitoring incompatible with rancher version or cluster's [c-ab123] kubernetes version". Legacy monitoring is disabled on all projects and on the cluster; alerts, alert groups, and notifiers are all deleted; the cattle-prometheus namespace is deleted. The system-library chart is on branch release-v2.6.

Rancher logs show:

[ERROR] error syncing 'system-library': handler system-image-upgrade-catalog-controller: upgrade cluster c-ab123 system service alerting failed: template system-library-rancher-monitoring incompatible with rancher version or cluster's [c-ab123] kubernetes version, requeuing

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 31
  • Comments: 85 (4 by maintainers)

Most upvoted comments

We had the same problem and found a workaround until someone at Rancher hopefully fixes this:

In the local cluster where rancher is running:

  1. Make a backup of the catalogtemplates/system-library-rancher-monitoring resource: kubectl get catalogtemplates system-library-rancher-monitoring -n cattle-global-data -o yaml > system-library-rancher-monitoring.yaml
  2. Edit the catalogtemplates/system-library-rancher-monitoring resource: kubectl edit catalogtemplates system-library-rancher-monitoring -n cattle-global-data

In the first list item under “spec.versions”, change “kubeVersion: < 1.22.0-0” to something that matches your Kubernetes version. We have set “kubeVersion: ‘>=1.21.0-0’”.

Before

apiVersion: management.cattle.io/v3
kind: CatalogTemplate
metadata:
  creationTimestamp: "2021-08-05T07:15:46Z"
  generation: 4
  labels:
    catalog.cattle.io/name: system-library
  name: system-library-rancher-monitoring
  namespace: cattle-global-data
  resourceVersion: "183697747"
  uid: b68405eb-3973-4baa-b887-6f973ad3dc61
spec:
  catalogId: system-library
  defaultVersion: 0.3.2
  description: Provides monitoring for Kubernetes which is maintained by Rancher 2.
  displayName: rancher-monitoring
  folderName: rancher-monitoring
  icon: https://coreos.com/sites/default/files/inline-images/Overview-prometheus_0.png
  projectURL: https://github.com/coreos/prometheus-operator
  versions:
  - digest: 08fbaee28d5a0efb79db02d9372629e2
    externalId: catalog://?catalog=system-library&template=rancher-monitoring&version=0.3.2
    kubeVersion: < 1.22.0-0
    rancherMinVersion: 2.6.1-alpha1
    version: 0.3.2
    versionDir: charts/rancher-monitoring/v0.3.2
    versionName: rancher-monitoring
...

After

apiVersion: management.cattle.io/v3
kind: CatalogTemplate
metadata:
  creationTimestamp: "2021-08-05T07:15:46Z"
  generation: 4
  labels:
    catalog.cattle.io/name: system-library
  name: system-library-rancher-monitoring
  namespace: cattle-global-data
  resourceVersion: "183697747"
  uid: b68405eb-3973-4baa-b887-6f973ad3dc61
spec:
  catalogId: system-library
  defaultVersion: 0.3.2
  description: Provides monitoring for Kubernetes which is maintained by Rancher 2.
  displayName: rancher-monitoring
  folderName: rancher-monitoring
  icon: https://coreos.com/sites/default/files/inline-images/Overview-prometheus_0.png
  projectURL: https://github.com/coreos/prometheus-operator
  versions:
  - digest: 08fbaee28d5a0efb79db02d9372629e2
    externalId: catalog://?catalog=system-library&template=rancher-monitoring&version=0.3.2
    kubeVersion: '>=1.21.0-0'
    rancherMinVersion: 2.6.1-alpha1
    version: 0.3.2
    versionDir: charts/rancher-monitoring/v0.3.2
    versionName: rancher-monitoring
...

After editing this resource the error messages stopped immediately.
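If you prefer a non-interactive change, the same edit can also be applied with kubectl patch. This is a minimal sketch, assuming the 0.3.2 entry is the first element of spec.versions (index 0); adjust the index and the version range for your setup:

kubectl patch catalogtemplates system-library-rancher-monitoring -n cattle-global-data \
  --type=json -p '[{"op": "replace", "path": "/spec/versions/0/kubeVersion", "value": ">=1.21.0-0"}]'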

To fix the issue with the RKE cluster, follow these steps:

Use kubectl to edit the cluster configuration:

kubectl edit clusters.management.cattle.io <cluster_id>

Replace <cluster_id> with the ID of the cluster you want to edit.

Find the section that contains the error message:

- lastUpdateTime: "2023-03-28T03:29:01Z"
  message: template system-library-rancher-monitoring incompatible with rancher
    version or cluster's [c-qmh8k] kubernetes version
  reason: Error
  status: "False"
  type: PrometheusOperatorDeployed

Replace it with the following section:

- lastUpdateTime: "2023-03-28T03:29:01Z"
  status: "True"
  type: PrometheusOperatorDeployed

Save the changes and exit the editor.
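To verify the edit (or to inspect the failing condition before you change it), you can query the condition directly; a sketch, with <cluster_id> as a placeholder:

kubectl get clusters.management.cattle.io <cluster_id> \
  -o jsonpath='{.status.conditions[?(@.type=="PrometheusOperatorDeployed")]}'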

This should not have been closed. One of the problems is that a message is being logged over and over about something that we don’t care about, precisely because it isn’t supported. It’s still happening on my Rancher 2.6.9 instance. This started its life on 2.3.x and never had monitoring V1 enabled. The other problem has a separate ticket as noted above.

I was able to fix the same issue by disabling the legacy feature and restarting Rancher. The option is located in Global settings -> Feature flags. After disabling it, the system-library-rancher-monitoring errors continue to populate the logs until Rancher is restarted.
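If you prefer the CLI over the UI, the flag can also be toggled in the local cluster; a sketch, assuming the flag is exposed as the features.management.cattle.io resource named legacy with a spec.value boolean:

kubectl patch features.management.cattle.io legacy --type=merge -p '{"spec":{"value":false}}'

Then restart Rancher as described above (how you restart depends on whether it is a Docker or Helm install).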

Have exactly the same problem when upgrading the Docker version of Rancher from 2.5.12 to 2.6.4:

2022/04/07 07:06:01 [ERROR] error syncing 'system-library': handler system-image-upgrade-catalog-controller: upgrade cluster local system service alerting failed: template system-library-rancher-monitoring incompatible with rancher version or cluster's [local] kubernetes version, requeuing
W0407 07:06:53.762722 56 transport.go:288] Unable to cancel request for *client.addQuery
2022/04/07 07:08:02 [ERROR] error syncing 'system-library': handler system-image-upgrade-catalog-controller: upgrade cluster local system service alerting failed: template system-library-rancher-monitoring incompatible with rancher version or cluster's [local] kubernetes version, requeuing

things are unchanged on v2.6.4

Dear @MKlimuszka, please consider reopening the case. The original issue report clearly states that legacy monitoring is disabled on all projects and on the cluster, and that alerts, alert groups, and notifiers are all deleted. So the issue is not that Monitoring and Logging v1 are no longer supported above 1.21, but that an error is still logged on the dashboard even though they are all turned off.

How exactly do you clear the error in the UI after it has triggered? All the solutions I see so far only disable the error being logged. With the error triggered, the cluster is not actually selectable, although if you paste the ID into the URL you can still access Cluster Explorer. The cluster in question is not even using the legacy Prometheus monitoring. Running Rancher 2.7.1, and this cluster was upgraded to 1.24.10.

UPDATE: I was able to clear the error by editing the cluster.

kubectl edit clusters.management.cattle.io c-z7kd2

Find the block for PrometheusOperatorDeployed. It will be in an Error state. Replace that block with something like the following to reset the error in the UI.

  - lastUpdateTime: "2023-04-09T09:05:55Z"
    status: "True"
    type: PrometheusOperatorDeployed

Hit the same issue. Since we don't need the legacy catalog feature, I simply removed it, and there are no errors any more:

kubectl get catalogs system-library -o yaml > system-library.yaml
kubectl delete -f system-library.yaml
kubectl rollout restart -n cattle-system deploy/rancher
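If you need the legacy catalog back later, the backup taken in the first command should restore it (assuming the saved YAML is still compatible with your Rancher version):

kubectl apply -f system-library.yaml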

same issue in 2.7 also

@onpaws we are using single instance docker image

I was just hit by this bug. When I attempted the fixes documented in the issue, I had a hard time locating the right resource to edit until I realized I needed to run kubectl edit clusters.management.cattle.io <cluster-id> against the "local" cluster. I hope this helps someone else who is just a newbie with Rancher as well.
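For example, with a kubeconfig context pointing at the Rancher management ("local") cluster (the context name and cluster ID below are placeholders):

kubectl config use-context local
kubectl edit clusters.management.cattle.io c-ab123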

Thanks @chri4774 for the tip, the logs are indeed gone with this tweak, but I still have the error state on my 1.22 cluster. Were you able to get that to go away as well?