rancher: Monitoring - Rancher failed to install monitoring after upgrading from 2.1.7 to 2.2.0 in Airgap setup
What kind of request is this (question/bug/enhancement/feature request): Bug
Steps to reproduce (least amount of steps as possible):
- Install a single-node Rancher server 2.1.7 using the air gap installation (the server has internet access)
- Create a cluster with nodes
- Upgrade to 2.2.0 using the air gap installation
- Enable cluster-level monitoring
Result:
[main] 2019/03/29 16:06:04 Starting Tiller v2.10+unreleased (tls=false)
[main] 2019/03/29 16:06:04 GRPC listening on :52444
[main] 2019/03/29 16:06:04 Probes listening on :54817
[main] 2019/03/29 16:06:04 Storage driver is ConfigMap
[main] 2019/03/29 16:06:04 Max history per release is 0
[main] 2019/03/29 16:06:04 Starting Tiller v2.10+unreleased (tls=false)
[main] 2019/03/29 16:06:04 GRPC listening on :53804
[main] 2019/03/29 16:06:04 Probes listening on :46060
[main] 2019/03/29 16:06:04 Storage driver is ConfigMap
[main] 2019/03/29 16:06:04 Max history per release is 0
[tiller] 2019/03/29 16:06:05 getting history for release cluster-monitoring
[storage] 2019/03/29 16:06:05 getting release history for "cluster-monitoring"
Release "cluster-monitoring" does not exist. Installing it now.
[tiller] 2019/03/29 16:06:05 getting history for release monitoring-operator
[storage] 2019/03/29 16:06:05 getting release history for "monitoring-operator"
[tiller] 2019/03/29 16:06:05 preparing install for cluster-monitoring
[storage] 2019/03/29 16:06:05 getting release history for "cluster-monitoring"
Release "monitoring-operator" does not exist. Installing it now.
[tiller] 2019/03/29 16:06:05 preparing install for monitoring-operator
[storage] 2019/03/29 16:06:05 getting release history for "monitoring-operator"
[tiller] 2019/03/29 16:06:05 rendering rancher-monitoring chart using values
[tiller] 2019/03/29 16:06:05 rendering rancher-monitoring chart using values
[tiller] 2019/03/29 16:06:05 performing install for monitoring-operator
[tiller] 2019/03/29 16:06:05 executing 0 crd-install hooks for monitoring-operator
[tiller] 2019/03/29 16:06:05 hooks complete for crd-install monitoring-operator
2019/03/29 16:06:05 info: manifest "rancher-monitoring/templates/rbac.yaml" is empty. Skipping.
2019/03/29 16:06:05 info: manifest "rancher-monitoring/templates/deployment.yaml" is empty. Skipping.
2019/03/29 16:06:05 info: manifest "rancher-monitoring/charts/grafana/templates/rbac.yaml" is empty. Skipping.
2019/03/29 16:06:05 info: manifest "rancher-monitoring/charts/grafana/templates/pvc.yaml" is empty. Skipping.
2019/03/29 16:06:05 info: manifest "rancher-monitoring/templates/servicemonitor.yaml" is empty. Skipping.
2019/03/29 16:06:05 info: manifest "rancher-monitoring/templates/metrics-service.yaml" is empty. Skipping.
[tiller] 2019/03/29 16:06:05 performing install for cluster-monitoring
[tiller] 2019/03/29 16:06:05 executing 0 crd-install hooks for cluster-monitoring
[tiller] 2019/03/29 16:06:05 hooks complete for crd-install cluster-monitoring
[tiller] 2019/03/29 16:06:06 failed install perform step: validation failed: unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
2019/03/29 16:06:06 [ERROR] AppController p-s7js9/monitoring-operator [helm-controller] failed with : failed to install app monitoring-operator. Error: validation failed: unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
[tiller] 2019/03/29 16:06:06 failed install perform step: validation failed: [unable to recognize "": no matches for kind "Prometheus" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"]
2019/03/29 16:06:06 [ERROR] AppController p-s7js9/cluster-monitoring [helm-controller] failed with : failed to install app cluster-monitoring. Error: validation failed: [unable to recognize "": no matches for kind "Prometheus" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"]
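The validation errors above repeat the same few kinds many times. A minimal sketch, assuming the error text is available as a string (the function name `missing_kinds` is hypothetical, not part of Rancher, Helm, or Tiller), that reduces such a message to the distinct kind/apiVersion pairs Tiller could not map. Every pair belonging to `monitoring.coreos.com/v1` suggests the CustomResourceDefinitions were never registered, rather than the chart itself being broken:

```python
import re

# Matches fragments such as:
#   no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
# \s* tolerates spacing lost when logs are copied (e.g. "inversion").
_NO_MATCH = re.compile(
    r'no matches for kind\s*"([^"]+)"\s*in\s*version\s*"([^"]+)"'
)


def missing_kinds(error_text: str) -> list[tuple[str, str]]:
    """Return the distinct (kind, apiVersion) pairs named in a Tiller
    'no matches for kind' validation error, in first-seen order."""
    seen: dict[tuple[str, str], None] = {}
    for kind, version in _NO_MATCH.findall(error_text):
        seen.setdefault((kind, version), None)
    return list(seen)


if __name__ == "__main__":
    sample = (
        'validation failed: [unable to recognize "": no matches for kind '
        '"Prometheus" in version "monitoring.coreos.com/v1", unable to '
        'recognize "": no matches for kind "ServiceMonitor" in version '
        '"monitoring.coreos.com/v1", unable to recognize "": no matches for '
        'kind "ServiceMonitor" in version "monitoring.coreos.com/v1"]'
    )
    for kind, version in missing_kinds(sample):
        print(f"{kind} ({version})")
```

Pasting the full message from the cluster-monitoring failure into `missing_kinds` would collapse it to the three kinds Prometheus, PrometheusRule, and ServiceMonitor.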
Other details that may be helpful: All Rancher-related images are in our private repository.
Environment information
- Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): 2.2.0
- Installation option (single install/HA): single install
Cluster information
- Cluster type (Hosted/Infrastructure Provider/Custom/Imported): Custom
- Machine type (cloud/VM/metal) and specifications (CPU/memory): VM and 4 Cores/8GB RAM
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.6", GitCommit:"ab91afd7062d4240e95e51ac00a18bd58fddd365", GitTreeState:"clean", BuildDate:"2019-02-26T12:59:46Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-02-28T13:30:26Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
- Docker version (use docker version):
Client:
Version: 18.06.0-ce
API version: 1.38
Go version: go1.10.3
Git commit: 0ffa825
Built: Wed Jul 18 19:08:18 2018
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 18.06.0-ce
API version: 1.38 (minimum version 1.12)
Go version: go1.10.3
Git commit: 0ffa825
Built: Wed Jul 18 19:10:42 2018
OS/Arch: linux/amd64
Experimental: false
About this issue
- State: closed
- Created 5 years ago
- Comments: 17 (2 by maintainers)
Commits related to this issue
- Add prometheus operator-init with CRDs configs Related Issues: https://github.com/rancher/rancher/issues/19362 — committed to guangbochen/system-charts by guangbochen 5 years ago
Looks like I’ve got the same issue with Rancher v2.2.4 as soon as I set "enableClusterMonitoring": true in the cluster config. It seems like a race, as if it tries to deploy monitoring too soon, right after (or during?) the cluster deployment. (It works fine when I enable it manually in the Web UI, though.) I have an automated Rancher deployment with Terraform.
Full log from the Rancher server: https://gist.github.com/arno01/ed647eb5e84863f1d9c2cfe0e99cbdf7
Disabling and re-enabling monitoring at https://rancher/c/<cluster-id>/monitoring/cluster-setting makes it work.

You’re right, the cluster-agent pod could not resolve the Rancher server hostname. I re-created the cluster after fixing the network. Apologies; monitoring is installed successfully now, very nice.
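The root cause here turned out to be name resolution from the cluster-agent pod. A minimal sketch of the same check using only Python's standard `socket` module; it assumes you can run Python from a shell inside an affected pod or on the node (Rancher's images may not ship Python), and both the function name `resolves` and the default hostname `rancher.example.com` are placeholders:

```python
import socket
import sys


def resolves(hostname: str, port: int = 443) -> list[str]:
    """Return the distinct IP addresses `hostname` resolves to, or an
    empty list if resolution fails (the symptom reported above)."""
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Placeholder: pass the host part of your Rancher server URL.
    host = sys.argv[1] if len(sys.argv) > 1 else "rancher.example.com"
    addrs = resolves(host)
    if addrs:
        print(f"{host} -> {', '.join(addrs)}")
    else:
        print(f"{host} does not resolve from here")
        sys.exit(1)
```

An empty result from inside the cluster, while the same name resolves on your workstation, points at cluster DNS or network configuration rather than at the monitoring charts.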
It looks like the Prometheus CRDs were not installed successfully. I think your cluster agent is suffering from a network issue. As a workaround, delete the pod manually and it will be recreated.
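One way to confirm that the CRDs are missing is to compare the output of `kubectl get crd -o name` with the CRDs behind the kinds in the errors above. A minimal sketch: the three CRD names are the standard prometheus-operator names for Prometheus, PrometheusRule, and ServiceMonitor, while the namespace and label used for the workaround (`cattle-system`, `app=cattle-cluster-agent`) are assumptions about a typical Rancher 2.x cluster and about which pod is meant. Verify them against your own cluster before deleting anything. The helper names are hypothetical:

```python
import subprocess

# CRDs backing the kinds Tiller could not recognize (standard
# prometheus-operator names in the monitoring.coreos.com group).
REQUIRED_CRDS = {
    "prometheuses.monitoring.coreos.com",
    "prometheusrules.monitoring.coreos.com",
    "servicemonitors.monitoring.coreos.com",
}

# Workaround from the comment above: delete the cluster-agent pod so its
# Deployment recreates it. Namespace and label are ASSUMPTIONS.
DELETE_AGENT_POD = ["kubectl", "-n", "cattle-system", "delete", "pod",
                    "-l", "app=cattle-cluster-agent"]


def missing_crds(kubectl_output: str) -> set[str]:
    """Given `kubectl get crd -o name` output (one
    'customresourcedefinition.apiextensions.k8s.io/<name>' per line),
    return the required CRDs that are absent."""
    present = {line.rsplit("/", 1)[-1].strip()
               for line in kubectl_output.splitlines() if line.strip()}
    return REQUIRED_CRDS - present


if __name__ == "__main__":
    out = subprocess.run(["kubectl", "get", "crd", "-o", "name"],
                         capture_output=True, text=True, check=True).stdout
    missing = missing_crds(out)
    if missing:
        print("missing CRDs:", ", ".join(sorted(missing)))
        print("possible workaround:", " ".join(DELETE_AGENT_POD))
    else:
        print("all monitoring CRDs are present")
```

The sketch only prints the delete command instead of running it, so the assumed namespace and label can be checked first.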
Is the cluster-agent version correct? Can you check the logs of the cluster-agent pod in the System project?
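Both checks can also be run with kubectl against the downstream cluster. A minimal sketch that builds the two commands; the `cattle-system` namespace and the `cattle-cluster-agent` Deployment name are assumptions about a typical Rancher 2.x cluster (the namespace that appears under the System project), and the helper names are hypothetical:

```python
import shlex
import subprocess

NAMESPACE = "cattle-system"          # assumed agent namespace
DEPLOYMENT = "cattle-cluster-agent"  # assumed agent Deployment name


def agent_image_cmd() -> list[str]:
    """kubectl command printing the agent's container image; its tag
    is expected to match the Rancher server version (e.g. v2.2.0)."""
    return ["kubectl", "-n", NAMESPACE, "get", "deployment", DEPLOYMENT,
            "-o", "jsonpath={.spec.template.spec.containers[0].image}"]


def agent_logs_cmd(tail: int = 200) -> list[str]:
    """kubectl command printing recent logs from one agent pod."""
    return ["kubectl", "-n", NAMESPACE, "logs", f"deployment/{DEPLOYMENT}",
            f"--tail={tail}"]


if __name__ == "__main__":
    for cmd in (agent_image_cmd(), agent_logs_cmd()):
        print("$", shlex.join(cmd))
        subprocess.run(cmd, check=False)
```

In an air gap setup, the image should also point at the private registry rather than Docker Hub, which is worth confirming in the same output.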