rancher: Monitoring - Rancher fails to install monitoring after upgrading from 2.1.7 to 2.2.0 in an air-gapped setup
What kind of request is this (question/bug/enhancement/feature request): Bug
Steps to reproduce (least amount of steps as possible):
- Install a single-node Rancher server 2.1.7 using the air-gap installation (the server does have internet access)
- Create a cluster with nodes
- Upgrade to 2.2.0 using the air-gap installation
- Enable cluster-level monitoring
Result:
[main] 2019/03/29 16:06:04 Starting Tiller v2.10+unreleased (tls=false)
[main] 2019/03/29 16:06:04 GRPC listening on :52444
[main] 2019/03/29 16:06:04 Probes listening on :54817
[main] 2019/03/29 16:06:04 Storage driver is ConfigMap
[main] 2019/03/29 16:06:04 Max history per release is 0
[main] 2019/03/29 16:06:04 Starting Tiller v2.10+unreleased (tls=false)
[main] 2019/03/29 16:06:04 GRPC listening on :53804
[main] 2019/03/29 16:06:04 Probes listening on :46060
[main] 2019/03/29 16:06:04 Storage driver is ConfigMap
[main] 2019/03/29 16:06:04 Max history per release is 0
[tiller] 2019/03/29 16:06:05 getting history for release cluster-monitoring
[storage] 2019/03/29 16:06:05 getting release history for "cluster-monitoring"
Release "cluster-monitoring" does not exist. Installing it now.
[tiller] 2019/03/29 16:06:05 getting history for release monitoring-operator
[storage] 2019/03/29 16:06:05 getting release history for "monitoring-operator"
[tiller] 2019/03/29 16:06:05 preparing install for cluster-monitoring
[storage] 2019/03/29 16:06:05 getting release history for "cluster-monitoring"
Release "monitoring-operator" does not exist. Installing it now.
[tiller] 2019/03/29 16:06:05 preparing install for monitoring-operator
[storage] 2019/03/29 16:06:05 getting release history for "monitoring-operator"
[tiller] 2019/03/29 16:06:05 rendering rancher-monitoring chart using values
[tiller] 2019/03/29 16:06:05 rendering rancher-monitoring chart using values
[tiller] 2019/03/29 16:06:05 performing install for monitoring-operator
[tiller] 2019/03/29 16:06:05 executing 0 crd-install hooks for monitoring-operator
[tiller] 2019/03/29 16:06:05 hooks complete for crd-install monitoring-operator
2019/03/29 16:06:05 info: manifest "rancher-monitoring/templates/rbac.yaml" is empty. Skipping.
2019/03/29 16:06:05 info: manifest "rancher-monitoring/templates/deployment.yaml" is empty. Skipping.
2019/03/29 16:06:05 info: manifest "rancher-monitoring/charts/grafana/templates/rbac.yaml" is empty. Skipping.
2019/03/29 16:06:05 info: manifest "rancher-monitoring/charts/grafana/templates/pvc.yaml" is empty. Skipping.
2019/03/29 16:06:05 info: manifest "rancher-monitoring/templates/servicemonitor.yaml" is empty. Skipping.
2019/03/29 16:06:05 info: manifest "rancher-monitoring/templates/metrics-service.yaml" is empty. Skipping.
[tiller] 2019/03/29 16:06:05 performing install for cluster-monitoring
[tiller] 2019/03/29 16:06:05 executing 0 crd-install hooks for cluster-monitoring
[tiller] 2019/03/29 16:06:05 hooks complete for crd-install cluster-monitoring
[tiller] 2019/03/29 16:06:06 failed install perform step: validation failed: unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
2019/03/29 16:06:06 [ERROR] AppController p-s7js9/monitoring-operator [helm-controller] failed with : failed to install app monitoring-operator. Error: validation failed: unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
[tiller] 2019/03/29 16:06:06 failed install perform step: validation failed: [unable to recognize "": no matches for kind "Prometheus" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"]
2019/03/29 16:06:06 [ERROR] AppController p-s7js9/cluster-monitoring [helm-controller] failed with : failed to install app cluster-monitoring. Error: validation failed: [unable to recognize "": no matches for kind "Prometheus" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"]
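The validation errors above are long and repetitive, but every entry reduces to the same thing: the API server does not know a kind in the monitoring.coreos.com/v1 group, i.e. the Prometheus Operator CRDs were not registered when Tiller validated the charts. A minimal Python sketch for tallying the missing kinds, assuming the error text keeps the format shown in this log (missing_kinds is a hypothetical helper, not part of Rancher or Helm):

```python
import re
from collections import Counter

# One entry of the Tiller validation error, assumed to look like:
#   unable to recognize "": no matches for kind "<Kind>" in version "<group>/<version>"
# Whitespace between tokens is matched loosely (\s*) so that spacing damage such as
# 'recognize""' or 'inversion' in a copied log still parses.
_ENTRY = re.compile(
    r'unable to recognize\s*"[^"]*":\s*no matches for kind\s*"(?P<kind>[^"]+)"'
    r'\s*in\s*version\s*"(?P<gv>[^"]+)"'
)


def missing_kinds(error_text: str) -> Counter:
    """Count failed manifests per (apiVersion, kind) pair in a validation error."""
    return Counter((m.group("gv"), m.group("kind")) for m in _ENTRY.finditer(error_text))


if __name__ == "__main__":
    # Shortened sample taken from the log format above.
    sample = (
        'validation failed: [unable to recognize "": no matches for kind "Prometheus" '
        'in version "monitoring.coreos.com/v1", unable to recognize "": no matches for '
        'kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to '
        'recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"]'
    )
    for (gv, kind), n in sorted(missing_kinds(sample).items()):
        print(f"{gv} {kind}: {n}")
```

For the sample, this prints one Prometheus and two ServiceMonitor entries, which makes it clear at a glance that only CRD-backed kinds are failing rather than the charts themselves.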
Other details that may be helpful: All Rancher-related images are in our private repository.
Environment information
- Rancher version (rancher/rancher / rancher/server image tag, or shown at the bottom left of the UI): 2.2.0
- Installation option (single install/HA): single install
Cluster information
- Cluster type (Hosted/Infrastructure Provider/Custom/Imported): Custom
- Machine type (cloud/VM/metal) and specifications (CPU/memory): VM and 4 Cores/8GB RAM
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.6", GitCommit:"ab91afd7062d4240e95e51ac00a18bd58fddd365", GitTreeState:"clean", BuildDate:"2019-02-26T12:59:46Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-02-28T13:30:26Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
- Docker version (use docker version):
Client:
Version: 18.06.0-ce
API version: 1.38
Go version: go1.10.3
Git commit: 0ffa825
Built: Wed Jul 18 19:08:18 2018
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 18.06.0-ce
API version: 1.38 (minimum version 1.12)
Go version: go1.10.3
Git commit: 0ffa825
Built: Wed Jul 18 19:10:42 2018
OS/Arch: linux/amd64
Experimental: false
About this issue
- State: closed
- Created 5 years ago
- Comments: 17 (2 by maintainers)
Commits related to this issue
- Add prometheus operator-init with CRDs configs Related Issues: https://github.com/rancher/rancher/issues/19362 — committed to guangbochen/system-charts by guangbochen 5 years ago
Looks like I’ve got the same issue with Rancher v2.2.4 as soon as I set "enableClusterMonitoring": true in the cluster config. It seems like a race, as if Rancher tries to deploy monitoring too soon, right after (or during?) the cluster deployment. (It works fine when I enable monitoring manually in the web UI, though.) I deploy Rancher automatically with Terraform. Full log from the Rancher server: https://gist.github.com/arno01/ed647eb5e84863f1d9c2cfe0e99cbdf7

Disabling and re-enabling monitoring at https://rancher/c/<cluster-id>/monitoring/cluster-setting makes it work.

You’re right, the cluster-agent pod could not resolve the Rancher server hostname. I re-created the cluster after fixing the network. Apologies; monitoring is installed successfully now. Very nice.
It looks like the Prometheus CRDs were not installed successfully. I think your cluster agent is suffering from a network issue. As a workaround, delete the pod manually and it will be recreated.
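Since the failure comes down to the Prometheus Operator CRDs not being registered, one way to confirm this on an affected cluster is to compare the output of kubectl get crd -o name against the CRDs behind the kinds in the error. A minimal Python sketch; the CRD names below assume the Prometheus Operator's <plural>.monitoring.coreos.com naming, and missing_crds is a hypothetical helper:

```python
from typing import List

# CRDs for the kinds reported in the validation error (Prometheus, PrometheusRule,
# ServiceMonitor). Assumption: names follow the Prometheus Operator's
# <plural>.<group> convention for the chart versions in question.
REQUIRED_CRDS = {
    "prometheuses.monitoring.coreos.com",
    "prometheusrules.monitoring.coreos.com",
    "servicemonitors.monitoring.coreos.com",
}


def missing_crds(kubectl_output: str) -> List[str]:
    """Return required CRD names absent from `kubectl get crd -o name` output."""
    present = set()
    for line in kubectl_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Lines look like "customresourcedefinition.apiextensions.k8s.io/<name>";
        # keep only the part after the last slash so bare names also work.
        present.add(line.rsplit("/", 1)[-1])
    return sorted(REQUIRED_CRDS - present)


if __name__ == "__main__":
    # Sample input; on a real cluster, pass in the output of `kubectl get crd -o name`.
    sample = "customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com\n"
    print(missing_crds(sample))
```

An empty result means the CRDs are registered and the problem lies elsewhere; a non-empty one matches the symptom in this issue, where the charts fail validation until the CRDs exist.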
Is the cluster-agent version correct? Can you check the logs of the cluster-agent pod under the System project?