rook: Enabling prometheus mgr module via CephCluster CR not being respected.
Ultimately, I’m trying to scrape Ceph metrics using a Prometheus instance that was NOT installed via the Prometheus Operator. This worked until the v1.11.3 upgrade, which included this change disabling the Prometheus mgr module. After upgrading to v1.11.3, my mgr pods started refusing connections on port 9283.
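For anyone trying to reproduce the symptom, the refused connection on port 9283 can be checked with a plain TCP probe. The following is a minimal sketch using only the Python standard library; the function name is hypothetical, and the host you pass in (for example a mgr Service DNS name or pod IP) depends on your cluster.

```python
import socket


def port_accepts_connections(host: str, port: int = 9283, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `port_accepts_connections("<mgr-host>")` from a pod inside the cluster returns `False` while the module is disabled and nothing is listening on the port. It only tests reachability, not whether the endpoint serves valid metrics.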
Initially I tried setting spec.monitoring.enabled to true in the CephCluster CR. That led to the CephCluster failing reconciliation because the ServiceMonitor CRD and permissions didn’t exist.
Next I tried enabling the Prometheus mgr module via spec.mgr.modules in the CephCluster CR. The Rook-Ceph-Operator never recognizes this as a change and never updates the mgr deployments.
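To make the attempted change concrete, here is a rough sketch of the kind of patch involved, written as a small Python helper that emits a JSON merge patch for spec.mgr.modules. The helper name is hypothetical, and this illustrates the setting described above rather than a confirmed fix, since the report is that the operator did not act on it.

```python
import json


def mgr_module_patch(module: str = "prometheus", enabled: bool = True) -> str:
    """Build a JSON merge patch that sets a single entry under spec.mgr.modules."""
    # Note: a JSON merge patch replaces the whole modules list, so any other
    # modules already listed in the CR would need to be included here as well.
    patch = {"spec": {"mgr": {"modules": [{"name": module, "enabled": enabled}]}}}
    return json.dumps(patch)


print(mgr_module_patch())
```

The printed string could be passed to `kubectl patch cephcluster <name> --type merge -p '<patch>'` in the cluster's namespace; the resource name and namespace are placeholders for your own values.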
Next I used the rook-ceph-tools pod to manually enable the Prometheus mgr module using ceph mgr module enable prometheus. That fixes my issue for a while but at some point the operator or something else disables the module again.
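One way to catch the moment the module gets disabled again is to poll the module list from the toolbox and check it programmatically. The sketch below parses the JSON form of `ceph mgr module ls` (for example `ceph mgr module ls --format json`). The key names `enabled_modules` and `always_on_modules` are assumed to match the output of this Ceph release, and the function name is hypothetical.

```python
import json


def module_enabled(module_ls_json: str, name: str = "prometheus") -> bool:
    """Return True if `name` is listed as enabled or always-on in the module list JSON."""
    data = json.loads(module_ls_json)
    # Assumed layout: two lists of module names; missing keys are treated as empty.
    active = set(data.get("enabled_modules", [])) | set(data.get("always_on_modules", []))
    return name in active
```

Running this periodically against the toolbox output and logging the timestamp of each `True` to `False` transition may help correlate the change with operator reconcile entries in the operator log.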
If the intention is for the Prometheus mgr module to be governed by the spec.monitoring.enabled setting, could we add another sub-setting to enable or disable the use of ServiceMonitors?
- Cluster CR (custom resource), typically called cluster.yaml, if necessary cluster.txt
- Operator’s logs, if necessary operator-logs.txt
- Output of krew commands, if necessary ceph-status.txt
- OS (e.g. from /etc/os-release):
  PRETTY_NAME="Ubuntu 22.04.2 LTS"
  NAME="Ubuntu"
  VERSION_ID="22.04"
  VERSION="22.04.2 LTS (Jammy Jellyfish)"
  VERSION_CODENAME=jammy
  ID=ubuntu
  ID_LIKE=debian
  HOME_URL="https://www.ubuntu.com/"
  SUPPORT_URL="https://help.ubuntu.com/"
  BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
  PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
  UBUNTU_CODENAME=jammy
- Kernel (e.g. uname -a):
  Linux hci-01 5.15.0-69-generic #76-Ubuntu SMP Fri Mar 17 17:19:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
- Cloud provider or hardware configuration:
  Bare metal servers
- Rook version (use rook version inside of a Rook Pod):
  rook: v1.11.3
  go: go1.19.7
- Storage backend version (e.g. for ceph do ceph -v):
  ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
- Kubernetes version (use kubectl version):
  Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.1", GitCommit:"8f94681cd294aa8cfd3407b8191f6c70214973a4", GitTreeState:"clean", BuildDate:"2023-01-18T15:58:16Z", GoVersion:"go1.19.5", Compiler:"gc", Platform:"darwin/amd64"}
  Kustomize Version: v4.5.7
  Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.3", GitCommit:"9e644106593f3f4aa98f8a84b23db5fa378900bd", GitTreeState:"clean", BuildDate:"2023-03-15T13:33:12Z", GoVersion:"go1.19.7", Compiler:"gc", Platform:"linux/amd64"}
- Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift):
  kubeadm
- Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):
  HEALTH_OK
About this issue
- State: closed
- Created a year ago
- Comments: 19 (3 by maintainers)
This feels like a workable solution to me. And it makes a certain amount of sense to separate the rook-ceph metrics from the monitoring loadout the operator will deploy.
The Ceph mgr prometheus module is configured through the CephCluster CRD. You can enable it like this:
And edit
If you want to set mgr/prometheus/server_port and scrape_interval, you can set the values of port and interval.