rook: Enabling the prometheus mgr module via the CephCluster CR is not respected.

Ultimately I’m trying to scrape Ceph metrics using a Prometheus instance NOT installed via the Prometheus Operator. This was working for me until the v1.11.3 upgrade, which included this change disabling the Prometheus mgr module. After upgrading to v1.11.3, my mgr pods started refusing connections on port 9283.

Initially I tried setting spec.monitoring.enabled to true in the CephCluster CR. That caused the CephCluster to fail reconciliation, because the ServiceMonitor CRD and its associated permissions didn’t exist in my cluster.
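For reference, the relevant excerpt of the CephCluster CR looked roughly like this (a sketch; enabling monitoring this way assumes the Prometheus Operator's ServiceMonitor CRD and matching RBAC are installed):

```yaml
# CephCluster CR excerpt. With this set, the operator tries to create a
# ServiceMonitor object, which fails if the CRD is absent.
spec:
  monitoring:
    enabled: true
```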

Next I tried enabling the Prometheus mgr module via spec.mgr.modules in the CephCluster CR. The rook-ceph-operator never recognizes this as a change and never updates the mgr deployments.
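This is a sketch of what I set, using the spec.mgr.modules list from the CephCluster CR spec:

```yaml
# CephCluster CR excerpt: attempting to enable the module directly.
# The operator reconciles this list for other modules, but in v1.11.3
# it does not pick up the change for the prometheus module.
spec:
  mgr:
    modules:
      - name: prometheus
        enabled: true
```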

Next I used the rook-ceph-tools pod to manually enable the Prometheus mgr module with ceph mgr module enable prometheus. That fixes my issue for a while, but at some point the operator (or something else) disables the module again.

If the intention is for the Prometheus mgr module to be governed by the spec.monitoring.enabled setting, could we add another sub-setting to enable or disable the use of ServiceMonitors?

  • Cluster CR (custom resource), typically called cluster.yaml, if necessary cluster.txt

  • Operator’s logs, if necessary operator-logs.txt

  • Output of krew commands, if necessary ceph-status.txt

  • OS (e.g. from /etc/os-release):

      PRETTY_NAME="Ubuntu 22.04.2 LTS"
      NAME="Ubuntu"
      VERSION_ID="22.04"
      VERSION="22.04.2 LTS (Jammy Jellyfish)"
      VERSION_CODENAME=jammy
      ID=ubuntu
      ID_LIKE=debian
      HOME_URL="https://www.ubuntu.com/"
      SUPPORT_URL="https://help.ubuntu.com/"
      BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
      PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
      UBUNTU_CODENAME=jammy

  • Kernel (e.g. uname -a): Linux hci-01 5.15.0-69-generic #76-Ubuntu SMP Fri Mar 17 17:19:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

  • Cloud provider or hardware configuration: Bare metal servers

  • Rook version (use rook version inside of a Rook Pod): rook: v1.11.3 go: go1.19.7

  • Storage backend version (e.g. for ceph do ceph -v): ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)

  • Kubernetes version (use kubectl version):

      Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.1", GitCommit:"8f94681cd294aa8cfd3407b8191f6c70214973a4", GitTreeState:"clean", BuildDate:"2023-01-18T15:58:16Z", GoVersion:"go1.19.5", Compiler:"gc", Platform:"darwin/amd64"}
      Kustomize Version: v4.5.7
      Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.3", GitCommit:"9e644106593f3f4aa98f8a84b23db5fa378900bd", GitTreeState:"clean", BuildDate:"2023-03-15T13:33:12Z", GoVersion:"go1.19.7", Compiler:"gc", Platform:"linux/amd64"}

  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): kubeadm

  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): HEALTH_OK

About this issue

  • State: closed
  • Created a year ago
  • Comments: 19 (3 by maintainers)

Most upvoted comments

It seems we really need two settings in the cluster CR. This proposal would allow metrics to remain enabled by default, along with the ceph exporter once it is ready to be enabled. The behavior of the first setting remains unchanged for backward compatibility.

monitoring:
  # Enables the full feature with service monitor, alerts, etc. 
  # Default is false. 
  enabled: false 
  # If false, both the prometheus module and the ceph exporter daemon are enabled. 
  # If true, both the prometheus module and ceph exporter are disabled. 
  # Default is false. 
  metricsDisabled: false 

Trying to manage the prometheus module separately via mgr.modules wasn’t intuitive, especially once we introduce the question of what happens with the ceph exporter.

This feels like a workable solution to me. It also makes sense to separate the rook-ceph metrics themselves from the monitoring resources the operator will deploy.

Same here. I have 3 Ceph clusters in my k8s cluster. I tried to enable the prometheus module using the toolbox pod, but when I enable it on all of them, there is always one where it doesn’t take effect. When I enable it from the web dashboard instead, all 3 clusters work.

Shell in the toolbox pod:

ceph mgr module enable prometheus
ceph config set mgr mgr/prometheus/server_addr 0.0.0.0
ceph config set mgr mgr/prometheus/server_port 9283

Is there a command to enable the prometheus module as always-on?


The Ceph mgr prometheus module is configured by the cephclusters CRD. You can enable it like this:

[root@smd-node01-prediction-prod-sz deeproute]# kubectl get cephclusters.ceph.rook.io  -n rook-ceph
NAME        DATADIRHOSTPATH   MONCOUNT   AGE   PHASE   MESSAGE                        HEALTH      EXTERNAL
rook-ceph   /var/lib/rook     3          46h   Ready   Cluster created successfully   HEALTH_OK   
[root@smd-node01-prediction-prod-sz deeproute]# kubectl edit  cephclusters.ceph.rook.io  rook-ceph  -n rook-ceph

And edit:

  monitoring:
    enabled: true

If you want to set mgr/prometheus/server_port and the scrape interval, you can set the port and interval values as well.
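For the module-level settings, a sketch of the follow-up in the toolbox pod; mgr/prometheus/server_port and mgr/prometheus/scrape_interval are standard options of the prometheus mgr module (the values below are examples, not recommendations):

```shell
# Tune the prometheus mgr module itself via ceph config.
ceph config set mgr mgr/prometheus/server_port 9283
ceph config set mgr mgr/prometheus/scrape_interval 15
```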