rook: Ceph "Cluster utilization" chart no longer working on dashboard after v18.2.0 upgrade
Is this a bug report or feature request?
- Bug Report
Deviation from expected behavior: When navigating to the homepage/dashboard in the Ceph Dashboard UI, the "Cluster utilization" chart is no longer functional. The charts consistently act as if they have no data.
Expected behavior: Dashboard chart loads and displays current values based on platform metrics.
How to reproduce it (minimal and precise):
While on the latest version of the Rook operator (deployed via Helm), upgrade Ceph from v17 to v18.2.0. No other settings were changed; only the image was updated.
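The image bump described above can be applied by editing the CephCluster CR in place, for example (a sketch; the `rook-ceph` namespace and cluster name are assumptions based on the default Rook install):

```shell
# Update only the Ceph image on the existing CephCluster resource;
# the operator then performs the rolling upgrade of the daemons.
kubectl -n rook-ceph patch cephcluster rook-ceph --type merge \
  -p '{"spec":{"cephVersion":{"image":"quay.io/ceph/ceph:v18.2.0"}}}'

# Watch the upgrade progress:
kubectl -n rook-ceph get cephcluster rook-ceph -w
```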
File(s) to submit:
Dashboard screenshot:
- Cluster CR (custom resource), typically called cluster.yaml, if necessary:
```yaml
spec:
  cephVersion:
    allowUnsupported: false
    image: quay.io/ceph/ceph:v18.2.0
  cleanupPolicy:
    allowUninstallWithVolumes: false
    confirmation: ''
    sanitizeDisks:
      dataSource: zero
      iteration: 1
      method: quick
  continueUpgradeAfterChecksEvenIfNotHealthy: false
  crashCollector:
    disable: false
  dashboard:
    enabled: true
    port: 8443
    ssl: false
  dataDirHostPath: /var/lib/rook
  disruptionManagement:
    managePodBudgets: true
    osdMaintenanceTimeout: 30
    pgHealthCheckTimeout: 0
  external: {}
  healthCheck:
    daemonHealth:
      mon:
        disabled: false
        interval: 45s
      osd:
        disabled: false
        interval: 60s
      status:
        disabled: false
        interval: 60s
    livenessProbe:
      mgr:
        disabled: false
      mon:
        disabled: false
      osd:
        disabled: false
    startupProbe:
      mgr:
        disabled: false
      mon:
        disabled: false
      osd:
        disabled: false
  logCollector:
    enabled: true
    maxLogSize: 500M
    periodicity: daily
  mgr:
    allowMultiplePerNode: false
    count: 2
    modules:
      - enabled: true
        name: pg_autoscaler
  mon:
    allowMultiplePerNode: false
    count: 3
  monitoring:
    enabled: false
  network:
    connections:
      compression:
        enabled: false
      encryption:
        enabled: false
  priorityClassNames:
    mgr: system-cluster-critical
    mon: system-node-critical
    osd: system-node-critical
  removeOSDsIfOutAndSafeToRemove: false
  security:
    kms: {}
  skipUpgradeChecks: false
  storage:
    onlyApplyOSDPlacement: false
    useAllDevices: true
    useAllNodes: true
  waitTimeoutForHealthyOSDInMinutes: 10
```
Logs to submit: No known evidence of error in operator or manager logs.
Cluster Status to submit: n/a
Environment:
- OS (e.g. from /etc/os-release): Ubuntu 22.04
- Kernel (e.g. uname -a): Linux rke2-ceph-0 5.15.0-1041-kvm #46-Ubuntu SMP Fri Aug 25 07:39:11 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
- Cloud provider or hardware configuration: Proxmox VMs
- Rook version (use rook version inside of a Rook Pod): v1.12.3
- Storage backend version (e.g. for ceph do ceph -v): v18.2.0
- Kubernetes version (use kubectl version): v1.26.7
- Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): rke2
- Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):
About this issue
- Original URL
- State: closed
- Created 10 months ago
- Comments: 15 (11 by maintainers)
Commits related to this issue
- mgr: adding support for prometheus endpoint configuration Using the new configuration parameters prometheusEndpoint and prometheusEndpointSSLVerify users can configure dashboard to point to their pro... — committed to rkachach/rook by rkachach 9 months ago
- mgr: adding support for prometheus endpoint configuration Using the new configuration parameters prometheusEndpoint and prometheusEndpointSSLVerify users can configure dashboard to point to their pro... — committed to rook/rook by rkachach 9 months ago
@nizamial09 Yes, Rook can set that value for the dashboard. It would be ideal if the operator could automatically detect this from a service with well-known prometheus labels. If that's not possible, I'm thinking we would need a setting in the CephCluster CR. Even if we could detect the endpoint automatically in some scenarios, this setting could allow overriding the endpoint in case auto detection is not working correctly. For example, the setting could be dashboard.prometheusEndpoint. This setting should also be mentioned in Documentation/ceph-monitoring.md so users enabling prometheus can find it. @rkachach Could you look into this?
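The proposal above might end up looking roughly like the following in the CephCluster CR (a sketch only; the prometheusEndpoint and prometheusEndpointSSLVerify names are taken from the commit messages linked in this issue, and the example endpoint URL is an assumption you would replace with your own Prometheus service):

```yaml
spec:
  dashboard:
    enabled: true
    # Point the Ceph dashboard at an existing Prometheus instance.
    # Parameter names per the linked commits; verify against your Rook release.
    prometheusEndpoint: http://prometheus-operated.monitoring.svc:9090
    # Skip TLS verification if Prometheus is served with a self-signed cert.
    prometheusEndpointSSLVerify: false
```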
Thank you @nizamial09, it seems we were already collecting the metrics but had to configure the dashboard to connect to the prometheus API. Thank you!
Okay, then this is probably a bug in the dashboard. We'll need to test this with the rook orch to see what went wrong there. While we do that, you can still use the old dashboard as your default. You can follow this doc, which explains how to switch to your old dashboard; there is a Note section which says that. Or you can simply issue ceph dashboard feature disable dashboard from the toolbox to do that as well. The landing page is still new and we are still improving it. Hopefully we'll fix all these bugs and release a stable version of it soon.
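The toolbox workaround mentioned above can be run like this (a sketch; the rook-ceph namespace and the rook-ceph-tools deployment name are assumptions based on the default toolbox manifest):

```shell
# Switch back to the old dashboard landing page from the Rook toolbox pod.
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- \
  ceph dashboard feature disable dashboard

# Once the new landing page is fixed, re-enable it:
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- \
  ceph dashboard feature enable dashboard
```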