harvester: [BUG] The Prometheus monitoring chart becomes empty after staying on the dashboard page for a period of time

Describe the bug

This issue originates from #1531 and continues the tracking started there.
After staying on the dashboard page for a period of time (e.g. 20 minutes):

  • The Prometheus monitoring chart becomes empty

  • Clicking Reload does not bring it back

  • The only workaround is to refresh the page, but once the page has been idle for another period of time, the monitoring chart becomes empty again

To Reproduce

Steps to reproduce the behavior:

  1. Prepare a one-node v1.0.1 Harvester cluster
  2. Open the Dashboard page
  3. Create a virtual machine
  4. Leave the VM metrics page open and monitor the chart display status
  5. Leave the dashboard page open and monitor the chart display status

Expected behavior

The Prometheus monitoring chart on the dashboard page should keep displaying updated data without becoming empty.

Support bundle

supportbundle_e54fcdb3-17f5-42ad-a748-3d51e3afc40a_2022-03-14T06-38-28Z.zip

Environment:

  • Harvester ISO version: v1.0.1
  • Underlying Infrastructure (e.g. Baremetal with Dell PowerEdge R630): 1 node harvester on local kvm

Additional context

  • The Prometheus pod is running as expected, and the node-exporter pods did not restart

    rancher@harvester-node-2:~> sudo -i kubectl get pods -n cattle-monitoring-system
    NAME                                                     READY   STATUS    RESTARTS   AGE
    prometheus-rancher-monitoring-prometheus-0               3/3     Running   0          4h52m
    rancher-monitoring-grafana-d9c56d79b-kcjwh               3/3     Running   0          4h52m
    rancher-monitoring-kube-state-metrics-5bc8bb48bd-49l2q   1/1     Running   0          4h52m
    rancher-monitoring-operator-559767d69b-sq58h             1/1     Running   0          4h52m
    rancher-monitoring-prometheus-adapter-8846d4757-d65xh    1/1     Running   0          4h52m
    rancher-monitoring-prometheus-node-exporter-d7xsn        1/1     Running   0          4h52m
    rancher-monitoring-prometheus-node-exporter-lxrrv        1/1     Running   0          4h24m
    rancher-monitoring-prometheus-node-exporter-pspnn        1/1     Running   0          4h39m
    
  • The default monitoring settings are used

  • The VM metrics display correctly and do not become empty
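The pod check in the additional context can be repeated quickly each time the chart goes empty. The following is a minimal sketch, not part of the original report: `unhealthy_pods` is a hypothetical helper name, and it assumes the default column layout of `kubectl get pods` (NAME, READY, STATUS, RESTARTS, AGE) shown in the output above.

```shell
#!/bin/sh
# unhealthy_pods (hypothetical helper): read `kubectl get pods` output on
# stdin and print the name of every pod that is not Running, is not fully
# ready, or has restarted at least once. Assumes the default column order
# NAME READY STATUS RESTARTS AGE.
unhealthy_pods() {
    awk 'NR > 1 {
        split($2, ready, "/")                      # "3/3" -> ready[1], ready[2]
        if ($3 != "Running" || ready[1] != ready[2] || $4 != 0) {
            print $1
        }
    }'
}
```

Usage on a Harvester node would look like `sudo -i kubectl get pods -n cattle-monitoring-system | unhealthy_pods`. Empty output means every pod is Running, fully ready, and has zero restarts, which matches the state reported here.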

About this issue

  • Original URL
  • State: open
  • Created 2 years ago
  • Comments: 17 (11 by maintainers)

Most upvoted comments

Verified on master-2a4f4de5-head (04/27). The reload issue has been fixed.

  • When the Prometheus monitoring chart is empty, clicking Reload brings the chart back (issue2150)

Test Information

  • Test Environment: 3 nodes harvester on local kvm machines
  • Harvester version: master-2a4f4de5-head (04/27)

Verify Steps

  1. Prepare a 3-node v1.0.1 Harvester cluster
  2. Open the Dashboard page
  3. Create a virtual machine
  4. Leave the VM metrics page open and monitor the chart display status
  5. Leave the dashboard page open and monitor the chart display status
  6. Wait until the dashboard monitoring chart becomes empty
  7. Click Reload to recover

Or, as an alternative from step 5 onward:

  5. Access the Harvester explorer page https://<kube-vip>/dashboard/c/local
  6. Go to Workload -> Deployments
  7. Change the namespace to cattle-monitoring-system
  8. Scale down the cattle-monitoring-system/rancher-monitoring-grafana deployment to 0
  9. Wait until the dashboard monitoring chart becomes empty
  10. Scale up the cattle-monitoring-system/rancher-monitoring-grafana deployment to 1
  11. Click Reload to recover
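The scale-down and scale-up of the Grafana deployment in the alternative steps can also be done from the command line instead of the explorer page. This is a rough sketch, not what the verifier ran: `scale_grafana` and the `KUBECTL` override are hypothetical names introduced here, while the namespace and deployment name come from the steps above.

```shell
#!/bin/sh
# KUBECTL may be overridden, e.g. KUBECTL="sudo -i kubectl" on a Harvester node.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="cattle-monitoring-system"
DEPLOYMENT="rancher-monitoring-grafana"

# scale_grafana REPLICAS (hypothetical helper): set the replica count of the
# Grafana deployment. $KUBECTL is left unquoted on purpose so that a value
# such as "sudo -i kubectl" is split into separate words.
scale_grafana() {
    $KUBECTL scale deployment "$DEPLOYMENT" -n "$NAMESPACE" --replicas="$1"
}
```

Run `scale_grafana 0`, wait until the dashboard chart becomes empty, then run `scale_grafana 1` and click Reload in the UI.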

Happened to hit it; debug info:

After clicking Reload, the UI was stuck in Loading.

The Chrome debugger showed that a couple of HTTP GET requests completed successfully, with no failures and none still in progress.

For comparison, when it works, switching from “VM Metrics” to “Cluster Metrics” triggers that first batch of HTTP GET requests, and it contains more items,

followed by continuous QUERY requests.

Question: the backend nginx reported no HTTP errors, and the Prometheus/Grafana pods were also in a good state. When the UI metrics were stuck in “Loading”, what was missing? Which line of the UI code was hanging? Thanks.
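One way to narrow this down might be to probe Grafana directly while the UI is stuck, to see whether the embedded Grafana endpoint still answers. The sketch below rests on several assumptions that are not confirmed anywhere in this issue: that the dashboard reaches Grafana through the Kubernetes service proxy path stored in `GRAFANA_PROXY_PATH`, that a Rancher API token is accepted as a Bearer token, and the helper names `grafana_health_url` and `probe_grafana` themselves. `/api/health` is Grafana's own health endpoint.

```shell
#!/bin/sh
# ASSUMPTION: Grafana is reachable through this Kubernetes service proxy path.
GRAFANA_PROXY_PATH="/api/v1/namespaces/cattle-monitoring-system/services/http:rancher-monitoring-grafana:80/proxy"

# grafana_health_url HOST (hypothetical helper): print the URL of Grafana's
# /api/health endpoint as seen through the assumed proxy path.
grafana_health_url() {
    printf 'https://%s%s/api/health\n' "$1" "$GRAFANA_PROXY_PATH"
}

# probe_grafana HOST TOKEN (hypothetical helper): print only the HTTP status
# code of the health endpoint. -k skips TLS verification because the cluster
# VIP usually presents a self-signed certificate.
probe_grafana() {
    curl -k -s -o /dev/null -w '%{http_code}\n' \
        -H "Authorization: Bearer $2" \
        "$(grafana_health_url "$1")"
}
```

For example, `probe_grafana <kube-vip> <api-token>` printing 200 while the chart is still stuck in Loading would point toward the UI side rather than the backend.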