rancher: Listing many large resources can be slow on the ember UI

What kind of request is this (question/bug/enhancement/feature request): Bug/ possibly enhancement

Steps to reproduce (least amount of steps as possible): Deploy 30-50 Helm releases to one project, with a few revisions for each. With every release or iteration, Rancher becomes slower, and at some point I can no longer access the cluster via the web UI at all: it just shows the loading icon and kicks me back out to the cluster list page.

Result:

Other details that may be helpful: Helm 3 stores release state in Kubernetes secrets, and these secrets can be very large. It looks like Rancher tries to pull all Kubernetes secrets when I try to access the UI.

It would be better to exclude the Helm release secret type, or at least not to include this sort of secret (the Helm 3 secret type) in cluster polling.
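As a rough illustration of the exclusion suggested above: Helm 3 stores each release revision in a Secret of type `helm.sh/release.v1`, and the Kubernetes API accepts a `type` field selector when listing secrets. The sketch below only builds such a query and filters an already-fetched list. The helper names are hypothetical, and this is not a description of how Rancher itself lists secrets.

```python
from urllib.parse import urlencode

# Secret type that Helm 3 uses for its release storage.
HELM_RELEASE_TYPE = "helm.sh/release.v1"


def secrets_list_path(namespace, page_size=100):
    """Build a Kubernetes list path that skips Helm release secrets.

    Hypothetical helper: it asks the API server to filter on the secret
    `type` field and to page the results instead of returning everything.
    """
    query = urlencode({
        "fieldSelector": f"type!={HELM_RELEASE_TYPE}",
        "limit": page_size,
    })
    return f"/api/v1/namespaces/{namespace}/secrets?{query}"


def drop_helm_releases(secrets):
    """Client-side fallback: filter secret objects already fetched as dicts."""
    return [s for s in secrets if s.get("type") != HELM_RELEASE_TYPE]
```

Filtering on the server side keeps the large release payloads off the wire entirely, whereas the client-side fallback still pays for the transfer and only hides the entries afterwards.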

Environment information

  • Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): 2.3.5
  • Installation option (single install/HA): HA. Rancher is deployed on a bare-metal k8s cluster, and another deployment is in GKE

Cluster information

  • Cluster type (Hosted/Infrastructure Provider/Custom/Imported):
  • Machine type (cloud/VM/metal) and specifications (CPU/memory):
  • Kubernetes version (use kubectl version):
(paste the output here)
  • Docker version (use docker version):
(paste the output here)

gz#11894

gz#11629

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 5
  • Comments: 15 (4 by maintainers)

Most upvoted comments

“API response times are reasonable”: I’m not sure that’s a true statement from a customer/user point of view. I see the improvement that has been made, but I don’t understand how anything taking 7 seconds is considered “reasonable”. It seems like there needs to be a fundamental logic change in how those resources are delivered.

Testing this will require ensuring that any page in the ember UI that lists resources is functioning properly. Additionally, we should generate ~200 secrets and ~200 configmaps with ~125KB of data each to reproduce the excessive response times. Before these fixes are applied, listing these resources should take ~30-40s. After these fixes are applied, I would expect to see response times closer to 8-10 seconds.
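A minimal sketch of how the test data described above could be generated, assuming plain Opaque secrets and text configmaps are enough to reproduce the slowdown. The object names, the `blob` key, and the output file name are made up for illustration.

```python
import base64
import json
import os

PAYLOAD_BYTES = 125 * 1024  # ~125KB per object, as suggested above


def make_object(kind, index, namespace="default"):
    """Return one Secret or ConfigMap manifest carrying ~125KB of data."""
    meta = {"name": f"perf-{kind.lower()}-{index}", "namespace": namespace}
    if kind == "Secret":
        # Secret values must be base64-encoded in the `data` field.
        raw = os.urandom(PAYLOAD_BYTES)
        data = {"blob": base64.b64encode(raw).decode("ascii")}
        return {"apiVersion": "v1", "kind": "Secret", "metadata": meta,
                "type": "Opaque", "data": data}
    # ConfigMap `data` holds text, so use hex: N/2 random bytes -> N chars.
    text = os.urandom(PAYLOAD_BYTES // 2).hex()
    return {"apiVersion": "v1", "kind": "ConfigMap", "metadata": meta,
            "data": {"blob": text}}


def make_manifests(count=200, namespace="default"):
    """Build a v1 List containing `count` secrets and `count` configmaps."""
    items = []
    for i in range(count):
        items.append(make_object("Secret", i, namespace))
        items.append(make_object("ConfigMap", i, namespace))
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    # Write a JSON List that can be loaded with `kubectl create -f`.
    with open("perf-objects.json", "w") as fh:
        json.dump(make_manifests(), fh)
```

Random payloads are used so that compression along the way does not shrink the responses and mask the problem.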

cc: @sgapanovich

The proposed solution would be also useful for ConfigMaps.

We had the same issue. This is what happened:

  • Installed Rancher 2.5.5, which tried to install rancher-operator and fleet
  • rancher-operator and fleet were not starting because of the issues described in https://github.com/rancher/rancher/issues/23077
  • Rancher was constantly trying to redeploy them and kept creating secrets with names like sh.helm.release.v1.fleet.vXYZ in the rancher cluster.
  • The Rancher /secrets request is probably tied to the secrets in the local cluster, even if the request is for a downstream cluster
  • We deleted all the sh.helm.release.v1.fleet.* and sh.helm.release.v1.rancher-operator.* secrets

-> calls to /secrets went from ~20sec to <1sec

  • don’t forget to either uninstall rancher-operator and fleet or fix them 😃
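The cleanup described above can be sketched as a name filter, assuming the secret names have already been listed (for example with `kubectl get secrets -o name`, prefix stripped). The function name is hypothetical; the pattern follows the `sh.helm.release.v1.<release>.v<revision>` naming that Helm 3 uses for its release secrets.

```python
import re

# Helm 3 names its release secrets sh.helm.release.v1.<release>.v<revision>.
HELM_SECRET_RE = re.compile(
    r"^sh\.helm\.release\.v1\.(?P<release>.+)\.v(?P<rev>\d+)$"
)


def stale_release_secrets(names, releases=("fleet", "rancher-operator")):
    """Return the secret names that belong to the given Helm releases."""
    selected = []
    for name in names:
        match = HELM_SECRET_RE.match(name)
        # Compare the exact release name so other releases are left alone.
        if match and match.group("release") in releases:
            selected.append(name)
    return selected
```

Matching on the exact release name rather than a loose prefix avoids deleting the history of unrelated releases whose names merely start with the same string.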

Rancher 2.5.5. We are experiencing the same issue on our production cluster. The “secrets?limit=-1&sort=name” call takes >25s to finish. Our cluster is not large: just several projects with a small number of Kubernetes objects inside. Our users are forced to use the Cluster Explorer instead of the Rancher UI, since the latter is very slow. Any ideas?

This issue makes the UI almost unusable in any cluster with a large number of secrets. It doesn’t have to be Helm releases; any significant number of large secrets makes the UI unresponsive, because Rancher tries to get all secrets every time you select a Project. Worse, if there are enough pending requests (just 3-4 people clicking is enough in my case), the pod starts to fail its probes, dies, and starts a long re-election during which Rancher is basically completely unavailable.
