rancher: [BUG] High memory usage on v2.7.5
**Rancher Server Setup**
- Rancher version: v2.7.5
- Installation option (Docker install/Helm Chart): Helm Chart
- If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc): v1.23.17-eks-a5565ad
- Proxy/Cert Details:
**Describe the bug**
After upgrading Rancher from v2.6.13 to v2.7.5 we didn't face any problems right away, but within 7 days we had to switch our main node group from t3.large (8 GB of memory per node) to t3.xlarge (16 GB of memory per node); otherwise we could not get healthy Rancher pods (all crashing with a mix of OOMKilled, Evicted, and ContainerStatusUnknown).
Current situation with healthy pods:
```
$ kubectl top pod -n cattle-system
NAME                                   CPU(cores)   MEMORY(bytes)
eks-config-operator-57f94d69dd-gsdf8   2m           176Mi
rancher-c875bc68b-sftq2                179m         5505Mi
rancher-c875bc68b-wkrjl                659m         8383Mi
rancher-c875bc68b-xspj4                232m         6442Mi
rancher-webhook-648db6b695-j7ptw       44m          1287Mi
```
Rancher manages ~20 clusters, most of them running v1.27.3-eks-a5565ad and a single one on an old v1.21.14-eks-a5565ad.
**To Reproduce**
**Result**
**Expected Result**
**Screenshots**
**Additional context**
**About this issue**
- Original URL
- State: closed
- Created a year ago
- Comments: 19 (11 by maintainers)
Hey @gionn thanks!
I think it is fair to close this issue, as the main reported symptoms are addressed. We are still researching ways to embed a more effective auto-cleanup, possibly including something functionally similar to your script, in an upcoming version.
We reserve the right to poke you in the future when we have something to test - if you are OK with that!
And of course, this issue can be re-opened (or a new one can be opened) if symptoms are back.
Have a great day!
Give it some time to let the cleanup kick in
Yes. 0.7.1-rc.1 contains a mechanism which does the equivalent of the cleanup script, integrated directly into Fleet.
I will be posting test instructions later this morning, thanks for your openness to try out the RC!
The TokenRequest API warnings seem to come from a beta feature enabled on EKS clusters: https://github.com/kubernetes/kubernetes/pull/117591
It will be enabled by default in 1.28, and I expect Rancher to be updated by then (Kubernetes versions later than 1.26 are not officially supported: https://www.suse.com/suse-rancher/support-matrix/all-supported-versions/rancher-v2-7-5/)
I think that is a red herring (annoying but not harmful).
The high number of Fleet secrets sounds like an instance of https://github.com/rancher/fleet/issues/1651 - can you please check whether the cleanup script posted there helps?
https://github.com/rancher/fleet/issues/1651#issuecomment-1640322635
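Before running the cleanup script, it can help to confirm which secret type is actually accumulating. The following is a sketch, not part of the linked script; it assumes the default `kubectl get secrets -A --no-headers` column layout (`NAMESPACE NAME TYPE DATA AGE`), and the `count_secret_types` helper name is made up here:

```shell
# count_secret_types: reads `kubectl get secrets -A --no-headers` lines on
# stdin (columns: NAMESPACE NAME TYPE DATA AGE) and prints a per-type count,
# highest first. A sudden pile-up of one type points at the leak.
count_secret_types() {
  awk '{ count[$3]++ } END { for (t in count) print count[t], t }' | sort -rn
}

# Usage against a live cluster:
#   kubectl get secrets -A --no-headers | count_secret_types
```

If one type dominates by orders of magnitude in the Fleet namespaces, that is a strong sign the cleanup from the linked issue applies.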