cilium: Extremely slow agent startup
Is there an existing issue for this?
- I have searched the existing issues
What happened?
- Performed a rolling upgrade to v1.12.9
- Each cilium agent pod is not ready for 2-3 minutes
Cilium Version
1.12.9 e0bb30a 2023-04-17T23:54:19+02:00 go version go1.18.10 linux/amd64
Kernel Version
Linux ip-10-200-14-243 5.15.0-1033-aws #37~20.04.1-Ubuntu SMP Fri Mar 17 11:39:30 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Kubernetes Version
v1.22.17-eks-48e63af
Sysdump
No response
Relevant log output
No response
Anything else?
The slow startup correlates with cluster size; some clusters don't have this issue at all.
Until cilium starts, all I get is:
# cilium status
Get "http:///var/run/cilium/cilium.sock/v1/healthz": dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Is the agent running?
I do see a bunch of clang and tc commands running in the background during this time.
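To put a number on the not-ready window, one option is to poll the agent socket until it accepts a connection. The following is a minimal sketch, not a Cilium tool: the helper name `wait_for_unix_socket` and its `timeout` and `interval` parameters are made up for illustration, and it only checks that the socket is connectable, not that `/v1/healthz` reports healthy.

```python
import socket
import time
from typing import Optional


def wait_for_unix_socket(path: str, timeout: float = 300.0,
                         interval: float = 1.0) -> Optional[float]:
    """Return the seconds elapsed until `path` accepts a connection.

    Returns None if the socket is still unreachable after `timeout` seconds.
    Hypothetical helper for illustration only.
    """
    start = time.monotonic()
    while True:
        try:
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
                sock.settimeout(interval)
                sock.connect(path)
            return time.monotonic() - start
        except OSError:
            # Covers both "no such file or directory" (socket not created
            # yet) and "connection refused" (file exists, nobody listening).
            pass
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(interval)
```

Run inside the agent pod right after it restarts, `wait_for_unix_socket("/var/run/cilium/cilium.sock")` would return roughly how long the socket took to appear, which makes it easier to compare clusters of different sizes.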
Code of Conduct
- I agree to follow this project’s Code of Conduct
About this issue
- State: closed
- Created a year ago
- Comments: 15 (15 by maintainers)
Commits related to this issue
- ipcache: switch named ports to reference counting This commit introduces reference counting for named ports. Using the reference counting, we know when to add or remove named ports in our bookkeeping... — committed to bimmlerd/cilium by bimmlerd a year ago
- ipcache: switch named ports to reference counting This commit introduces reference counting for named ports. Using the reference counting, we know when to add or remove named ports in our bookkeeping... — committed to cilium/cilium by bimmlerd a year ago
- ipcache: switch named ports to reference counting [ upstream commit 33079de7fb6292efd4b837de2f696ee2edaeb8f4 ] This commit introduces reference counting for named ports. Using the reference counting... — committed to bimmlerd/cilium by bimmlerd a year ago
- ipcache: switch named ports to reference counting [ upstream commit 33079de7fb6292efd4b837de2f696ee2edaeb8f4 ] [ backporter's notes: We don't have the luxury of generics here, hence instead of using... — committed to bimmlerd/cilium by bimmlerd a year ago
- ipcache: switch named ports to reference counting [ upstream commit 33079de7fb6292efd4b837de2f696ee2edaeb8f4 ] This commit introduces reference counting for named ports. Using the reference counting... — committed to cilium/cilium by bimmlerd a year ago
- ipcache: switch named ports to reference counting [ upstream commit 33079de7fb6292efd4b837de2f696ee2edaeb8f4 ] [ backporter's notes: We don't have the luxury of generics here, hence instead of using... — committed to cilium/cilium by bimmlerd a year ago
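The commit messages above describe reference counting for named ports: the bookkeeping only needs to change when the first reference to a port appears or the last one goes away. The following is a minimal sketch of that general idea, not Cilium's actual implementation (which is written in Go); the class `RefCountedNamedPorts`, its methods, and the `(name, port, protocol)` tuple shape are all hypothetical.

```python
from collections import Counter
from typing import Tuple

# Hypothetical shape for a named port: (name, port number, protocol).
NamedPort = Tuple[str, int, str]


class RefCountedNamedPorts:
    """Tracks how many owners reference each named port."""

    def __init__(self) -> None:
        self._refs: Counter = Counter()

    def add(self, port: NamedPort) -> bool:
        """Add one reference. Returns True only for the first reference,
        i.e. when the port must be added to the bookkeeping."""
        self._refs[port] += 1
        return self._refs[port] == 1

    def release(self, port: NamedPort) -> bool:
        """Drop one reference. Returns True only when the last reference
        is gone, i.e. when the port must be removed from the bookkeeping."""
        count = self._refs.get(port, 0)
        if count == 0:
            raise KeyError(f"named port {port!r} is not referenced")
        if count == 1:
            del self._refs[port]
            return True
        self._refs[port] = count - 1
        return False

    def __contains__(self, port: NamedPort) -> bool:
        return port in self._refs
```

With this shape, adding or releasing a reference costs constant time, and callers only touch the shared state when `add` or `release` returns True.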
The cluster where that pprof was taken has:
- ~9k endpoints
- ~7k identities
- ~2k network policies
- ~7k services
- ~10k pods
- ~200 nodes