cilium: Extremely slow agent startup

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

  1. Perform a rolling upgrade to v1.12.9.
  2. Each Cilium agent pod stays not ready for 2-3 minutes.

Cilium Version

1.12.9 e0bb30a 2023-04-17T23:54:19+02:00 go version go1.18.10 linux/amd64

Kernel Version

Linux ip-10-200-14-243 5.15.0-1033-aws #37~20.04.1-Ubuntu SMP Fri Mar 17 11:39:30 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

v1.22.17-eks-48e63af

Sysdump

No response

Relevant log output

No response

Anything else?

The slow startup correlates with cluster size (some clusters don’t have this issue).

Until the agent has started, all I get is:

# cilium status
Get "http:///var/run/cilium/cilium.sock/v1/healthz": dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Is the agent running?
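
To put a number on the delay, one option is to poll the agent's Unix socket until the health endpoint answers. The following is only a minimal sketch, not part of Cilium: the socket path and the /v1/healthz path are taken from the error above, while the class and function names, the timeout, and the polling interval are assumptions chosen for illustration. It also assumes that an HTTP 200 reply means the API is answering, which is not the same as every subsystem being healthy.

```python
import http.client
import socket
import time

# Socket path as it appears in the error message above.
DEFAULT_SOCKET = "/var/run/cilium/cilium.sock"


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP over a Unix domain socket (hypothetical helper, not a Cilium API)."""

    def __init__(self, socket_path, timeout=2.0):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.settimeout(self.timeout)
            sock.connect(self.socket_path)
        except OSError:
            sock.close()
            raise
        self.sock = sock


def api_responding(socket_path=DEFAULT_SOCKET):
    """Return True if GET /v1/healthz answers with HTTP 200, else False."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/v1/healthz")
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        # Covers "no such file or directory" while the agent is still starting.
        return False
    finally:
        conn.close()


def seconds_until_responding(socket_path=DEFAULT_SOCKET, timeout=600.0, interval=1.0):
    """Poll until the API answers; return elapsed seconds, or None on timeout."""
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        if api_responding(socket_path):
            return time.monotonic() - start
        time.sleep(interval)
    return None


if __name__ == "__main__":
    elapsed = seconds_until_responding()
    if elapsed is None:
        print("agent API did not respond before the timeout")
    else:
        print(f"agent API responded after {elapsed:.1f}s")
```

Started inside the agent container right after the pod restarts, this gives a per-node startup time that can be compared across clusters of different sizes.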

I do see a bunch of clang and tc commands running in the background during this time.
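
To see how much of the startup window is taken up by those commands, the running clang and tc processes can be counted while the agent is not ready. This is a rough, Linux-only sketch that reads /proc/<pid>/comm; the function name and the proc_root parameter are illustrative assumptions, not Cilium tooling. It matches the command name exactly, so a differently named compiler binary would need to be added to the list, and it has to run in a PID namespace where the agent's child processes are visible.

```python
import os


def count_processes(names=("clang", "tc"), proc_root="/proc"):
    """Count running processes whose command name is exactly one of `names`."""
    counts = {name: 0 for name in names}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue  # process exited between listdir and open
        if comm in counts:
            counts[comm] += 1
    return counts


if __name__ == "__main__":
    print(count_processes())
```

Sampling this every few seconds during a restart shows whether the not-ready period lines up with the time those processes are active.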

Code of Conduct

  • I agree to follow this project’s Code of Conduct

About this issue

  • State: closed
  • Created a year ago
  • Comments: 15 (15 by maintainers)

Most upvoted comments

@dctrwatson Could you give us an indication of the cluster size? How many pods do you have?

The cluster where that pprof was taken has:

  • ~9k endpoints
  • ~7k identities
  • ~2k network policies
  • ~7k services
  • ~10k pods
  • ~200 nodes