amazon-vpc-cni-k8s: aws-node pod does not start correctly the first time

What happened:

When a new node starts in a node group, it takes a long time to be marked Ready because the aws-node (CNI) pod does not start correctly the first time and goes through 1-2 restarts. The restarts are themselves delayed because the liveness probe's initial delay is set to 60 seconds. If we increase the liveness probe's failure threshold, the aws-node pod is still not marked ready after 7-8 minutes (possibly longer; it does not seem to run at all, as the readiness probes never succeed).
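To put rough numbers on the delay described above, here is a minimal sketch of how long a container with an always-failing liveness probe runs before the kubelet restarts it. Only the 60-second initial delay comes from this report. The probe period (10 s) and failure threshold (3) are the Kubernetes defaults and are assumed not to be overridden in the aws-node manifest. The class and function names are hypothetical, and the estimate ignores probe timeouts and restart backoff.

```python
from dataclasses import dataclass


@dataclass
class LivenessProbe:
    initial_delay_seconds: int = 60  # value reported in this issue
    period_seconds: int = 10         # Kubernetes default (assumed, not from the issue)
    failure_threshold: int = 3       # Kubernetes default (assumed, not from the issue)


def seconds_until_restart(probe: LivenessProbe) -> int:
    """Approximate time from container start to a liveness-triggered restart.

    The first probe runs after the initial delay; each further failure
    arrives one period later, and the restart happens on the
    failure_threshold-th consecutive failure.
    """
    return probe.initial_delay_seconds + (probe.failure_threshold - 1) * probe.period_seconds


def seconds_lost_to_restarts(probe: LivenessProbe, restarts: int) -> int:
    """Approximate total time spent waiting across a number of restarts."""
    return restarts * seconds_until_restart(probe)


if __name__ == "__main__":
    probe = LivenessProbe()
    print("one restart:", seconds_until_restart(probe), "s")
    print("two restarts:", seconds_lost_to_restarts(probe, 2), "s")
```

Under these assumptions, each failed start costs well over a minute before the kubelet intervenes, and raising the failure threshold only lengthens that wait.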

What you expected to happen: We expect nodes to be Ready in under a minute.

How to reproduce it (as minimally and precisely as possible):

We are seeing this on a newly created EKS cluster, so it should be easily reproducible.

Environment:

  • Kubernetes version (use kubectl version): Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.2-eks-0389ca3", GitCommit:"8a4e27b9d88142bbdd21b997b532eb6d493df6d2", GitTreeState:"clean", BuildDate:"2021-07-31T01:34:46Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}

  • CNI Version : v1.9.1-eksbuild.1

  • OS (e.g: cat /etc/os-release): Amazon Linux 2

  • Kernel (e.g. uname -a): 5.4.149-73.259.amzn2.x86_64

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 16 (9 by maintainers)

Most upvoted comments

Hi @jayanthvn, is there any update on when --hostname-override=$(NODE_NAME) will be included by default in the kube-proxy managed add-on?

@ChrisRamsayITV - I will check with the team and get back to you next week.

I’d suggest reading https://medium.com/keikoproj/rapid-auto-scaling-on-eks-part-1-bb4de84fc599 - it might be that kube-proxy hasn’t started yet by the time the CNI tries to start, in which case it can’t connect to the control plane.

The default kube-proxy manifest for 1.22 will include this change. The release calendar can be found here: https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html
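For anyone patching kube-proxy by hand in the meantime, the change discussed in this thread amounts to adding the --hostname-override=$(NODE_NAME) flag to the kube-proxy container and defining NODE_NAME from the pod's node name. The sketch below shows that transformation on a container spec held as a plain Python dict. The helper name and the sample container are hypothetical, and the real managed add-on's command line may differ, so treat this as an illustration rather than the actual manifest.

```python
import copy

# Flag named in this thread; Kubernetes expands $(NODE_NAME) from the container's env.
FLAG = "--hostname-override=$(NODE_NAME)"


def with_hostname_override(container: dict) -> dict:
    """Return a copy of a container spec with the flag and NODE_NAME env var added.

    The input is left unchanged, and applying the function twice has no
    further effect.
    """
    patched = copy.deepcopy(container)

    command = patched.setdefault("command", [])
    if FLAG not in command:
        command.append(FLAG)

    env = patched.setdefault("env", [])
    if not any(entry.get("name") == "NODE_NAME" for entry in env):
        # Downward API: expose the name of the node the pod is scheduled on.
        env.append(
            {
                "name": "NODE_NAME",
                "valueFrom": {"fieldRef": {"fieldPath": "spec.nodeName"}},
            }
        )
    return patched


if __name__ == "__main__":
    # Hypothetical container spec, for illustration only.
    sample = {"name": "kube-proxy", "command": ["kube-proxy", "--v=2"]}
    print(with_hostname_override(sample))
```

The resulting dict can be serialized back to YAML or JSON and applied to the DaemonSet with whatever tooling you already use.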

Hi @jayanthvn - Is there any update on when this will be included as default? Thanks