amazon-vpc-cni-k8s: aws-node pod does not start correctly the first time
What happened:
When a new node joins a node group, it takes a long time to be marked Ready because the aws-node (CNI) pod does not start correctly on the first attempt and has to go through one or two restarts. The restarts are also delayed because the liveness probe's initial delay is set to 60 seconds. If we increase the liveness probe's failure threshold, the aws-node pod is still not marked Ready after 7-8 minutes (possibly longer; it does not seem to run at all, since its readiness probe never succeeds).
What you expected to happen: We expect new nodes to become Ready in under a minute.
How to reproduce it (as minimally and precisely as possible):
We are seeing this on a newly created EKS cluster, so it should be easy to reproduce.
Environment:
- Kubernetes version (use kubectl version): Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.2-eks-0389ca3", GitCommit:"8a4e27b9d88142bbdd21b997b532eb6d493df6d2", GitTreeState:"clean", BuildDate:"2021-07-31T01:34:46Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
- CNI version: v1.9.1-eksbuild.1
- OS (e.g. cat /etc/os-release): Amazon Linux 2
- Kernel (e.g. uname -a): 5.4.149-73.259.amzn2.x86_64
About this issue
- State: closed
- Created 3 years ago
- Reactions: 1
- Comments: 16 (9 by maintainers)
Commits related to this issue
- Apply kube-proxy daemonset by manifest not EKS addons. The EKS addon for kube-proxy introduced regressions of #124 and #209. We will apply the recommended overrides from https://github.com/aws/contain... — committed to cookpad/terraform-aws-eks by ettiee 3 years ago
Hi @jayanthvn, is there any update on when --hostname-override=$(NODE_NAME) will be included as a default in the kube-proxy managed add-on?
@ChrisRamsayITV - I will check with the team and get back to you next week.
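I’d suggest reading https://medium.com/keikoproj/rapid-auto-scaling-on-eks-part-1-bb4de84fc599. It may be that kube-proxy hasn’t started yet when the CNI tries to start, in which case the CNI can’t reach the control plane: aws-node talks to the API server through the in-cluster kubernetes Service IP, and that path only works once kube-proxy has programmed its rules on the node.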
The default kube-proxy manifest for 1.22 will include this change. The release calendar can be found here: https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html
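For context, the override asked about in this thread is usually wired into the kube-proxy DaemonSet by injecting the pod's node name through the Kubernetes Downward API and referencing it in the container command. The excerpt below is a sketch of what that could look like, not the exact EKS manifest: it assumes the container is named kube-proxy and that the existing flags (for example its --config flag) are kept unchanged. Only the fields relevant to the override are shown.

```yaml
# Excerpt of a kube-proxy DaemonSet (kube-system namespace), NOT a full manifest.
# Assumption: container name "kube-proxy"; image, volumes and the other
# existing flags are left exactly as they are in your current manifest.
spec:
  template:
    spec:
      containers:
        - name: kube-proxy
          command:
            - kube-proxy
            # ...existing flags stay here unchanged...
            # Kubernetes expands $(NODE_NAME) from the container env below.
            - --hostname-override=$(NODE_NAME)
          env:
            # Downward API: the name of the Node this pod is scheduled on.
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
```

Note that a strategic merge patch replaces a list of strings such as command wholesale, so if you apply this as a patch, include the full existing command list rather than only the new flag.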
Hi @jayanthvn, is there any update on when this will be included as a default? Thanks.