amazon-vpc-cni-k8s: Error: NetworkPluginNotReady. cni config uninitialized
What happened:
Error:
Ready False Wed, 04 Nov 2020 10:56:25 +0000 Wed, 04 Nov 2020 10:48:23 +0000 KubeletNotReady runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Yesterday evening I set the ASG to zero. This morning I set it to 4.
kubectl get nodes
reports nodes as NotReady
kubectl describe node REDACTED
Name: REDACTED
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=t3.large
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=REDACTED
failure-domain.beta.kubernetes.io/zone=REDACTED
kubernetes.io/arch=amd64
kubernetes.io/hostname=REDACTED
kubernetes.io/os=linux
Annotations: node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Wed, 04 Nov 2020 10:48:23 +0000
Taints: node.kubernetes.io/not-ready:NoSchedule
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Wed, 04 Nov 2020 10:56:25 +0000 Wed, 04 Nov 2020 10:48:23 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 04 Nov 2020 10:56:25 +0000 Wed, 04 Nov 2020 10:48:23 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Wed, 04 Nov 2020 10:56:25 +0000 Wed, 04 Nov 2020 10:48:23 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Wed, 04 Nov 2020 10:56:25 +0000 Wed, 04 Nov 2020 10:48:23 +0000 KubeletNotReady runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Addresses:
InternalIP: REDACTED
ExternalIP: REDACTED
Hostname: REDACTED.compute.internal
InternalDNS: REDACTED.compute.internal
ExternalDNS: REDACTED.compute.amazonaws.com
Capacity:
attachable-volumes-aws-ebs: 25
cpu: 2
ephemeral-storage: 20959212Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8063660Ki
pods: 35
Allocatable:
attachable-volumes-aws-ebs: 25
cpu: 1930m
ephemeral-storage: 18242267924
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 7305900Ki
pods: 35
System Info:
Machine ID: REDACTED
System UUID: REDACTED
Boot ID: REDACTED
Kernel Version: 4.14.198-152.320.amzn2.x86_64
OS Image: Amazon Linux 2
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://19.3.6
Kubelet Version: v1.15.11-eks-bf8eea
Kube-Proxy Version: v1.15.11-eks-bf8eea
ProviderID: aws:///REDACTED/i-REDACTED
Non-terminated Pods: (0 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 0 (0%) 0 (0%)
memory 0 (0%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
attachable-volumes-aws-ebs 0 0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 8m48s kubelet, REDACTED.compute.internal Starting kubelet.
Normal NodeHasSufficientMemory 8m48s (x2 over 8m48s) kubelet, REDACTED.compute.internal Node REDACTED.compute.internal status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 8m48s (x2 over 8m48s) kubelet, REDACTED.compute.internal Node REDACTED.compute.internal status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 8m48s (x2 over 8m48s) kubelet, REDACTED.compute.internal Node REDACTED.compute.internal status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 8m48s kubelet, REDACTED.compute.internal Updated Node Allocatable limit across pods
CNI was running: amazon-k8s-cni:v1.6.3
After seeing this error I upgraded it to: amazon-k8s-cni-init:v1.7.5 amazon-k8s-cni:v1.7.5
CNI Log attached. eks_i-REDACTED_2020-11-04_1111-UTC_0.6.2_REDACTED.zip
What you expected to happen: Nodes to join EKS cluster
How to reproduce it (as minimally and precisely as possible): That’s difficult to answer.
Anything else we need to know?:
- This is happening on
amazon-eks-node-1.15-v20201007 ami-0af730da10ac8b0b7
and amazon-eks-node-1.15-v20200814 ami-04cc6ec46d6dbc4fa
- Yesterday I installed Kubeflow on this cluster. Not that it matters, as I installed Kubeflow on another cluster as well and it’s fine.
Environment:
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-16T00:04:31Z", GoVersion:"go1.14.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.11-eks-065dce", GitCommit:"065dcecfcd2a91bd68a17ee0b5e895088430bd05", GitTreeState:"clean", BuildDate:"2020-07-16T01:44:47Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
- CNI Version
amazon-k8s-cni:v1.6.3
amazon-k8s-cni-init:v1.7.5
amazon-k8s-cni:v1.7.5
- OS (e.g. cat /etc/os-release):
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
- Kernel (e.g. uname -a): Linux REDACTED.compute.internal 4.14.198-152.320.amzn2.x86_64 #1 SMP Wed Sep 23 23:57:28 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 30 (9 by maintainers)
In my case the problem was that AmazonEKS_CNI_Policy wasn’t attached by eksctl when I created the nodegroup.
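Verifying and attaching that managed policy can be done with the AWS CLI. A minimal sketch — the role name is a placeholder for whatever node instance role your nodegroup uses, and the commands are only printed here so the sketch is safe to run without AWS credentials:

```shell
# Hypothetical node instance role name -- substitute your nodegroup's role.
NODE_ROLE="eksctl-mycluster-nodegroup-NodeInstanceRole"
# AmazonEKS_CNI_Policy is the AWS-managed policy the aws-node daemonset
# needs to manage ENIs and pod IPs.
POLICY_ARN="arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"

# Command to list what is currently attached to the role.
LIST_CMD="aws iam list-attached-role-policies --role-name $NODE_ROLE"
# Command to attach the missing policy.
ATTACH_CMD="aws iam attach-role-policy --role-name $NODE_ROLE --policy-arn $POLICY_ARN"

# Print rather than execute, so this runs anywhere.
echo "check:  $LIST_CMD"
echo "attach: $ATTACH_CMD"
```

If the CNI policy is missing from the output of the first command, running the second and then recycling the stuck nodes should let them join.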
Hi there,
We have the same issue with our brand-new private EKS cluster (v1.18). A node does not come into the Ready state due to
runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
The aws-node pod is in the Running state, but constantly gets restarted every 2-3 minutes with the following errors:
Successfully assigned kube-system/aws-node-85279 to ip-10-98-77-41.ec2.internal
Pulling image "602401143452.dkr.ecr.us-east-1.amazonaws.com/amazon-k8s-cni-init:v1.7.5-eksbuild.1"
Successfully pulled image "602401143452.dkr.ecr.us-east-1.amazonaws.com/amazon-k8s-cni-init:v1.7.5-eksbuild.1"
Created container aws-vpc-cni-init
Started container aws-vpc-cni-init
Successfully pulled image "602401143452.dkr.ecr.us-east-1.amazonaws.com/amazon-k8s-cni:v1.7.5-eksbuild.1"
Pulling image "602401143452.dkr.ecr.us-east-1.amazonaws.com/amazon-k8s-cni:v1.7.5-eksbuild.1"
Created container aws-node
Started container aws-node
Readiness probe failed: {"level":"info","ts":"2020-11-21T10:29:30.590Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"timeout: failed to connect service \":50051\" within 1s"}
Readiness probe failed: {"level":"info","ts":"2020-11-21T10:29:40.602Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"timeout: failed to connect service \":50051\" within 1s"}
Readiness probe failed: {"level":"info","ts":"2020-11-21T10:29:50.591Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"timeout: failed to connect service \":50051\" within 1s"}
Readiness probe failed: {"level":"info","ts":"2020-11-21T10:30:00.597Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"timeout: failed to connect service \":50051\" within 1s"}
Container logs for amazon-k8s-cni
{"level":"info","ts":"2020-11-21T10:29:25.981Z","caller":"entrypoint.sh","msg":"Install CNI binary.."}
{"level":"info","ts":"2020-11-21T10:29:25.998Z","caller":"entrypoint.sh","msg":"Starting IPAM daemon in the background ... "}
{"level":"info","ts":"2020-11-21T10:29:26.000Z","caller":"entrypoint.sh","msg":"Checking for IPAM connectivity ... "}
Kubelet logs show that:
Nov 21 10:34:38 ip-10-98-77-41.ec2.internal kubelet[3820]: W1121 10:34:38.585425 3820 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
Nov 21 10:34:40 ip-10-98-77-41.ec2.internal kubelet[3820]: E1121 10:34:40.142766 3820 kubelet.go:2195] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Here is the log collection: eks_i-0b47c308f2650abb6_2020-11-21_1032-UTC_0.6.2.tar.gz
I’ve found that if I manually create the file /etc/cni/net.d/10-aws.conflist with the following config:
{
  "cniVersion": "0.3.1",
  "name": "aws-cni",
  "plugins": [
    {
      "name": "aws-cni",
      "type": "aws-cni",
      "vethPrefix": "eni",
      "mtu": "9001",
      "pluginLogFile": "/var/log/aws-routed-eni/plugin.log",
      "pluginLogLevel": "Debug"
    },
    {
      "type": "portmap",
      "capabilities": { "portMappings": true },
      "snat": true
    }
  ]
}
The node immediately goes into the Ready state. Why doesn’t this file get created automatically?
At first I thought it was related to custom CNI settings, but now I’ve created a new cluster with just three subnets and have done nothing related to custom CNI networking (no changes to the aws-node DaemonSet).
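As a stopgap, the manual workaround above can be scripted. A sketch only — on a real node CNI_DIR would be /etc/cni/net.d and you would bounce the kubelet afterwards; a temp dir is used here so the sketch is safe to run anywhere:

```shell
# On a real node: CNI_DIR=/etc/cni/net.d. A temp dir keeps this sketch harmless.
CNI_DIR="${CNI_DIR:-$(mktemp -d)}"

# Write the conflist the aws-node init container would normally create.
cat > "$CNI_DIR/10-aws.conflist" <<'EOF'
{
  "cniVersion": "0.3.1",
  "name": "aws-cni",
  "plugins": [
    {
      "name": "aws-cni",
      "type": "aws-cni",
      "vethPrefix": "eni",
      "mtu": "9001",
      "pluginLogFile": "/var/log/aws-routed-eni/plugin.log",
      "pluginLogLevel": "Debug"
    },
    {
      "type": "portmap",
      "capabilities": { "portMappings": true },
      "snat": true
    }
  ]
}
EOF

# Sanity-check the JSON before the kubelet tries to parse it.
python3 -m json.tool "$CNI_DIR/10-aws.conflist" >/dev/null && echo "conflist OK"
```

Note this only masks the symptom: if ipamd never becomes healthy, pods on the node will still fail to get IPs even though the node reports Ready.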
If you’re using VPC endpoints with nodes in private subnets, make sure the endpoint security groups are set up correctly.
I encountered this issue with EKS.
TL;DR: It happened because the subnet in which the nodegroup was running didn’t have any free IPs left to assign to nodes/pods.
The story: The error I was getting on the node was:
runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
I noticed that the aws-node pod was in backoff mode. Looking into its logs, I saw that there was an issue with assigning an IP to the aws-node pod.
I checked my subnet and saw that Kubernetes had managed to use up 255 IP addresses in the subnet while running only about 20 nodes. Read here if you want to know why that happens: https://medium.com/better-programming/amazon-eks-is-eating-my-ips-e18ea057e045
I increased the size of my subnets and that solved the issue. You can also run
kubectl -n kube-system set env daemonset aws-node WARM_IP_TARGET=2
to see immediate relief in the number of available IPs.
Recreate the service account to attach AmazonEKS_CNI_Policy to the proper role.
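The free-IP check described above can be done directly against the EC2 API. A sketch — the subnet IDs are placeholders, and the command is printed rather than executed so it runs without AWS credentials:

```shell
# Placeholder subnet IDs -- substitute your nodegroup's subnets.
SUBNET_IDS="subnet-aaaa1111 subnet-bbbb2222"

# DescribeSubnets exposes AvailableIpAddressCount, i.e. how many free IPs
# each subnet still has for the VPC CNI to hand out to nodes and pods.
CHECK_CMD="aws ec2 describe-subnets --subnet-ids $SUBNET_IDS \
  --query 'Subnets[].[SubnetId,AvailableIpAddressCount]' --output table"

echo "would run: $CHECK_CMD"
```

A count at or near zero on the nodegroup’s subnets matches the failure mode in this comment: ipamd cannot attach ENIs/IPs, so the CNI config never gets written.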
In case this helps anyone: we had a similar problem launching new clusters when aws-node began using the new CNI v1.7.5 (some time late last week). We use our own pod security policies, and it seems 1.7.5 requires the NET_ADMIN capability. We didn’t need this with CNI 1.6.3.
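For clusters running their own PSPs, the policy matched by aws-node would need to allow that capability. A sketch of the relevant fragment — the policy name and surrounding fields are illustrative, not the full PSP that ships with the CNI:

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: aws-node-psp        # illustrative name
spec:
  privileged: false
  # VPC CNI v1.7.x needs NET_ADMIN to manage ENIs and routes on the host.
  allowedCapabilities:
    - NET_ADMIN
  hostNetwork: true         # aws-node runs on the host network
  volumes:
    - hostPath
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
```

Without NET_ADMIN in allowedCapabilities, the aws-node pod is rejected or started without the capability, ipamd never comes up, and the node shows exactly the "cni config uninitialized" condition above.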
you made my day
Thank you. I’ll check.
Adam
I am also experiencing this issue without Kubeflow. Notably, this is happening on new NodeGroups in an existing cluster, and when trying to create a new cluster entirely via eksctl. Also, while I found the same log message in our cluster logs, network plugin is not ready: cni config uninitialized, I found another message which may provide more insight: network plugin is not ready: cni config uninitialized, CSINode is not yet initialized, missing node capacity for resources: ephemeral-storage. This is interesting because the new nodegroups were created using the same configuration as all our other nodegroups. I have also opened a support ticket with AWS.
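When a new nodegroup gets stuck like this, a few kubectl checks help narrow down whether the CNI daemonset, its logs, or CSI initialization is the blocker. A sketch — the node name is a placeholder, and the commands are printed rather than executed so the sketch runs without a cluster:

```shell
NODE="ip-10-0-0-1.ec2.internal"   # placeholder node name

# Is the aws-node daemonset scheduled and ready on the new nodes?
DS_CMD="kubectl -n kube-system get ds aws-node -o wide"
# What is the CNI pod on the stuck node logging?
LOG_CMD="kubectl -n kube-system logs -l k8s-app=aws-node --tail=50"
# Has the CSINode object for the node been created yet?
CSI_CMD="kubectl get csinode $NODE"

printf 'would run:\n  %s\n  %s\n  %s\n' "$DS_CMD" "$LOG_CMD" "$CSI_CMD"
```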