amazon-vpc-cni-k8s: Error: NetworkPluginNotReady. cni config uninitialized

What happened:

Error:

Ready            False   Wed, 04 Nov 2020 10:56:25 +0000   Wed, 04 Nov 2020 10:48:23 +0000   KubeletNotReady              runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

Yesterday evening I set the ASG to zero. This morning I set the ASG to 4.

kubectl get nodes reports nodes as NotReady

kubectl describe node REDACTED
Name:               REDACTED
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=t3.large
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=REDACTED
                    failure-domain.beta.kubernetes.io/zone=REDACTED
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=REDACTED
                    kubernetes.io/os=linux
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 04 Nov 2020 10:48:23 +0000
Taints:             node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Wed, 04 Nov 2020 10:56:25 +0000   Wed, 04 Nov 2020 10:48:23 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 04 Nov 2020 10:56:25 +0000   Wed, 04 Nov 2020 10:48:23 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 04 Nov 2020 10:56:25 +0000   Wed, 04 Nov 2020 10:48:23 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Wed, 04 Nov 2020 10:56:25 +0000   Wed, 04 Nov 2020 10:48:23 +0000   KubeletNotReady              runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Addresses:
  InternalIP:   REDACTED
  ExternalIP:   REDACTED
  Hostname:     REDACTED.compute.internal
  InternalDNS:  REDACTED.compute.internal
  ExternalDNS:  REDACTED.compute.amazonaws.com
Capacity:
  attachable-volumes-aws-ebs:  25
  cpu:                         2
  ephemeral-storage:           20959212Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      8063660Ki
  pods:                        35
Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         1930m
  ephemeral-storage:           18242267924
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      7305900Ki
  pods:                        35
System Info:
  Machine ID:                 REDACTED
  System UUID:                REDACTED
  Boot ID:                    REDACTED
  Kernel Version:             4.14.198-152.320.amzn2.x86_64
  OS Image:                   Amazon Linux 2
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://19.3.6
  Kubelet Version:            v1.15.11-eks-bf8eea
  Kube-Proxy Version:         v1.15.11-eks-bf8eea
ProviderID:                   aws:///REDACTED/i-REDACTED
Non-terminated Pods:          (0 in total)
  Namespace                   Name    CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----    ------------  ----------  ---------------  -------------  ---
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests  Limits
  --------                    --------  ------
  cpu                         0 (0%)    0 (0%)
  memory                      0 (0%)    0 (0%)
  ephemeral-storage           0 (0%)    0 (0%)
  hugepages-1Gi               0 (0%)    0 (0%)
  hugepages-2Mi               0 (0%)    0 (0%)
  attachable-volumes-aws-ebs  0         0
Events:
  Type    Reason                   Age                    From                                              Message
  ----    ------                   ----                   ----                                              -------
  Normal  Starting                 8m48s                  kubelet, REDACTED.compute.internal  Starting kubelet.
  Normal  NodeHasSufficientMemory  8m48s (x2 over 8m48s)  kubelet, REDACTED.compute.internal  Node REDACTED.compute.internal status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    8m48s (x2 over 8m48s)  kubelet, REDACTED.compute.internal  Node REDACTED.compute.internal status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID     8m48s (x2 over 8m48s)  kubelet, REDACTED.compute.internal  Node REDACTED.compute.internal status is now: NodeHasSufficientPID
  Normal  NodeAllocatableEnforced  8m48s                  kubelet, REDACTED.compute.internal  Updated Node Allocatable limit across pods

The CNI was running amazon-k8s-cni:v1.6.3. After seeing this error I upgraded it to amazon-k8s-cni-init:v1.7.5 and amazon-k8s-cni:v1.7.5.
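For reference, this is roughly how the deployed CNI version can be confirmed (assuming the default aws-node DaemonSet in kube-system):

  # Show the image tags used by the aws-node DaemonSet (the VPC CNI)
  kubectl describe daemonset aws-node -n kube-system | grep amazon-k8s-cni
  # Check that the aws-node pods are actually Running on the NotReady nodes
  kubectl get pods -n kube-system -l k8s-app=aws-node -o wide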

CNI Log attached. eks_i-REDACTED_2020-11-04_1111-UTC_0.6.2_REDACTED.zip

What you expected to happen: Nodes to join the EKS cluster.

How to reproduce it (as minimally and precisely as possible): That’s difficult to answer.

Anything else we need to know?:

  • This is happening on amazon-eks-node-1.15-v20201007 (ami-0af730da10ac8b0b7) and amazon-eks-node-1.15-v20200814 (ami-04cc6ec46d6dbc4fa)
  • Yesterday I installed Kubeflow on this cluster. Not that it should matter, as I installed Kubeflow on another cluster as well and it’s fine.

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-16T00:04:31Z", GoVersion:"go1.14.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.11-eks-065dce", GitCommit:"065dcecfcd2a91bd68a17ee0b5e895088430bd05", GitTreeState:"clean", BuildDate:"2020-07-16T01:44:47Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
  • CNI Version
amazon-k8s-cni:v1.6.3
amazon-k8s-cni-init:v1.7.5
amazon-k8s-cni:v1.7.5
  • OS (e.g: cat /etc/os-release):
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
  • Kernel (e.g. uname -a): Linux REDACTED.compute.internal 4.14.198-152.320.amzn2.x86_64 #1 SMP Wed Sep 23 23:57:28 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 30 (9 by maintainers)

Most upvoted comments

In my case the problem was that AmazonEKS_CNI_Policy wasn’t attached by eksctl when I created the nodegroup.
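A rough way to verify this on a suspect nodegroup (the role name below is a placeholder for your node instance role):

  # List the managed policies attached to the nodegroup's instance role
  aws iam list-attached-role-policies --role-name <node-instance-role-name>
  # AmazonEKS_CNI_Policy should appear here, or on the IRSA role bound to the aws-node service account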

Hi there,

We have the same issue with our brand new private EKS cluster (v1.18). Nodes do not reach the Ready state due to: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

The aws-node pod is in the Running state, but constantly gets restarted every 2-3 minutes with the following events:

  Successfully assigned kube-system/aws-node-85279 to ip-10-98-77-41.ec2.internal
  Pulling image "602401143452.dkr.ecr.us-east-1.amazonaws.com/amazon-k8s-cni-init:v1.7.5-eksbuild.1"
  Successfully pulled image "602401143452.dkr.ecr.us-east-1.amazonaws.com/amazon-k8s-cni-init:v1.7.5-eksbuild.1"
  Created container aws-vpc-cni-init
  Started container aws-vpc-cni-init
  Successfully pulled image "602401143452.dkr.ecr.us-east-1.amazonaws.com/amazon-k8s-cni:v1.7.5-eksbuild.1"
  Pulling image "602401143452.dkr.ecr.us-east-1.amazonaws.com/amazon-k8s-cni:v1.7.5-eksbuild.1"
  Created container aws-node
  Started container aws-node
  Readiness probe failed: {"level":"info","ts":"2020-11-21T10:29:30.590Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"timeout: failed to connect service \":50051\" within 1s"}
  Readiness probe failed: {"level":"info","ts":"2020-11-21T10:29:40.602Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"timeout: failed to connect service \":50051\" within 1s"}
  Readiness probe failed: {"level":"info","ts":"2020-11-21T10:29:50.591Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"timeout: failed to connect service \":50051\" within 1s"}
  Readiness probe failed: {"level":"info","ts":"2020-11-21T10:30:00.597Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"timeout: failed to connect service \":50051\" within 1s"}

Container logs for amazon-k8s-cni:

  {"level":"info","ts":"2020-11-21T10:29:25.981Z","caller":"entrypoint.sh","msg":"Install CNI binary.."}
  {"level":"info","ts":"2020-11-21T10:29:25.998Z","caller":"entrypoint.sh","msg":"Starting IPAM daemon in the background ... "}
  {"level":"info","ts":"2020-11-21T10:29:26.000Z","caller":"entrypoint.sh","msg":"Checking for IPAM connectivity ... "}

Kubelet logs show:

  Nov 21 10:34:38 ip-10-98-77-41.ec2.internal kubelet[3820]: W1121 10:34:38.585425 3820 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
  Nov 21 10:34:40 ip-10-98-77-41.ec2.internal kubelet[3820]: E1121 10:34:40.142766 3820 kubelet.go:2195] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

Here is the log collection: eks_i-0b47c308f2650abb6_2020-11-21_1032-UTC_0.6.2.tar.gz

I’ve found that if I manually create the file /etc/cni/net.d/10-aws.conflist with the following config:

  {
    "cniVersion": "0.3.1",
    "name": "aws-cni",
    "plugins": [
      {
        "name": "aws-cni",
        "type": "aws-cni",
        "vethPrefix": "eni",
        "mtu": "9001",
        "pluginLogFile": "/var/log/aws-routed-eni/plugin.log",
        "pluginLogLevel": "Debug"
      },
      {
        "type": "portmap",
        "capabilities": {"portMappings": true},
        "snat": true
      }
    ]
  }

the node immediately goes UP.
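For reference, a couple of checks that can be run on an affected node to see whether the config was ever written, and why ipamd might not have written it (default VPC CNI paths; in recent CNI versions the aws-node entrypoint writes this file only after the IPAM daemon comes up):

  # Does the CNI config exist at all?
  ls -l /etc/cni/net.d/
  # ipamd's own log usually explains why it never became ready
  tail -n 50 /var/log/aws-routed-eni/ipamd.log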

Why does this file not get created automatically?

  • I don’t see any error in the node cloud-init-output.log
  • NodeGroup role has AmazonEKS_CNI_Policy policy.

At first I thought it was related to custom CNI settings, but I have now created a new cluster with just three subnets and done nothing related to custom CNI networking (no changes to the aws-node DaemonSet).

If you’re using VPC endpoints with nodes in private subnets, make sure the endpoint security groups are set up correctly.
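For example, something along these lines can confirm whether the endpoint security groups allow HTTPS from the nodes (the VPC and security group IDs are placeholders):

  # List the interface endpoints in the VPC and the security groups attached to them
  aws ec2 describe-vpc-endpoints --filters Name=vpc-id,Values=<vpc-id> \
    --query 'VpcEndpoints[].{Service:ServiceName,SecurityGroups:Groups[].GroupId}'
  # Verify each endpoint security group allows inbound 443 from the node security group
  aws ec2 describe-security-groups --group-ids <endpoint-sg-id> \
    --query 'SecurityGroups[].IpPermissions'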

I encountered this issue with EKS.

TL;DR: It happened because the subnet the nodegroup was running in didn’t have any free IPs left to assign to nodes/pods.

The story: The error I was getting on the node was: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

I noticed that the aws-node pod was in backoff mode. Looking into its logs, I saw there was an issue assigning an IP to the aws-node pod.

I checked my subnet and saw that Kubernetes had managed to use up 255 IP addresses in the subnet while running only about 20 nodes. Read here if you want to know why that happens: https://medium.com/better-programming/amazon-eks-is-eating-my-ips-e18ea057e045

I increased the size of my subnets and that solved the issue. You can also run kubectl -n kube-system set env daemonset aws-node WARM_IP_TARGET=2 to see immediate relief in the number of available IPs.
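A quick way to see whether a nodegroup's subnets are running dry (the subnet IDs are placeholders):

  # How many free IPs are left in each of the nodegroup's subnets
  aws ec2 describe-subnets --subnet-ids <subnet-a> <subnet-b> \
    --query 'Subnets[].{Subnet:SubnetId,FreeIPs:AvailableIpAddressCount}'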

In my case the problem was that AmazonEKS_CNI_Policy wasn’t attached by eksctl when I created the nodegroup.

Recreate the service account to attach AmazonEKS_CNI_Policy to the proper role:

eksctl delete iamserviceaccount \
  --name aws-node \
  --namespace kube-system \
  --cluster $ClusterName
eksctl create iamserviceaccount \
  --name aws-node \
  --namespace kube-system \
  --cluster $ClusterName \
  --attach-policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy \
  --approve \
  --override-existing-serviceaccounts
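If you go this route, the existing aws-node pods most likely need to be restarted so they pick up credentials from the recreated service account, for example:

  kubectl -n kube-system rollout restart daemonset aws-node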

In case this helps anyone, we had a similar issue launching new clusters when aws-node began using the new CNI v1.7.5 (some time late last week). We use our own pod security policies, and it seems 1.7.5 requires the NET_ADMIN capability. We didn’t need this with CNI 1.6.3.
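If you run your own pod security policies, a quick check along these lines shows whether the policy used by aws-node allows it (the PSP name is a placeholder):

  # Show the capabilities the PSP allows; v1.7.x of the CNI requests NET_ADMIN
  kubectl get psp <cni-psp-name> -o jsonpath='{.spec.allowedCapabilities}'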

In my case the problem was that AmazonEKS_CNI_Policy wasn’t attached by eksctl when I created the nodegroup.

you made my day

Thank you. I’ll check.

Adam

I am also experiencing this issue without Kubeflow. Notably, this is happening on new NodeGroups in an existing cluster, and when trying to create a new cluster entirely via eksctl. Also, while I found the same log message in our cluster logs, network plugin is not ready: cni config uninitialized, I found another message which may provide more insight: network plugin is not ready: cni config uninitialized, CSINode is not yet initialized, missing node capacity for resources: ephemeral-storage. This is interesting because the new nodegroups were created using the same configuration as all our other nodegroups. I have also opened a support ticket with AWS.