amazon-vpc-cni-k8s: Pod connectivity problem with ENIConfig/pod-specific subnet

Maybe related to https://github.com/aws/amazon-vpc-cni-k8s/issues/212? We have a VPC with a primary CIDR block in the 10.0.0.0/8 space and a secondary CIDR block in the 100.64.0.0/10 space, and a single EKS worker node running CentOS 7 whose primary IP address is on a 10.x.x.x subnet, with an ENIConfig annotation on the node pointing at a 100.64.x.x subnet. Pods on the 100.64.x.x subnet can communicate with pods on the same node that use the primary IP (hostNetwork: true), but cannot communicate off-node (e.g. to the control plane, either directly or via the kubernetes service ClusterIP address). I can kubectl exec into pods running on the 100.64.x.x subnet, and all relevant route tables, NACLs and security groups are correctly configured.
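For context, the ENIConfig and node annotation look roughly like this. This is a sketch only: the subnet ID, security group ID, resource name and node name below are placeholders, not our real values.

```shell
# Hypothetical ENIConfig putting pod ENIs in the secondary (100.64.x.x) CIDR.
# subnet-aaaa1111 and sg-bbbb2222 are placeholders.
cat <<'EOF' | kubectl apply -f -
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: pod-subnet-config
spec:
  subnet: subnet-aaaa1111
  securityGroups:
    - sg-bbbb2222
EOF

# Point the node at it via the annotation the CNI watches:
kubectl annotate node ip-10-0-1-100.ec2.internal \
  k8s.amazonaws.com/eniConfig=pod-subnet-config
```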

$ kubectl --kubeconfig kubeconfig run -i --rm --tty debug --image=busybox -- sh
If you don't see a command prompt, try pressing enter.
/ # ifconfig
eth0      Link encap:Ethernet  HWaddr AE:94:60:6F:AA:2E  
          inet addr:100.64.x.x  Bcast:100.64.x.x  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:508 (508.0 B)  TX bytes:0 (0.0 B)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

/ # wget http://10.x.x.x:61678/v1/networkutils-env-settings
Connecting to 10.x.x.x:61678 (10.x.x.x:61678)
networkutils-env-set 100% |*************************************************************************************************************|   105  0:00:00 ETA
/ # wget https://10.y.y.y/
Connecting to 10.y.y.y (10.y.y.y:443)
^C

10.x.x.x is the node’s primary IP, 10.y.y.y is one of the EKS control plane ENIs.

This prevents critical components like kube-dns from starting:

$ kubectl --kubeconfig kubeconfig logs --namespace kube-system kube-dns-d87b74b4f-f5gff kubedns
...
I1102 20:51:07.938774       1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
E1102 20:51:07.940074       1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:189: Failed to list *v1.Endpoints: Get https://172.20.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 172.20.0.1:443: i/o timeout
...

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 8
  • Comments: 25 (19 by maintainers)

Most upvoted comments

@lutierigb Thanks for digging into this. I added code to explicitly add the primary IP on secondary ENIs in https://github.com/aws/amazon-vpc-cni-k8s/pull/271 and have verified that this works on CentOS 7 (and on Amazon Linux 2, so no regression).

@sdavids13 I have installed CentOS 7 on a t2.medium instance. I am able to manually do the following with the secondary ENI:

[root@ip-172-31-35-103 centos]# ip route add 172.31.100.1 dev eth1 table 2
[root@ip-172-31-35-103 centos]# ip route add default via 172.31.100.1 table 2

Sure, I will check with our Amazon Linux engineers on why ip route add default via 172.31.100.1 table 2 is not working with some CentOS 7 AMIs.

@liwenwu-amazon you forgot the “dev eth1” in the default route command; that’s why you were able to add it. It was probably added via eth0.

I was working on a similar issue, and it seems like it comes down to how the kernel handles that. If you were to add an IP address to eth1 it wouldn’t complain about it, but the AWS CNI plugin doesn’t configure any IP addresses on the secondary interfaces.

CentOS 7 with kernel 3.10 will fail to add the default route via eth1 if eth1 doesn’t have an ip address in that range.
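To make that concrete, here is the failure and the manual workaround on a kernel 3.10 node, with placeholder addresses (172.31.100.10 standing in for the secondary ENI's primary IP, 172.31.100.1 for its gateway). Assigning the address first is essentially what the fix in https://github.com/aws/amazon-vpc-cni-k8s/pull/271 automates. These commands need root on the node and a real secondary interface:

```shell
# On kernel 3.10 this fails, because eth1 has no address
# in the gateway's subnet:
ip route add default via 172.31.100.1 dev eth1 table 2

# Workaround: assign the secondary ENI's primary IP to the interface first
# (placeholder address below)...
ip addr add 172.31.100.10/24 dev eth1
# ...then both routes can be added:
ip route add 172.31.100.1 dev eth1 table 2
ip route add default via 172.31.100.1 dev eth1 table 2
```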

I updated to kernel 4.x and had no issues, but I don’t think 4.x is “officially” supported for CentOS 7.