amazon-vpc-cni-k8s: Pod connectivity problem with ENIConfig/pod-specific subnet
Maybe related to https://github.com/aws/amazon-vpc-cni-k8s/issues/212?
We have a VPC with a primary CIDR block in 10.0.0.0/8 space and a secondary CIDR block in 100.64.0.0/10 space.
A single EKS worker node running CentOS 7 with primary IP address on a 10.x.x.x subnet and an ENIConfig annotation on the node for a 100.64.x.x subnet.
Pods running on the 100.64.x.x subnet can communicate with pods on the same node that use the primary IP (hostNetwork: true), but cannot communicate off-node (e.g. to the control plane, either directly or via the kubernetes service ClusterIP address).
I can kubectl exec into pods running on the 100.64.x.x subnet, and all relevant route tables, NACLs and security groups are correctly configured.
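For reference, the custom-networking setup described above looks roughly like this (the ENIConfig name, subnet ID, security group ID, and node name are placeholders, not the actual values from this cluster):

```shell
# Sketch: define an ENIConfig pointing at a subnet in the secondary 100.64.0.0/10 CIDR.
cat <<'EOF' | kubectl apply -f -
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: pod-subnet
spec:
  subnet: subnet-0abc123          # placeholder: a 100.64.x.x subnet
  securityGroups:
    - sg-0abc123                  # placeholder
EOF

# Point the node at it via the annotation the CNI plugin reads.
kubectl annotate node ip-10-x-x-x.ec2.internal \
  k8s.amazonaws.com/eniConfig=pod-subnet
```

With this in place, the plugin attaches secondary ENIs in the 100.64.x.x subnet and assigns pod IPs from it, while the node's primary interface stays in the 10.x.x.x subnet.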
$ kubectl --kubeconfig kubeconfig run -i --rm --tty debug --image=busybox -- sh
If you don't see a command prompt, try pressing enter.
/ # ifconfig
eth0 Link encap:Ethernet HWaddr AE:94:60:6F:AA:2E
inet addr:100.64.x.x Bcast:100.64.x.x Mask:255.255.255.255
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:6 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:508 (508.0 B) TX bytes:0 (0.0 B)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
/ # wget http://10.x.x.x:61678/v1/networkutils-env-settings
Connecting to 10.x.x.x:61678 (10.x.x.x:61678)
networkutils-env-set 100% |*************************************************************************************************************| 105 0:00:00 ETA
/ # wget https://10.y.y.y/
Connecting to 10.y.y.y (10.y.y.y:443)
^C
10.x.x.x is the node’s primary IP; 10.y.y.y is one of the EKS control plane ENIs.
This prevents critical components like kube-dns from starting:
$ kubectl --kubeconfig kubeconfig logs --namespace kube-system kube-dns-d87b74b4f-f5gff kubedns
...
I1102 20:51:07.938774 1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
E1102 20:51:07.940074 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:189: Failed to list *v1.Endpoints: Get https://172.20.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 172.20.0.1:443: i/o timeout
...
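For context, 172.20.0.1 is the ClusterIP of the default kubernetes service (EKS uses a 172.20.0.0/16 service CIDR by default), and its endpoints are the same control-plane ENI addresses the direct wget above could not reach. That can be confirmed with:

```shell
# The kubernetes service ClusterIP that kube-dns is timing out against:
kubectl --kubeconfig kubeconfig get svc kubernetes

# Its endpoints should be the control-plane ENI IPs (the 10.y.y.y addresses):
kubectl --kubeconfig kubeconfig get endpoints kubernetes
```

So the kube-dns failure is the same off-node connectivity problem, just reached via the service VIP instead of the ENI IP directly.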
About this issue
- State: closed
- Created 6 years ago
- Reactions: 8
- Comments: 25 (19 by maintainers)
@lutierigb Thanks for digging in to this. I added code to explicitly add the primary IP on secondary ENIs in https://github.com/aws/amazon-vpc-cni-k8s/pull/271 and have verified that this works on CentOS 7 (and Amazon Linux 2 so no regression).
@liwenwu-amazon you forgot the “dev eth1” in the default route command; that’s why you were able to add it. It was probably added via eth0.
I was working on a similar issue, and it seems to come down to how the kernel handles this. If you were to add an IP address to eth1 it wouldn’t complain, but the AWS CNI plugin doesn’t configure any IP addresses on the secondary interfaces.
CentOS 7 with kernel 3.10 will fail to add the default route via eth1 if eth1 doesn’t have an IP address in that range.
I updated to kernel 4.x and had no issues, but I don’t think 4.x is “officially” supported on CentOS 7.
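A minimal sketch of the behavior described above, run as root on the node itself (the gateway address, prefix length, and route table number are illustrative, not taken from this cluster):

```shell
# On kernel 3.10, adding a default route via a gateway on eth1 fails
# while eth1 has no IP address in the gateway's subnet:
ip route add default via 100.64.0.1 dev eth1 table 2   # fails on 3.10

# What the fix in aws/amazon-vpc-cni-k8s#271 does, in effect: assign the
# ENI's primary IP to the interface first...
ip addr add 100.64.x.x/18 dev eth1                     # placeholder prefix

# ...after which the same route add succeeds on 3.10 as well:
ip route add default via 100.64.0.1 dev eth1 table 2
```

Kernel 4.x is more permissive here, which matches the observation that upgrading the kernel also works around the problem.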