cilium: pod-to-b-multi-node-nodeport connectivity test failing on EKS with 1.8.0-rc3
When following the EKS GSG instructions to validate Cilium 1.8.0-rc3 for #11903 (including the fix for #12078), the connectivity check for pod-to-b-multi-node-nodeport is failing:
% kubectl get po
NAME READY STATUS RESTARTS AGE
echo-a-58dd59998d-fcfsc 1/1 Running 0 133m
echo-b-865969889d-7qgfg 1/1 Running 0 133m
echo-b-host-659c674bb6-tvzxm 1/1 Running 0 133m
host-to-b-multi-node-clusterip-6fb94d9df6-v25v4 1/1 Running 0 133m
host-to-b-multi-node-headless-7c4ff79cd-2dgct 1/1 Running 0 133m
pod-to-a-5c8dcf69f7-zq2zj 1/1 Running 0 133m
pod-to-a-allowed-cnp-75684d58cc-tb5jm 1/1 Running 0 133m
pod-to-a-external-1111-669ccfb85f-8l2p2 1/1 Running 0 133m
pod-to-a-l3-denied-cnp-7b8bfcb66c-qg2wc 1/1 Running 0 133m
pod-to-b-intra-node-74997967f8-c88x9 1/1 Running 0 133m
pod-to-b-intra-node-nodeport-775f967f47-t426f 1/1 Running 0 133m
pod-to-b-multi-node-clusterip-587678cbc4-xskt6 1/1 Running 0 133m
pod-to-b-multi-node-headless-574d9f5894-xd2jq 1/1 Running 0 133m
pod-to-b-multi-node-nodeport-7944d9f9fc-qpv5r 0/1 Running 0 133m
pod-to-external-fqdn-allow-google-cnp-6dd57bc859-bqhhq 1/1 Running 0 133m
% kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
echo-a ClusterIP 10.100.133.146 <none> 80/TCP 136m
echo-b NodePort 10.100.21.112 <none> 80:31313/TCP 136m
echo-b-headless ClusterIP None <none> 80/TCP 136m
echo-b-host-headless ClusterIP None <none> <none> 136m
kubernetes ClusterIP 10.100.0.1 <none> 443/TCP 147m
% kubectl get ep
NAME ENDPOINTS AGE
echo-a 192.168.108.20:80 136m
echo-b 192.168.98.169:80 136m
echo-b-headless 192.168.98.169:80 136m
echo-b-host-headless 192.168.16.155 136m
kubernetes 192.168.148.188:443,192.168.97.61:443 147m
Follow-up for #12078
/cc @brb
About this issue
- State: closed
- Created 4 years ago
- Comments: 19 (19 by maintainers)
Commits related to this issue
- iptables, loader: add rules for multi-node NodePort traffic on EKS Multi-node NodePort traffic for EKS needs a set of specific rules that are usually set by the aws daemonset: # sysctl -w net.ip... — committed to cilium/cilium by qmonnet 4 years ago
- iptables, loader: add rules to ensure symmetric routing for AWS ENI traffic Multi-node NodePort traffic with AWS ENI needs a set of specific rules that are usually set by the AWS DaemonSet: # sy... — committed to cilium/cilium by qmonnet 4 years ago
- iptables, loader: add rules to ensure symmetric routing for AWS ENI traffic [ upstream commit 132088c996a59e64d8f848c88f3c0c93a654290c ] Multi-node NodePort traffic with AWS ENI needs a set of speci... — committed to cilium/cilium by qmonnet 4 years ago
- iptables, loader: add rules to ensure symmetric routing for AWS ENI traffic [ upstream commit c7f9997d7001c8561583d374dcbd4d973bad6fac ] Multi-node NodePort traffic with AWS ENI needs a set of speci... — committed to cilium/cilium by qmonnet 4 years ago
I could reproduce the issue by following the GSG for AWS-EKS. I could also apply the manual fix from Thomas, with an additional step for the return-path filter on eth0.
The ENI CNI in the aws-node daemonset normally sets a handful of rules and sysctl settings on the node. Since the GSG instructs to remove the daemonset before deploying Cilium and creating the node, this configuration is never applied and the node is left without it (see the checks sketched below).
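On a node set up this way, the gap can be confirmed with something like the following; the specific rules to look for (a CONNMARK mark and a matching fwmark ip rule) are reconstructed from the commit messages above rather than taken from my nodes, so treat them as an approximation:
# The CONNMARK rules that the AWS CNI would install in the mangle table are not there
iptables -t mangle -S PREROUTING | grep -i connmark
# And there is no "fwmark ... lookup main" policy-routing rule, only the per-ENI table rules
ip rule show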
Cilium sets net.ipv4.conf.all.rp_filter to 0, but the maximum value in conf/{all,interface}/rp_filter is used when doing source validation on an {interface}, so in our case rp_filter is in strict mode on eth0. This prevents the packets received from the first node on eth0 from being (SNAT-ed and) forwarded to the pod. Instead, they are dropped by the host and no SYN/ACK is emitted back. Disabling rp_filter or setting it to loose mode fixes the drops, but the SYN/ACKs are still not sent to the correct destination.
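As an illustration, the effective setting on eth0 and the switch to loose mode look like this (this is only the standard sysctl interface, not the final fix):
# Effective check on eth0 is max(all, eth0): 0 = off, 1 = strict, 2 = loose
sysctl net.ipv4.conf.all.rp_filter net.ipv4.conf.eth0.rp_filter
# Relax eth0 to loose mode so the forwarded NodePort packets are no longer dropped
sysctl -w net.ipv4.conf.eth0.rp_filter=2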
This is due to the ip rule that is matched for those packets: it tells the host to do a FIB lookup in table 3 (associated with the interface at index 3, eth1 in my case) and not in the main table, as should be the case. This is where we need to mark the packets and do the lookup in the main table when the mark is found.
I used the following commands to restore the rules and to get pod-to-b-multi-node-nodeport ready (a reconstruction is sketched below).
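The exact commands are not reproduced in this excerpt; based on the commit messages above, they amount to something like the following (the 0x80 mark and the lxc+ match for Cilium's endpoint interfaces are assumptions on my part):
# Loose-mode reverse-path filtering on the primary ENI
sysctl -w net.ipv4.conf.eth0.rp_filter=2
# Mark connections coming in on eth0 for a local (NodePort) destination...
iptables -t mangle -A PREROUTING -i eth0 -m addrtype --dst-type LOCAL --limit-iface-in \
        -j CONNMARK --set-xmark 0x80/0x80
# ...and restore the mark on the replies coming back from the pod-facing interfaces
iptables -t mangle -A PREROUTING -i lxc+ -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80
# Route marked packets via the main table instead of the per-ENI table (table 3 here)
ip rule add fwmark 0x80/0x80 lookup main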
I'm working on a fix to have Cilium reproduce this configuration on AWS.
We likely missed this when validating the GSG for 1.7 because pod-to-b-multi-node-nodeport (or pod-to-b-intra-node-nodeport, which fails with v1.7.5) did not exist at the time.