kubernetes: Windows node does not route traffic to internal service IP
What happened?
When creating a service of type LoadBalancer with externalTrafficPolicy set to Local, kube-proxy on Windows does not seem to create a load balancer via HNS on the Windows node to route the traffic to the internal service endpoints, while kube-proxy on Linux does create the iptables rules to route this traffic internally.
What did you expect to happen?
I expected the Windows version of kube-proxy to create an equivalent set of load balancer rules to the ones kube-proxy creates on Linux.
When externalTrafficPolicy is set to Cluster, the behavior on both Windows and Linux is the same: a rule is created to the internal service endpoints.
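The only difference between the two cases is the externalTrafficPolicy field in the Service spec; a minimal fragment for illustration (the name, selector and port are placeholders, not taken from the services below):
apiVersion: v1
kind: Service
metadata:
  name: example-lb                # placeholder
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local    # Local: no HNS load balancer appears on Windows (this issue); Cluster: rules appear on both OSes
  selector:
    app: example                  # placeholder
  ports:
  - name: http
    port: 80
    targetPort: http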
For example, the following service (external IP = 10.234.15.252):
Name: ingress-ext-ingress-nginx-controller
Namespace: ingress
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=ingress-ext
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
app.kubernetes.io/version=1.1.3
helm.sh/chart=ingress-nginx-4.0.19
Annotations: meta.helm.sh/release-name: ingress-ext
meta.helm.sh/release-namespace: ingress
service.beta.kubernetes.io/azure-load-balancer-internal: true
Selector: app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-ext,app.kubernetes.io/name=ingress-nginx
Type: LoadBalancer
IP: 172.18.12.30
IP: 10.234.15.252
LoadBalancer Ingress: 10.234.15.252
Port: http 80/TCP
TargetPort: http/TCP
NodePort: http 32705/TCP
Endpoints: 10.234.8.118:80,10.234.8.73:80
Port: https 443/TCP
TargetPort: https/TCP
NodePort: https 31302/TCP
Endpoints: 10.234.8.118:443,10.234.8.73:443
Session Affinity: None
External Traffic Policy: Local
HealthCheck NodePort: 31191
Events: <none>
Creates the following IP table rules on Linux (filtered for IP 10.234.15.252):
Chain KUBE-FW-DTWZHMMUT5S663B6 (1 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-XLB-DTWZHMMUT5S663B6 all -- * * 0.0.0.0/0 0.0.0.0/0 /* ingress/ingress-ext-ingress-nginx-controller:http loadbalancer IP */
0 0 KUBE-MARK-DROP all -- * * 0.0.0.0/0 0.0.0.0/0 /* ingress/ingress-ext-ingress-nginx-controller:http loadbalancer IP */
Chain KUBE-FW-KRIE4MXDNTHUWADS (1 references)
pkts bytes target prot opt in out source destination
11 572 KUBE-XLB-KRIE4MXDNTHUWADS all -- * * 0.0.0.0/0 0.0.0.0/0 /* ingress/ingress-ext-ingress-nginx-controller:https loadbalancer IP */
0 0 KUBE-MARK-DROP all -- * * 0.0.0.0/0 0.0.0.0/0 /* ingress/ingress-ext-ingress-nginx-controller:https loadbalancer IP */
Chain KUBE-NODEPORTS (1 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-MARK-MASQ tcp -- * * 127.0.0.0/8 0.0.0.0/0 /* ingress/ingress-ext-ingress-nginx-controller:https */ tcp dpt:31302
0 0 KUBE-XLB-KRIE4MXDNTHUWADS tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* ingress/ingress-ext-ingress-nginx-controller:https */ tcp dpt:31302
0 0 KUBE-MARK-MASQ tcp -- * * 127.0.0.0/8 0.0.0.0/0 /* ingress/ingress-ext-ingress-nginx-controller:http */ tcp dpt:32705
0 0 KUBE-XLB-DTWZHMMUT5S663B6 tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* ingress/ingress-ext-ingress-nginx-controller:http */ tcp dpt:32705
Chain KUBE-SEP-7DLPOJWWRIRTVTIF (2 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-MARK-MASQ all -- * * 10.234.8.118 0.0.0.0/0 /* ingress/ingress-ext-ingress-nginx-controller:http */
0 0 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* ingress/ingress-ext-ingress-nginx-controller:http */ tcp DNAT [unsupported revision]
Chain KUBE-SEP-B4FNPQLBB5TPYAOM (1 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-MARK-MASQ all -- * * 10.234.8.73 0.0.0.0/0 /* ingress/ingress-ext-ingress-nginx-controller:http */
0 0 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* ingress/ingress-ext-ingress-nginx-controller:http */ tcp DNAT [unsupported revision]
Chain KUBE-SEP-M2UIJ3F6N64ZWELJ (2 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-MARK-MASQ all -- * * 10.234.8.118 0.0.0.0/0 /* ingress/ingress-ext-ingress-nginx-controller:https */
11 572 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* ingress/ingress-ext-ingress-nginx-controller:https */ tcp DNAT [unsupported revision]
Chain KUBE-SEP-WJOLKEI2OHE3E3QI (1 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-MARK-MASQ all -- * * 10.234.8.73 0.0.0.0/0 /* ingress/ingress-ext-ingress-nginx-controller:https */
0 0 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* ingress/ingress-ext-ingress-nginx-controller:https */ tcp DNAT [unsupported revision]
Chain KUBE-SERVICES (2 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-SVC-KRIE4MXDNTHUWADS tcp -- * * 0.0.0.0/0 172.18.12.30 /* ingress/ingress-ext-ingress-nginx-controller:https cluster IP */ tcp dpt:443
11 572 KUBE-FW-KRIE4MXDNTHUWADS tcp -- * * 0.0.0.0/0 10.234.15.252 /* ingress/ingress-ext-ingress-nginx-controller:https loadbalancer IP */ tcp dpt:443
0 0 KUBE-SVC-DTWZHMMUT5S663B6 tcp -- * * 0.0.0.0/0 172.18.12.30 /* ingress/ingress-ext-ingress-nginx-controller:http cluster IP */ tcp dpt:80
0 0 KUBE-FW-DTWZHMMUT5S663B6 tcp -- * * 0.0.0.0/0 10.234.15.252 /* ingress/ingress-ext-ingress-nginx-controller:http loadbalancer IP */ tcp dpt:80
679 36864 KUBE-NODEPORTS all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL
Chain KUBE-SVC-DTWZHMMUT5S663B6 (3 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-MARK-MASQ tcp -- * * !10.234.8.0/21 172.18.12.30 /* ingress/ingress-ext-ingress-nginx-controller:http cluster IP */ tcp dpt:80
0 0 KUBE-SEP-7DLPOJWWRIRTVTIF all -- * * 0.0.0.0/0 0.0.0.0/0 /* ingress/ingress-ext-ingress-nginx-controller:http */ statistic mode random probability 0.50000000000
0 0 KUBE-SEP-B4FNPQLBB5TPYAOM all -- * * 0.0.0.0/0 0.0.0.0/0 /* ingress/ingress-ext-ingress-nginx-controller:http */
Chain KUBE-SVC-KRIE4MXDNTHUWADS (3 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-MARK-MASQ tcp -- * * !10.234.8.0/21 172.18.12.30 /* ingress/ingress-ext-ingress-nginx-controller:https cluster IP */ tcp dpt:443
0 0 KUBE-SEP-M2UIJ3F6N64ZWELJ all -- * * 0.0.0.0/0 0.0.0.0/0 /* ingress/ingress-ext-ingress-nginx-controller:https */ statistic mode random probability 0.50000000000
0 0 KUBE-SEP-WJOLKEI2OHE3E3QI all -- * * 0.0.0.0/0 0.0.0.0/0 /* ingress/ingress-ext-ingress-nginx-controller:https */
Chain KUBE-XLB-DTWZHMMUT5S663B6 (2 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-SVC-DTWZHMMUT5S663B6 all -- * * 10.234.8.0/21 0.0.0.0/0 /* Redirect pods trying to reach external loadbalancer VIP to clusterIP */
0 0 KUBE-MARK-MASQ all -- * * 0.0.0.0/0 0.0.0.0/0 /* masquerade LOCAL traffic for ingress/ingress-ext-ingress-nginx-controller:http LB IP */ ADDRTYPE match src-type LOCAL
0 0 KUBE-SVC-DTWZHMMUT5S663B6 all -- * * 0.0.0.0/0 0.0.0.0/0 /* route LOCAL traffic for ingress/ingress-ext-ingress-nginx-controller:http LB IP to service chain */ ADDRTYPE match src-type LOCAL
0 0 KUBE-SEP-7DLPOJWWRIRTVTIF all -- * * 0.0.0.0/0 0.0.0.0/0 /* Balancing rule 0 for ingress/ingress-ext-ingress-nginx-controller:http */
Chain KUBE-XLB-KRIE4MXDNTHUWADS (2 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-SVC-KRIE4MXDNTHUWADS all -- * * 10.234.8.0/21 0.0.0.0/0 /* Redirect pods trying to reach external loadbalancer VIP to clusterIP */
0 0 KUBE-MARK-MASQ all -- * * 0.0.0.0/0 0.0.0.0/0 /* masquerade LOCAL traffic for ingress/ingress-ext-ingress-nginx-controller:https LB IP */ ADDRTYPE match src-type LOCAL
0 0 KUBE-SVC-KRIE4MXDNTHUWADS all -- * * 0.0.0.0/0 0.0.0.0/0 /* route LOCAL traffic for ingress/ingress-ext-ingress-nginx-controller:https LB IP to service chain */ ADDRTYPE match src-type LOCAL
11 572 KUBE-SEP-M2UIJ3F6N64ZWELJ all -- * * 0.0.0.0/0 0.0.0.0/0 /* Balancing rule 0 for ingress/ingress-ext-ingress-nginx-controller:https */
On Windows, however, no rules are created for the external IP 10.234.15.252.
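This can be checked directly on each node type; something like the following (the grep/findstr filters are just one way to narrow the output, and hnsdiag is the same tool used in the reproduction steps below). On a Linux node the load balancer IP shows up in the nat table:
$ sudo iptables -t nat -nvL | grep 10.234.15.252
On the Windows node, listing the HNS load balancers returns no entry for that IP:
C:\> hnsdiag list loadbalancers | findstr 10.234.15.252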
If I add an additional Service with the same settings, but with another external IP and externalTrafficPolicy: Cluster, rules for the external IP are created on both Linux and Windows.
Modified service with externalTrafficPolicy: Cluster (external IP = 10.234.15.250):
Name: ingress-ext-ingress-nginx-controller-internalsvc
Namespace: ingress
Labels: <none>
Annotations: service.beta.kubernetes.io/azure-load-balancer-internal: true
Selector: app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-ext,app.kubernetes.io/name=ingress-nginx
Type: LoadBalancer
IP: 172.18.7.20
IP: 10.234.15.250
LoadBalancer Ingress: 10.234.15.250
Port: http 80/TCP
TargetPort: http/TCP
NodePort: http 31585/TCP
Endpoints: 10.234.8.118:80,10.234.8.73:80
Port: https 443/TCP
TargetPort: https/TCP
NodePort: https 31653/TCP
Endpoints: 10.234.8.118:443,10.234.8.73:443
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
The iptables rules on Linux are similar to those above, except for the IP of course.
And the rules on Windows, filtered on IP 10.234.15.250:
Networks:
Name ID
ext 6E546805-7041-41EF-9630-8BF3FF269497
nat D8074C97-BAF5-4776-8BAB-194003E59D62
azure DA1697CE-D2F8-4B83-B516-972CC1395537
Endpoints:
Name ID Virtual Network Name
Ethernet aebe2894-b5a1-4317-8888-faf9ac1d9374 azure
Ethernet c41f9ad9-c30f-468a-8a64-5ffb8025761b azure
Namespaces:
ID | Endpoint IDs
LoadBalancers:
ID | Virtual IPs | Direct IP IDs
66919ae6-6b07-492b-86a6-6a38f2ccdcb3 | | aebe2894-b5a1-4317-8888-faf9ac1d9374 c41f9ad9-c30f-468a-8a64-5ffb8025761b
d6595df5-467c-4f26-929f-ee03db8ac379 | 172.18.0.183 | aebe2894-b5a1-4317-8888-faf9ac1d9374 c41f9ad9-c30f-468a-8a64-5ffb8025761b
61833462-43c0-4496-b653-a552006cef2a | 172.18.12.30 | aebe2894-b5a1-4317-8888-faf9ac1d9374 c41f9ad9-c30f-468a-8a64-5ffb8025761b
93d3fef9-9d8c-4237-8820-5875e7152158 | 172.18.12.30 | aebe2894-b5a1-4317-8888-faf9ac1d9374 c41f9ad9-c30f-468a-8a64-5ffb8025761b
956dab06-b5da-4031-b387-6b8b49c553ff | 172.18.15.230 | aebe2894-b5a1-4317-8888-faf9ac1d9374 c41f9ad9-c30f-468a-8a64-5ffb8025761b
24460bac-f31a-4557-8752-729aca23b913 | 10.234.15.250 | aebe2894-b5a1-4317-8888-faf9ac1d9374 c41f9ad9-c30f-468a-8a64-5ffb8025761b
5039a9d4-d561-4a8c-9090-f022484d8840 | 172.18.7.20 | aebe2894-b5a1-4317-8888-faf9ac1d9374 c41f9ad9-c30f-468a-8a64-5ffb8025761b
61ad8dff-8470-4354-9757-d1dcedd12480 | 172.18.7.20 | aebe2894-b5a1-4317-8888-faf9ac1d9374 c41f9ad9-c30f-468a-8a64-5ffb8025761b
fcd80846-e515-4e6b-a0ca-82df770cafe1 | | aebe2894-b5a1-4317-8888-faf9ac1d9374 c41f9ad9-c30f-468a-8a64-5ffb8025761b
c86b7003-f7b2-45a0-b3cb-11243acaa243 | 10.234.15.250 | aebe2894-b5a1-4317-8888-faf9ac1d9374 c41f9ad9-c30f-468a-8a64-5ffb8025761b
As you can see, there are two rules with an empty IP address, but I do not know if that has anything to do with this issue. In the example above, the IPs are:
- 172.18.12.30 = internal (cluster) IP of the service of type LoadBalancer with externalTrafficPolicy: Local
- 172.18.7.20 = internal (cluster) IP of the service of type LoadBalancer with externalTrafficPolicy: Cluster
- 10.234.15.250 = external IP of the service of type LoadBalancer with externalTrafficPolicy: Cluster
- 10.234.15.252 = external IP of the service of type LoadBalancer with externalTrafficPolicy: Local, which is the rule that is not being created
How can we reproduce it (as minimally and precisely as possible)?
On a cluster with both Linux and Windows nodes, create a service of type LoadBalancer with externalTrafficPolicy set to Local.
Observe that the rules for the external IP are listed on Linux (sudo iptables -t nat -nvL), whereas they are not listed on Windows (hnsdiag list all).
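A minimal manifest that should reproduce this might look as follows; the name, selector and the internal-LB annotation are assumptions modelled on the services described above, not taken verbatim from this report:
apiVersion: v1
kind: Service
metadata:
  name: repro-lb                                                     # hypothetical name
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local    # switching this to Cluster makes the rules appear on Windows as well
  selector:
    app: repro                    # assumed selector; the matching pods run on Linux nodes only
  ports:
  - name: http
    port: 80
    targetPort: 80
Once the cloud provider has assigned the load balancer IP, compare the two node types as described above (iptables -t nat -nvL on Linux, hnsdiag list all on Windows).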
Anything else we need to know?
No response
Kubernetes version
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.1", GitCommit:"206bcadf021e76c27513500ca24182692aabd17e", GitTreeState:"clean", BuildDate:"2020-09-09T11:26:42Z", GoVersion:"go1.15", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.6", GitCommit:"07959215dd83b4ae6317b33c824f845abd578642", GitTreeState:"clean", BuildDate:"2022-03-30T18:28:25Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider
Azure (AKS)
OS version
# On Linux:
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.6 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
$ uname -a
Linux aks-tmpnet1-16731720-vmss000001 5.4.0-1074-azure #77~18.04.1-Ubuntu SMP Wed Mar 30 15:36:02 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
BuildNumber Caption OSArchitecture Version
17763 Microsoft Windows Server 2019 Datacenter 64-bit 10.0.17763
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, …) and versions (if applicable)
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 33 (22 by maintainers)
(following up from the sig net meeting)
I said it should behave “as though the traffic goes to the external LB”, but it’s totally fine to actually send it to the external LB as well. I just meant “it can’t behave in a manner that is observably different from what you’d get if you had sent it to the external LB”. Meaning specifically, it has to get delivered to an endpoint, even if the traffic policy is Local and there are no local endpoints.
(FWIW I don’t think it was done for performance reasons on Linux. The issue is that in the iptables implementation, the rule that accepts inbound traffic addressed to the LB IP also catches outbound traffic addressed to the LB IP, so we need to do something to make pod-to-LB-IP not be subject to the externalTrafficPolicy, and at that point, it was just as easy to “short-circuit” the traffic as it would have been to send it the long way.)
Alright @jsturtevant, after some modifications to represent our configuration, I was able to reproduce the issue on a newly created cluster.
I was able to narrow the issue down to the following:
When creating a service with only Linux pods matching the selector, kube-proxy on Windows does not create HNS rules when externalTrafficPolicy is Local, while kube-proxy on Linux does create iptables rules even when the only matching pods reside on Windows nodes.
Using https://gist.github.com/marcelvwe/a139684d60225406dc66ccde54cf19bb
On AKS with version 1.22.6
Services looking like:
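The exact manifests are in the gist linked above; roughly, they are two LoadBalancer services with externalTrafficPolicy: Local, one whose endpoints are pods on Linux nodes (whoami-lin) and one whose endpoints are pods on Windows nodes (whoami-win). A sketch, with the selectors and ports assumed:
apiVersion: v1
kind: Service
metadata:
  name: whoami-lin                # endpoints are pods on Linux nodes
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  selector:
    app: whoami-lin               # assumed selector
  ports:
  - port: 80
    targetPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: whoami-win                # endpoints are pods on Windows nodes
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  selector:
    app: whoami-win               # assumed selector
  ports:
  - port: 80
    targetPort: 80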
The Linux node has iptables rules for both the whoami-lin and the whoami-win service, while the Windows node only has an HNS load balancer rule for the whoami-win service. So, in the HNS output, the IP 10.240.0.34 is missing. This causes traffic originating from pods on the Windows node to flow to the Azure LoadBalancer instead of being sent directly to one of the endpoints on the Linux node (which resides in the same subnet).
If I modify the gist to externalTrafficPolicy: Cluster, the rules are created on both Linux and Windows for both services (in other words, the behavior is the same on both OSes).
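That modification can also be applied to one of the existing services with a patch along these lines (whoami-lin is one of the two services above; add -n <namespace> if they are not in the default namespace):
$ kubectl patch service whoami-lin -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'
After the patch, the corresponding load balancer entry should also show up in hnsdiag list all on the Windows node.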