kubernetes: externalTrafficPolicy: Local breaks internal reachability
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
This is discussed on kubernetes-users.
When externalTrafficPolicy: Local is used in GKE, external traffic coming into the Load Balancer works correctly, but traffic that originates within the GKE cluster is broken: a TCP SYN gets no response.
My use case: I run gitlab, gitlab-runner, docker-registry, wiki inside kubernetes.
But I also access the docker-registry from outside Kubernetes, so its hostname is 'cr.COMPANY.com', which resolves in DNS to the LB IP. In fact, all three services have DNS names that resolve to the same LB IP, which then goes to the nginx ingress; cr.COMPANY.com, git.COMPANY.com, and wiki.COMPANY.com are generally addressed as LB-IP with HOSTNAME+SNI. It's 'astonishing' to not be able to access these DNS names from 'inside' the cluster while being able to from 'outside'.
If I enable the IP transparency (because in my wiki I want the IP of the user who made the edit), then my gitlab-runner fails (since it cannot pull from the registry).
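For context, enabling that IP transparency on GKE amounts to setting `externalTrafficPolicy: Local` on the ingress controller's LoadBalancer Service, which is the setting this issue is about. A minimal sketch, assuming an nginx ingress controller Service named `nginx-ingress-controller` in the `ingress-nginx` namespace (both names are placeholders):

```bash
# Sketch only: switch the ingress controller's Service to Local so the client
# source IP is preserved. Service name and namespace are assumed placeholders.
kubectl -n ingress-nginx patch svc nginx-ingress-controller \
  -p '{"spec":{"externalTrafficPolicy":"Local"}}'

# Confirm the policy now in effect.
kubectl -n ingress-nginx get svc nginx-ingress-controller \
  -o jsonpath='{.spec.externalTrafficPolicy}{"\n"}'
```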
Tim Hockin has this to say:
On Friday, June 22, 2018 at 1:32:45 PM UTC-4, Tim Hockin wrote:
I reproduced this. Here's the explanation of why it is working as intended. We can discuss whether that intention is misguided or not.
When externalTrafficPolicy is "Cluster", each node acts as an LB gateway to the Service, regardless of where the backends might be running. In order to make this work, we must SNAT (which obscures the client's IP) because the traffic could cross nodes, and you can't have the final node responding directly to the client (bad 5-tuple). As we agree, this mode works but isn't what you want.
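One way to see the SNAT behaviour described above is to inspect the NAT rules kube-proxy programs on a node; a rough sketch (requires SSH and root on the node, and the hashed chain names will differ per cluster):

```bash
# Sketch: list the kube-proxy NAT chains involved in the masquerade (SNAT)
# path for "Cluster" mode. Output is cluster-specific.
sudo iptables-save -t nat | grep -E 'KUBE-(SVC|SEP|MARK-MASQ)' | head -n 20
```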
When externalTrafficPolicy is "Local", only nodes that actually have a backend for a given Service act as an LB gateway. This means we do not need to SNAT, thereby keeping the client IP. But what about nodes which do not have backends? They drop those packets. The GCE load-balancer has relatively long programming time, so we don't want to change the TargetPools every time a pod moves. The healthchecks are set up such that nodes that do not have a backend fail the HC - thus the LB should never route to them! Very clever.
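The health check Tim mentions can be observed directly: Services with `externalTrafficPolicy: Local` get a `healthCheckNodePort`, which kube-proxy serves on every node. A hedged sketch, with `my-svc` as a placeholder Service name:

```bash
# Sketch: find the health-check node port for a Local-policy Service and probe
# it from a node. A node WITH a local endpoint should answer healthy; a node
# WITHOUT one should report zero local endpoints, so the GCE LB skips it.
HC_PORT=$(kubectl get svc my-svc -o jsonpath='{.spec.healthCheckNodePort}')
curl -s "http://localhost:${HC_PORT}/healthz"   # run this on the node itself
```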
Here's the rub - the way GCP sets up LBs is via a VIP which is a local IP to each VM. By default, access to the LB VIP from a node "behind" that VIP (all nodes in k8s) is serviced by that same VM, not by the actual LB. The assumption is that you are accessing yourself, why go through the network to do that?
Kubernetes makes an explicit provision for pods that access an LB VIP by treating them as if they accessed the internal service VIP (which is not guaranteed to stay node-local). We did not make a provision for a NODE to access the LB VIP in the same way. Maybe we could? I seem to recall an issue there, in how we distinguish traffic originating from "this VM" vs traffic we are gatewaying.
So there you see - it is doing what is intended, but maybe not what you want. Now - convince me that the use-case of accessing an external LB VIP from a node in the cluster (not a pod - a node) is worth the extra complexity? One case I admit that falls in a crack is `hostNetwork` pods. They will fail.
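For anyone wanting to confirm the pod-vs-node distinction above, a rough, untested sketch (the LB IP and image are placeholders):

```bash
LB_IP=203.0.113.10   # placeholder: the Service's external load-balancer IP

# From a pod on the pod network: handled as the internal service VIP, so this
# is expected to succeed.
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -sv --max-time 5 "http://${LB_IP}/"

# From the node itself (or a hostNetwork pod) when that node has no local
# endpoint: the SYN is dropped and this times out.
curl -sv --max-time 5 "http://${LB_IP}/"
```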
and then:
On Friday, June 22, 2018 at 4:14:24 PM UTC-4, Tim Hockin wrote:
Well, the good news is that I think this is easy to fix.
For my test case, the service hashes to NWV5X2332I4OT4T3
iptables -t nat -I KUBE-XLB-NWV5X2332I4OT4T3 -m addrtype --src-type LOCAL -j KUBE-SVC-NWV5X2332I4OT4T3
iptables -t nat -I KUBE-XLB-NWV5X2332I4OT4T3 -m addrtype --src-type LOCAL -j KUBE-MARK-MASQ
Note these two are in inverse order because I am lazy and used -I. This will break the "only local" property for in-cluster accesses, but I think that is OK since it's explicitly not true for access from pods.
The kube-proxy change here should be easy enough, but the testing is a little involved.
Would you do me the good favor of opening a github bug and we can see if we can rally an implementor?
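For readers trying the rules above by hand: since `-I` prepends, the written order ends up reversed in the chain. An equivalent, untested restatement with explicit positions (the `NWV5X2332I4OT4T3` hash is from Tim's test case and differs per Service, and kube-proxy will overwrite manual edits on its next sync):

```bash
# Mark host-originated traffic for masquerade first, then hand it to the
# regular service chain that balances across all endpoints.
iptables -t nat -I KUBE-XLB-NWV5X2332I4OT4T3 1 -m addrtype --src-type LOCAL -j KUBE-MARK-MASQ
iptables -t nat -I KUBE-XLB-NWV5X2332I4OT4T3 2 -m addrtype --src-type LOCAL -j KUBE-SVC-NWV5X2332I4OT4T3
```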
What you expected to happen:
My expectation is that it should not matter where my traffic originates; if it is addressed to a globally routable public IP, it should work.
How to reproduce it (as minimally and precisely as possible):
Configure a Load Balancer Service with externalTrafficPolicy: Local, an ingress, and 2 services. Have service 2 talk to service 1 using a DNS name that maps to the Load Balancer.
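A minimal, untested sketch of such a reproduction (names and image are placeholders; a real setup would also point a DNS record at the load balancer's external IP):

```bash
# Service 1: a LoadBalancer Service with the Local policy.
kubectl create deployment service-one --image=nginx
kubectl expose deployment service-one --port=80 --type=LoadBalancer
kubectl patch svc service-one -p '{"spec":{"externalTrafficPolicy":"Local"}}'

# Service 2: call service 1 via the load balancer. A pod-network client tends
# to work; a hostNetwork pod or the node itself hangs on the unanswered SYN.
LB_IP=$(kubectl get svc service-one \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
kubectl run service-two --rm -it --restart=Never --image=curlimages/curl -- \
  curl -sv --max-time 5 "http://${LB_IP}/"
```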
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`): Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.4", GitCommit:"5ca598b4ba5abb89bb773071ce452e33fb66339d", GitTreeState:"clean", BuildDate:"2018-06-06T08:13:03Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.2-gke.1", GitCommit:"75d2af854b1df023c7ce10a8795b85d3dd1f8d37", GitTreeState:"clean", BuildDate:"2018-05-10T17:23:18Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: Google GKE
- OS (e.g. from /etc/os-release): BUILD_ID=10452.89.0 NAME="Container-Optimized OS" KERNEL_COMMIT_ID=e2e439017d740b3fbe0f4f1a2bc63af84facf535 GOOGLE_CRASH_ID=Lakitu VERSION_ID=66 BUG_REPORT_URL=https://crbug.com/new PRETTY_NAME="Container-Optimized OS from Google" VERSION=66 GOOGLE_METRICS_PRODUCT_ID=26 HOME_URL="https://cloud.google.com/compute/docs/containers/vm-image/" ID=cos
- Kernel (e.g. `uname -a`): Linux gke-k8s-default-pool-d0b73bf2-d3ft 4.14.22+ #1 SMP Thu May 10 17:54:42 PDT 2018 x86_64 Intel® Xeon® CPU @ 2.00GHz GenuineIntel GNU/Linux
- Install tools:
- Others:
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 12
- Comments: 38 (14 by maintainers)
Commits related to this issue
- Send local traffic to LoadBalancer Hello! This PR allows to send traffic from node that hasn't local endpoints to loadBalancerIP. Related: Issue https://github.com/kubernetes/kubernetes/issues/653... — committed to kinolaev/kubernetes by kinolaev 5 years ago
- iptables proxier: fix comments for LB IP traffic from local address Signed-off-by: Andrew Sy Kim <kiman@vmware.com> — committed to kubernetes/kubernetes by andrewsykim 5 years ago
- e2e: use container network to access routes Kubernetes does not currently support routing packets from the host network interface to LoadBalancer Service external IPs. Although such routing tends to ... — committed to ironcladlou/origin by ironcladlou 5 years ago
Still affecting me on GKE. I want to enable the IP transparency so that I can use the spam-blocking in WordPress (which is by external IP), but I also want to refer to other services which may be internal to the cluster via their proper name (to get the `tls` etc.).
Same here.
As far as I know it hasn't been fixed.