kubernetes: "externalTrafficPolicy": "Local" on AWS does not work if the dhcp of the vpc is not set exactly to .compute.internal

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

/sig aws

What happened:

Running a gossip cluster on AWS with an NLB as the external load balancer and externalTrafficPolicy set to Local, all targets in the target group were unhealthy even though the pod backing the service was running on a specific node.

What you expected to happen:

A healthy target on the instance (node) running the pod that backs the service.

How to reproduce it (as minimally and precisely as possible):

  1. Create a VPC whose DHCP option set has its domain name set to vpc.internal (i.e. not <region>.compute.internal).
  2. Use kops to deploy Kubernetes on AWS into that existing VPC as a gossip cluster (no DNS zones; the cluster name ends with k8s.local).
     a) kops correctly starts kubelet with --cloud-provider=aws and --hostname-override=<ip-…>.vpc.internal.
     b) Observe that kubectl get nodes still shows node names as <ip-…>.<region>.compute.internal (non us-east-1 region).
  3. Deploy a pod with affinity to a specific node.
  4. Deploy a LoadBalancer service annotated for an NLB, with externalTrafficPolicy: Local (see the example manifest after this list).
  5. Observe in AWS that the target group associated with the NLB has no healthy targets. Also observe that a curl to the host on the health-check port returns a 503 with localEndpoints: 0.
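
For step 4, a minimal sketch of the kind of Service used to reproduce this (the name, selector, and ports are placeholders; the annotation is the one the in-tree AWS cloud provider uses to request an NLB):

apiVersion: v1
kind: Service
metadata:
  name: my-app                                              # placeholder name
  annotations:
    # request an NLB instead of a classic ELB from the in-tree AWS cloud provider
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local    # only nodes with a local endpoint pass the LB health check
  selector:
    app: my-app                   # placeholder selector
  ports:
    - port: 80
      targetPort: 8080            # placeholder target port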

Anything else we need to know?:

Basically, the issue is tangentially related to the following bug:

https://github.com/kubernetes/kubernetes/issues/11543

If the VPC has a different domain name than the one the AWS cloud provider in Kubernetes sets for the node names, then endpoint.NodeName does not match the hostname the proxy is running under, and this causes kube-proxy to determine that there are no local endpoints for the service.
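
A quick way to see the mismatch on an affected cluster (a sketch; the node IP and service name are placeholders, and healthCheckNodePort is only assigned for services with externalTrafficPolicy: Local):

# Node name as registered by the AWS cloud provider
kubectl get nodes -o name
# e.g. node/ip-10-103-184-242.eu-west-1.compute.internal

# Hostname as seen on the node itself (what kube-proxy falls back to)
hostname -f
# e.g. ip-10-103-184-242.vpc.internal

# Health check served by kube-proxy for the service
HEALTH_PORT=$(kubectl get svc my-app -o jsonpath='{.spec.healthCheckNodePort}')
curl -si http://<node-ip>:"$HEALTH_PORT"/
# returns HTTP 503 with "localEndpoints": 0 when the names do not match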

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.2", GitCommit:"5fa2db2bd46ac79e5e00a4e6ed24191080aa463b", GitTreeState:"clean", BuildDate:"2018-01-18T10:09:24Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"darwin/amd64"}
    Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.4", GitCommit:"bee2d1505c4fe820744d26d41ecd3fdd4a3d6546", GitTreeState:"clean", BuildDate:"2018-03-12T16:21:35Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

  • Cloud provider or hardware configuration: AWS

  • OS (e.g. from /etc/os-release):
    NAME="Red Hat Enterprise Linux Server"
    VERSION="7.4 (Maipo)"
    ID="rhel"
    ID_LIKE="fedora"
    VARIANT="Server"
    VARIANT_ID="server"
    VERSION_ID="7.4"
    PRETTY_NAME="Red Hat Enterprise Linux Server 7.4 (Maipo)"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:redhat:enterprise_linux:7.4:GA:server"
    HOME_URL="https://www.redhat.com/"
    BUG_REPORT_URL="https://bugzilla.redhat.com/"
    REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
    REDHAT_BUGZILLA_PRODUCT_VERSION=7.4
    REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
    REDHAT_SUPPORT_PRODUCT_VERSION="7.4"

  • Kernel (e.g. uname -a): Linux ip-10-103-184-242.vpc.internal 3.10.0-693.17.1.el7.x86_64 #1 SMP Sun Jan 14 10:36:03 EST 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: kops
  • Others:

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 14
  • Comments: 62 (12 by maintainers)

Most upvoted comments

I can confirm that this workaround worked for us on EKS 1.15. The full patch to be easily copypasted:

---
spec:
  template:
    spec:
      containers:
        - name: kube-proxy
          command:
            - kube-proxy
            - --hostname-override=$(NODE_NAME)
            - --v=2
            - --config=/var/lib/kube-proxy-config/config
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: spec.nodeName

and the kubectl command for easy copypasting too:

kubectl -n kube-system patch daemonset kube-proxy --patch "$(cat nodeport-local-patch.yml)"

This workaround did not work for us in EKS. It resulted in Failed to retrieve node info: nodes "${node_name}" not found errors in the kube-proxy logs. We have a newer version of kube-proxy, so that might be the issue. Using --hostname-override=$(NODE_NAME) instead worked for us (note the $(NODE_NAME) syntax, which the kubelet expands when the command is given as a list). Here is the relevant portion of our kube-proxy manifest.

- command:
  - kube-proxy
  - --hostname-override=$(NODE_NAME)
  - --v=2
  - --config=/var/lib/kube-proxy-config/config
  env:
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: spec.nodeName

@shaikatz You can patch your kube-proxy daemonset to add two things:

  1. Add a NODE_NAME env var:
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
  2. Then use the NODE_NAME env var to pass --hostname-override in the list of flags to the kube-proxy command. Mine looks something like this:
      - command:
        - /bin/sh
        - -c
        - kube-proxy --resource-container="" --oom-score-adj=-998 --master=https://abc123.sk1.us-east-1.eks.amazonaws.com
          --kubeconfig=/var/lib/kube-proxy/kubeconfig --proxy-mode=iptables --v=2
          --hostname-override=${NODE_NAME} 1>>/var/log/kube-proxy.log 2>&1
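
Patching the daemonset triggers a rollout on its own; a quick sanity check afterwards (assuming the daemonset keeps the usual kube-proxy name in kube-system):

kubectl -n kube-system rollout status daemonset kube-proxy
# once the rollout completes, the NLB target for the node running the pod should turn healthy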

This is a ridiculous problem that still exists 5 years later. I still cannot wrap my head around why the developers of the AWS provider plugin (which it looks like is actually being deprecated) can't come up with a good solution that has been offered by the community over and over again. The marriage of the domain to the kubelet and the cloud provider is really sad. The assumption that "nobody would EVER use a domain other than <region>.ec2.internal" is equally sad.

The patch is not a fix, just a workaround to a built-in that has been broken for almost 6 years. I really wish there was some more traction here… 😦

Every time I set up a new EKS cluster and forget to codify this, I keep bumping my head around until I run across this stupid thread… again, and again… and again… I know I should know better.

Just got a response from AWS support: this issue is fixed in EKS v1.22.6-eksbuild.1, where kube-proxy is patched. This also works with the NGINX ingress controller and AWS ELB.

containers:
  - command:
      - kube-proxy
      - --v=2
      - --config=/var/lib/kube-proxy-config/config
      - --hostname-override=$(NODE_NAME)
    env:
      - name: NODE_NAME
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: spec.nodeName
    image: 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/kube-proxy:v1.22.6-eksbuild.1
    name: kube-proxy

Hello folks, we'll be updating our managed kube-proxy addon to include the patch. The default kube-proxy addon patch will come after that.

For me, it is not working because we have an internal domain name (not ec2.internal) and all servers have an internal hostname (node-01).

Kube-proxy has to have the same name as the node in the cluster (ip-X-X-X-X.ec2.internal). First I tried to override the node name in the cluster with a kops setting, but I cannot use the nodes' environment variables.

The only solution is to match the kube-proxy hostname with the node name in the cluster.

I didn't find a way to set the right hostname for kube-proxy with any of the kops settings. We can override the hostname but not the domain name; it is always taken from the DHCP option set.

My workaround is to use kops hooks. I change the kube-proxy manifest just before kubelet starts:

hooks:
  - name: change-kube-proxy-domain
    roles:
    - Master
    - Node
    before:
    - kubelet.service
    execContainer:
      command:
      - sh
      - -c
      - chroot /rootfs sed -i -re "s/(ip-[0-9]*-[0-9]*-[0-9]*-[0-9]*).[a-z]*.my-internal.domain/\1.ec2.internal/g" /etc/kubernetes/manifests/kube-proxy.manifest
      image: busybox

If you are not in us-east-1, the EC2 domain is $region.compute.internal, where region is us-west-1, eu-central-1, …

The same kube-proxy patch worked for me on EKS 1.16. Thanks!

Many thanks @vide, @victortrac, @bluskool! The patch above worked for me on EKS 1.15 with the Contour ingress controller.

@rochacon Thanks for the pointer, but it is enabled:

enable_dns_hostnames = true

It looks like the problem lies with the OS: on both CoreOS and Debian our hostname is "short". We launched an Amazon Linux instance to test, and it is long there:

Amazon Linux:

[ec2-user@ip-10-66-10-166 ~]$ hostname
ip-10-66-10-166.eu-west-1.compute.internal
[ec2-user@ip-10-66-10-166 ~]$ hostname -f
ip-10-66-10-166.eu-west-1.compute.internal

CoreOS Container Linux:

core@ip-10-66-23-134 ~ $ hostname
ip-10-66-23-134
core@ip-10-66-23-134 ~ $ hostname -f
ip-10-66-23-134

Debian:

admin@ip-10-66-0-48:~$ hostname
ip-10-66-0-48
admin@ip-10-66-0-48:~$ hostname -f
ip-10-66-0-48.eu-west-1.compute.internal

We too have faced a similar issue, but in our case we are using the default DHCP options.

Kubelet, when registering a node, uses the full name ip-10-66-23-111.eu-west-1.compute.internal, presumably via something like:

$ curl -s http://169.254.169.254/latest/meta-data/local-hostname
ip-10-66-23-111.eu-west-1.compute.internal

But kube-proxy uses os.Hostname() as nodeName: https://github.com/kubernetes/kubernetes/blob/master/pkg/util/node/node.go#L52

Which it then uses to try and get the IP: https://github.com/kubernetes/kubernetes/blob/master/pkg/util/node/node.go#L122

But os.Hostname() on the node (in AWS) returns: ip-10-66-23-111.

So kube-proxy fails to get the IP, and binds to 127.0.0.1. Relevant: https://github.com/kubernetes/kubernetes/pull/83822

The same problem doesn't exist on GCP (for example), since hostname there returns the full hostname, including the domain.

I don't know what the right fix is here, but it definitely feels wrong for us to have to patch the kube-proxy DaemonSet when we are running a "vanilla" setup that's broken out of the box.

Wasn't this fixed per the previous comment?

Why do we need to keep it open, @pierluigilenoci?

If it was fixed and the fix released, then it should be closed indeed, but not by a stale-bot. @M00nF1sh can you close this as solved maybe, since it looks like you fixed it in AWS? 🙏🏽 Thanks a lot!

This issue bit us hard during our EKS scheduled worker node upgrade. Fresh nodes with the updated AMI failed health checks and failed to join the nodegroup, the upgrade failed, leaving our cluster in an inconsistent mess, with our ingress controller working on the old nodes but not on the new ones. As we had changed the VPC DHCP options’ search domain a couple of weeks before the upgrade, it was not considered as a possible cause for the outage until we found this open issue.

We've modified the kube-proxy manifests starting with EKS 1.22 to specify the --hostname-override flag for both the managed and the default addons. For older clusters and non-EKS distros, please verify the --hostname-override flag is set as shown in the snippet below:

containers:
  - command:
      - kube-proxy
      - --v=2
      - --config=/var/lib/kube-proxy-config/config
      - --hostname-override=$(NODE_NAME)
    env:
      - name: NODE_NAME
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: spec.nodeName
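
A quick way to check whether a given cluster's kube-proxy already carries the flag (assuming the daemonset is named kube-proxy in kube-system):

kubectl -n kube-system get daemonset kube-proxy \
  -o jsonpath='{.spec.template.spec.containers[0].command}'
# the output should include --hostname-override=$(NODE_NAME)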

Since the issue requires manifest changes, there is no further action necessary at the moment, so I'm closing this issue.

If problem persists after applying the manifest changes, feel free to reach out to us.

/close

@aojea I removed the stale status because it was not clear to me that the ticket was really resolved. It doesn't make sense for it to be closed just because it went stale.

For me, if it’s solved, it closes. If it’s not closed then it’s not solved. 😜

It rolls back because the EKS managed addon does not keep patches made to the manifest. The permanent fix for this "reset" seems to be on the way.

This looks more like another workaround, because you have to switch from the managed to the unmanaged add-on. As far as I know there is no fix yet.

I found the root cause of this issue: my own fault 😃 I had set the DHCP option set domain incorrectly to <eks_name>.compute.internal. After setting it to <region>.compute.internal, the nodes come up correctly and quickly. We use version v1.21.2-eks-55daa9d.
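
For reference, a sketch of creating and attaching a correct DHCP option set with the AWS CLI (region, option-set ID, and VPC ID are placeholders; in us-east-1 the domain is just ec2.internal):

# create a DHCP option set with the EC2-internal domain for the region
aws ec2 create-dhcp-options \
  --dhcp-configurations \
    "Key=domain-name,Values=eu-west-1.compute.internal" \
    "Key=domain-name-servers,Values=AmazonProvidedDNS"

# associate it with the VPC
aws ec2 associate-dhcp-options \
  --dhcp-options-id dopt-0123456789abcdef0 \
  --vpc-id vpc-0123456789abcdef0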

This is still an issue. Specifically, in my new use case it had nothing to do with the domain of the VPC DHCP option set; it was simply due to the hostname of the system having changed from the original .ec2.internal DNS name it launched with.

Does Amazon provide some place to update their code for provisioning new EKS clusters? This is definitely an issue, IMO, that they should be addressing on their end by injecting this patch. The patch doesn't cause any problems if nobody has touched their hostname, so I don't see why it cannot be included on new cluster launches.

AWS EKS devs, are you watching this thread?

@farvour I completely agree, AWS takes way too long to fix the broken DNS/DHCP options it sends to VPC clients. The problem with multiple domains in the VPC DNS service gets even worse if you use Ubuntu 1.18+ images that use systemd-resolved, in which case you get this in your resolv.conf on the host and inside the containers (note the extraneous 032 strings in the search stanza):

$ kc exec -it dnstest -- /bin/sh
/ # cat /etc/resolv.conf
nameserver x.x.x.10
search default.svc.cluster.local svc.cluster.local cluster.local eu-west-1.compute.internal032my.domain.com032my.otherdomain.com
options ndots:5

Those 032 sequences are encoded spaces. This is because AWS is not RFC-conformant and sends ALL domain names in DHCP option 15, like so:

OPTION:  15  ( 26) Domainname     eu-west-1.compute.internal my.domain.com my.otherdomain.com

which according to the RFCs must be a single domain name without spaces; hence systemd-resolved's DHCP client does the right thing and encodes the spaces as 032.

Instead, AWS should send option 15 with the main domain only AND option 119 (domain search) with all the domains, like so:

OPTION:  15  ( 26) Domainname     eu-west-1.compute.internal
OPTION:  119 ( 60) Domain Search   eu-west-1.compute.internal my.domain.com my.otherdomain.com

@micahhausler

I reported this error 8 months ago. This doesn't work in EKS because the hostname-override is incorrect; I fixed this problem with kops by adding this configuration for kube-proxy:

kubeProxy:
  hostnameOverride: $HOSTNAME.us-east-2.compute.internal
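
For context, in a kops workflow this setting goes into the cluster spec; a sketch of applying it (the cluster name is a placeholder and the region must match your own):

kops edit cluster my-cluster.k8s.local
# under spec:, add:
#   kubeProxy:
#     hostnameOverride: $HOSTNAME.us-east-2.compute.internal

kops update cluster my-cluster.k8s.local --yes
kops rolling-update cluster my-cluster.k8s.local --yes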