kubernetes: --hostname-override ignored when --cloud-provider is specified

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug
/sig aws

What happened: I am trying to start kubelet with --hostname-override=ip-172-28-68-60, but I still see in the logs:

Attempting to register node ip-172-28-68-60.eu-west-1.compute.internal
Unable to register node "ip-172-28-68-60.eu-west-1.compute.internal" with API server: nodes "ip-172-28-68-60.eu-west-1.compute.internal" is forbidden: node "ip-172-28-68-60" cannot modify node "ip-172-28-68-60.eu-west-1.compute.internal"

ps aux: root 4610 3.3 7.7 404596 78320 ? Ssl 12:58 0:00 /usr/bin/kubelet … --hostname-override=ip-172-28-68-60

What you expected to happen: The node should register as ip-172-28-68-60 instead of ip-172-28-68-60.eu-west-1.compute.internal.

How to reproduce it (as minimally and precisely as possible): set --cloud-provider=aws --hostname-override=ip-172-28-68-60 for kubelet
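
For reference, a minimal sketch of the kubelet invocation (other flags from the real unit are omitted):

    /usr/bin/kubelet \
      --cloud-provider=aws \
      --hostname-override=ip-172-28-68-60
    # expected registered node name: ip-172-28-68-60
    # actual registered node name:   ip-172-28-68-60.eu-west-1.compute.internal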

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.1", GitCommit:"f38e43b221d08850172a9a4ea785a86a3ffa3b3a", GitTreeState:"clean", BuildDate:"2017-10-11T23:27:35Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.1", GitCommit:"f38e43b221d08850172a9a4ea785a86a3ffa3b3a", GitTreeState:"clean", BuildDate:"2017-10-11T23:16:41Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

  • Cloud provider or hardware configuration: aws

  • OS (e.g. from /etc/os-release): NAME="Ubuntu" VERSION="16.04.2 LTS (Xenial Xerus)" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 16.04.2 LTS" VERSION_ID="16.04" HOME_URL="http://www.ubuntu.com/" SUPPORT_URL="http://help.ubuntu.com/" BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/" VERSION_CODENAME=xenial UBUNTU_CODENAME=xenial

  • Kernel (e.g. uname -a): Linux ip-172-28-68-60 4.4.0-1038-aws #47-Ubuntu SMP Thu Sep 28 20:05:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

  • Install tools:

  • Others:

About this issue

  • State: open
  • Created 7 years ago
  • Reactions: 48
  • Comments: 50 (13 by maintainers)

Most upvoted comments

We have this issue as well. Our DEV Kubernetes cluster lives on bare metal, where the kubelet --hostname-override option works fine, so we are able to set node names to:

node01-dev.<project_name>.com
node02-dev.<project_name>.com
node03-dev.<project_name>.com
...

But our staging and PROD environments are on AWS, where we use --cloud-provider=aws. In that case kubelet does not let us set node names according to our naming convention. In AWS we have self-descriptive DNS names like node02-staging.<project_name>.com, but we cannot make the node names match these DNS names because kubelet overrides them with the unwieldy AWS ip-xx-yy-zz.us-west-1.compute.internal.

Imagine that you have a team of engineers and they receive an alert ‘staging Kubernetes node 5 - disk full’. What would you rather they see when they run kubectl get nodes: node05.staging - NotReady, or ip-124-12-34-13.us-west-1.compute.internal - NotReady? 😉 Especially considering that node05.staging is a resolvable DNS name that is convenient for the team to use, and that the word ‘staging’ is in the node name, so an engineer is less likely to accidentally go to the wrong environment…

What I want to say is that in AWS, having a human-friendly CNAME record for each node, with the node name equal to that DNS record, makes kubectl get nodes and kubectl describe node give a more human-friendly view of the cluster and reduces the human-error factor.

This is causing a lot of trouble for us. We have a naming scheme that lets us quickly see the zone/environment of a node, but the "hardcoded" node naming in Kubernetes hinders all of this. Log files and metrics are very hard to work with, as I have to cross-check the "auto-name" against our inventory list all the time. This is a real problem, and it really needs to be solved.

We’re having a hell of a time working around this in the Canonical Distribution of Kubernetes

Why are you required to use non-standard hostnames for AWS?

This is affecting me too! I believe it is using the EC2 instance’s private hostname and ignoring --hostname-override. We are not using kubeadm so we haven’t found a workaround.

Can we get some traction on this issue? Seems there is a real need for this option to be implemented but it keeps getting brushed aside and related issues emerge while PRs go stale.

@liggitt

Once the kubelet was cloud-agnostic, no longer had to make cloud provider API calls, and just used whatever hostname/nodename it was told to, if you wanted to maintain a non-standard mapping between cloud provider node name, kubernetes node name, and hostname, that could be supported by the cloud provider’s implementation if they chose to (that would mean communicating and using that mapping consistently in components that talked to the cloud provider API… ensuring everyone is using the same mapping and translating back and forth can get pretty painful)

AWS allows you to run your own DHCP and private zone, so it is a perfectly valid use case to change the hostname and canonical name of a VM, have a private hosted zone that, for all intents and purposes, completely masks over the “AWS-internal” name of the VM.

~Having --hostname-override= not work is clearly broken behavior.~

AWS specifically publishes articles on how to change your VM hostname, so it is clearly seen as a valid approach by them.

~The fact that K8s has decided to use AWS’s downward API route of http://169.254.169.254/latest/meta-data/hostname and ignore the configuration override is a bug, plain and simple, not some conformance to cloud provider purity.~

Edit @liggitt Okay, so I realize now that the issue is more complex than just the Kubelet, but how other components see the Kubelet as well. Sorry for my heated response. I was very frustrated by this undocumented behavior and was annoyed that I couldn’t fix it.

To build on @daniilyar's comments, I posted the following in the sig-node Slack channel but didn't get any response, so I'll add it here for posterity:

Looking through the referenced github PRs and tickets, I’m curious about some of the comments here: https://github.com/kubernetes/kubernetes/pull/58114#discussion_r160840393

In particular: “In non-cloud environments, hostnames cannot always be auto-detected.” This seems like a rather basic thing to do. How was it problematic?

and “Originally, the reported hostname and the nodeName were identical. That is no longer required to be the case” While this may be true from a code perspective, it’s rather awkward for them not to be the same from a functional/operational point of view (not to mention that it downright breaks if you set the hostname and associated client certificates first and then try to enable the cloud-provider option at a later point). If I specifically set a hostname, why is that not what the node should register as and what I should see in kubectl get nodes? It seems like, if you’re going to default to one or the other, it should be the one I made the effort to explicitly set, not the generic EC2 name.

the comments in https://github.com/kubernetes/kubernetes/pull/58114#discussion_r161320813 are relevant, hoisting them here:

In cloud environments, you need an authority on the following things (and the mapping between them):

  • Node machine IDs (for things like ec2 API calls)
  • Node hostnames (for determining whether a node should be given a serving cert for a particular hostname, etc)
  • Node API object names (for node controller, attach/detach controller, persistent volume provisioners, etc to reconcile cloud provider state to nodes and their associated resources)

The kubelets cannot be the authority on those things:

  • before a kubelet has reported state, no other components can know anything about what nodes exist
  • it lets a kubelet request or interfere with resources (hostnames, volumes, etc) it shouldn’t have access to

@kubernetes/sig-cluster-lifecycle-feature-requests is working to move the cloud provider integrations out of tree. As part of that, I’d expect the kubelet to be able to be told its hostname and nodename, and to know nothing about the cloud provider at all.

Once the kubelet was cloud-agnostic, no longer had to make cloud provider API calls, and just used whatever hostname/nodename it was told to, if you wanted to maintain a non-standard mapping between cloud provider node name, kubernetes node name, and hostname, that could be supported by the cloud provider’s implementation if they chose to (that would mean communicating and using that mapping consistently in components that talked to the cloud provider API… ensuring everyone is using the same mapping and translating back and forth can get pretty painful)

it downright breaks if you set the hostname and associated client certificates first and then try to enable the cloud-provider option at a later point

Mutating a node's configuration in pieces is not likely to end well, generally. Changing its identity is just one of the things that is likely to cause issues.

This affects OpenShift too; I tried in Origin 3.7 with no success…

This should work if you conform to the cloud provider's view of node names and hostnames when you create the nodes.

Here is a related issue that was closed by robot because nobody was on it for 90 days: https://github.com/kubernetes/kubernetes/issues/22984

I have a similar problem. There are multiple domains in the DHCP options set (like "example.com test.com contoso.com"). When kubelet starts, it obtains the hostname from the EC2 metadata endpoint http://169.254.169.254/latest/meta-data/hostname. The hostname in the metadata looks like ip-10-11-39-89.example.com test.com contoso.com. This string can't be used as a node name, of course, and I get

E0419 17:44:56.637184   17496 kubelet_node_status.go:93] Unable to register node "ip-10-11-39-89.example.com test.com contoso.com" with API server: Node "ip-10-11-39-89.example.com test.com contoso.com" is invalid: metadata.name: Invalid value: "ip-10-11-39-89.example.com test.com contoso.com": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
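
A quick way to see what the kubelet is going to pick up from the instance metadata (a sketch; it assumes IMDSv1 is still enabled on the instance):

    # the value the AWS cloud provider uses as the node name
    curl -s http://169.254.169.254/latest/meta-data/hostname
    # with multiple domains in the DHCP options this returns something like
    # "ip-10-11-39-89.example.com test.com contoso.com", which fails DNS-1123 validation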

We’re having a hell of a time working around this in the Canonical Distribution of Kubernetes. Since --cloud-provider=aws changes the node name, and we can’t override it with --hostname-override, we’re forced to make changes elsewhere in cluster configuration:

  • We have to use the correct node name when creating an auth token for kubelet, otherwise the Node authorizer won’t allow kubelet to register
  • We have to pass --hostname-override to kube-proxy using the correct node name so it can correctly identify local endpoints

We’re able to work around this, but it seems unfortunate that --cloud-provider=aws changes the node name and doesn’t let you override it. The cluster operator is forced to change configuration in places that aren’t obvious at all.
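
To make those two workarounds concrete, here is a rough sketch (the token, uid, and node name are placeholders; the first snippet uses the static token file format that kube-apiserver's --token-auth-file expects):

    # API server token file entry for the kubelet: the username must be
    # system:node:<aws node name>, or the Node authorizer rejects registration.
    # format: token,user,uid,"group1,group2"
    0123456789abcdef,system:node:ip-172-28-68-60.eu-west-1.compute.internal,kubelet-node-1,"system:nodes"

    # kube-proxy must be told the same node name so it can identify local endpoints
    kube-proxy --hostname-override=ip-172-28-68-60.eu-west-1.compute.internal ...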

I fixed the issue for me with:

kubeadm init --pod-network-cidr=10.234.0.0/16 --node-name=$(curl http://169.254.169.254/latest/meta-data/local-hostname)

but it is just a workaround in my case. Passing --node-name during cluster initialization did not fix "kubelet ignores the --hostname-override option".
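
For what it's worth, the equivalent on worker nodes would look roughly like this (token and hash are placeholders); it only aligns kubeadm's node name with what --cloud-provider=aws is going to register anyway, it does not make --hostname-override itself work:

    # join each worker using the EC2 private DNS name as the node name
    kubeadm join 172.28.68.60:6443 \
      --token <token> \
      --discovery-token-ca-cert-hash sha256:<hash> \
      --node-name "$(curl -s http://169.254.169.254/latest/meta-data/local-hostname)"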

I found an interesting nuance. I see that this issue was created in 2017, but I only hit the hostname problem in EKS 1.19, after I removed --cloud-provider=aws from kubelet because of this message in the logs:

 Flag --cloud-provider has been deprecated, will be removed in 1.23, in favor of removing cloud provider code from Kubelet.

When I put the argument back, the node successfully joined the cluster even though its hostname is actually different from the EC2 private DNS name. Of course, I still see something like this in the kubectl node list:

ip-10-20-30-40.ec2.internal

but at least in the EC2 console, in monitoring dashboards, and in a shell on the node I get the properly configured name.

In short, my bootstrapping process looks like this:

    # ...general bootstrapping...
    hsname="$${AWS_EKS_CLUSTER_NAME}-$${AWS_EKS_NODE_GROUP}-$${INSTANCE_ID}" # do what you like
    echo "$${hsname}" > /etc/hostname
    hostname "$${hsname}"
    hostnamectl set-hostname "$${hsname}"
    hostnamectl set-hostname --transient "$${hsname}"
    hostnamectl set-hostname --pretty "$${hsname}"

    systemctl daemon-reload
    systemctl enable kubelet
    systemctl start kubelet

(And I never used --hostname-override with kubelet.)

But I guess all of this will break when I upgrade past 1.19 and --cloud-provider is completely removed. 😞

/assign @nckturner
/triage accepted

We have a similar issue with EC2 instances in an AWS VPC.

This is a problem for my customer as well. The VPC is not public-facing and is required to use customer DNS targets only.

@2rs2ts and I do not use kubeadm or kubespray to set up our test cluster. This appears to be an issue with kubelet itself.

Hi @dims

Thank you for your reply! I just tried disabling the NodeRestriction plugin for the apiserver and I broke my cluster (no worries, it is a test cluster). First of all, node ip-172-28-68-60 disappeared from the node list and now I see:

kubectl get nodes
NAME                                         STATUS     ROLES     AGE       VERSION
ip-172-28-68-60                              NotReady   master    17h       v1.8.1
ip-172-28-68-60.eu-west-1.compute.internal   Ready      <none>    21m       v1.8.1

kube-dns cannot start with the following error:

  Type     Reason       Age               From                                                 Message
  ----     ------       ----              ----                                                 -------
  Normal   Scheduled    9m                default-scheduler                                    Successfully assigned kube-dns-545bc4bfd4-dm2hk to ip-172-28-68-60.eu-west-1.compute.internal
  Warning  FailedMount  4m (x2 over 7m)   kubelet, ip-172-28-68-60.eu-west-1.compute.internal  Unable to mount volumes for pod "kube-dns-545bc4bfd4-dm2hk_kube-system(5ab951c8-b94e-11e7-850c-06a7147df332)": timeout expired waiting for volumes to attach/mount for pod "kube-system"/"kube-dns-545bc4bfd4-dm2hk". list of unattached/unmounted volumes=[kube-dns-config kube-dns-token-7gzws]
  Warning  FailedSync   4m (x2 over 7m)   kubelet, ip-172-28-68-60.eu-west-1.compute.internal  Error syncing pod
  Warning  FailedMount  2m (x11 over 9m)  kubelet, ip-172-28-68-60.eu-west-1.compute.internal  MountVolume.SetUp failed for volume "kube-dns-config" : configmaps "kube-dns" is forbidden: User "system:node:ip-172-28-68-60" cannot get configmaps in the namespace "kube-system": no path found to object
  Warning  FailedMount  2m (x11 over 9m)  kubelet, ip-172-28-68-60.eu-west-1.compute.internal  MountVolume.SetUp failed for volume "kube-dns-token-7gzws" : secrets "kube-dns-token-7gzws" is forbidden: User "system:node:ip-172-28-68-60" cannot get secrets in the namespace "kube-system": no path found to object

It looks like nodes that had already joined the cluster need to be rejoined.

At the same time, kubelet continues to run with --hostname-override=ip-172-28-68-60, so it looks like kubelet simply ignores the --hostname-override option.

BR, Vasily.