kubernetes: --hostname-override ignored when --cloud-provider is specified
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug /sig aws
What happened: Trying to start the kubelet with `--hostname-override=ip-172-28-68-60`, but I still see in the logs:
Attempting to register node ip-172-28-68-60.eu-west-1.compute.internal
Unable to register node "ip-172-28-68-60.eu-west-1.compute.internal" with API server: nodes "ip-172-28-68-60.eu-west-1.compute.internal" is forbidden: node "ip-172-28-68-60" cannot modify node "ip-172-28-68-60.eu-west-1.compute.internal"
ps aux: root 4610 3.3 7.7 404596 78320 ? Ssl 12:58 0:00 /usr/bin/kubelet … --hostname-override=ip-172-28-68-60
What you expected to happen: The hostname should be `ip-172-28-68-60` instead of `ip-172-28-68-60.eu-west-1.compute.internal`.
How to reproduce it (as minimally and precisely as possible): set `--cloud-provider=aws --hostname-override=ip-172-28-68-60` for the kubelet.
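For illustration, a minimal kubelet invocation matching this report; only the two flags under discussion come from the report, the kubeconfig path and everything else are placeholders:

```bash
# Hypothetical minimal invocation; the kubeconfig path is a placeholder.
/usr/bin/kubelet \
  --cloud-provider=aws \
  --hostname-override=ip-172-28-68-60 \
  --kubeconfig=/etc/kubernetes/kubelet.conf
# Despite --hostname-override, the kubelet attempts to register as the EC2
# private DNS name: ip-172-28-68-60.eu-west-1.compute.internal
```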
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`): Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.1", GitCommit:"f38e43b221d08850172a9a4ea785a86a3ffa3b3a", GitTreeState:"clean", BuildDate:"2017-10-11T23:27:35Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.1", GitCommit:"f38e43b221d08850172a9a4ea785a86a3ffa3b3a", GitTreeState:"clean", BuildDate:"2017-10-11T23:16:41Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: aws
- OS (e.g. from /etc/os-release): NAME="Ubuntu" VERSION="16.04.2 LTS (Xenial Xerus)" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 16.04.2 LTS" VERSION_ID="16.04" HOME_URL="http://www.ubuntu.com/" SUPPORT_URL="http://help.ubuntu.com/" BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/" VERSION_CODENAME=xenial UBUNTU_CODENAME=xenial
- Kernel (e.g. `uname -a`): Linux ip-172-28-68-60 4.4.0-1038-aws #47-Ubuntu SMP Thu Sep 28 20:05:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
- Install tools:
- Others:
About this issue
- Original URL
- State: open
- Created 7 years ago
- Reactions: 48
- Comments: 50 (13 by maintainers)
We have this issue also. Our DEV Kubernetes cluster lives in a bare-metal environment, where the kubelet `--hostname-override` option works fine, and hence we are able to set node names the way we want. But our staging and PROD environments are on AWS, where we use `--provider aws`. In this case the kubelet does not allow us to set node names according to our naming convention. In AWS we have self-descriptive DNS names like `node02-staging.<project_name>.com`, but we cannot make the node names equal to these DNS names because the kubelet overrides them with the ugly AWS `ip-xx-yy-zz.us-west-1.compute.internal`.

Imagine that you have a team of engineers and they receive an alert "staging Kubernetes node 5 - disk full". What would you like them to see when they run `kubectl get nodes`: `node05.staging - NotReady` or `ip-124-12-34-13.us-west-1.compute.internal - NotReady`? 😉 Especially taking into account that `node05.staging` is a resolvable DNS name that is convenient for the team to use, and that the word "staging" is in the node name, so it is less likely that an engineer accidentally goes to the wrong environment. What I want to say is that in AWS, having a human-friendly CNAME record for each node, with the node name equal to this DNS record, makes `kubectl get node` and `kubectl describe node` give a more human-friendly representation of the cluster and reduces the human error factor.

This is causing so much trouble for us. We have a naming scheme which quickly lets us see the zone/environment for a node, but the "hardcoded" node naming in Kubernetes hinders all of this. Log files and metrics are super hard to work with, as I have to cross-check the "auto-name" against our inventory list all the time. This is a real problem, and it really needs to be solved.
Why are you required to use non-standard hostnames for AWS?
This is affecting me too! I believe it is using the EC2 instance's private hostname and ignoring `--hostname-override`. We are not using `kubeadm`, so we haven't found a workaround.

Can we get some traction on this issue? There seems to be a real need for this option to be implemented, but it keeps getting brushed aside, and related issues emerge while PRs go stale.
@liggitt AWS allows you to run your own DHCP and a private zone, so it is a perfectly valid use case to change the hostname and canonical name of a VM and have a private hosted zone that, for all intents and purposes, completely masks the "AWS-internal" name of the VM.

~Having `--hostname-override=` not work is clearly broken behavior.~ AWS specifically publishes articles on how to change your VM hostname, so it is clearly seen as a valid approach by them.
~The fact that K8s has decided to use the AWS instance-metadata route `http://169.254.169.254/latest/meta-data/hostname` and ignore the configuration override is a bug, plain and simple, not some conformance to cloud-provider purity.~
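For reference, the metadata value referred to above can be checked directly on an instance; this is just an inspection command, nothing more:

```bash
# Query the EC2 instance metadata endpoint mentioned above.
# It typically returns the private DNS name, e.g.
# ip-172-28-68-60.eu-west-1.compute.internal
curl -s http://169.254.169.254/latest/meta-data/hostname
```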
Edit: @liggitt Okay, so I realize now that the issue is more complex than just the kubelet; it also involves how other components see the kubelet. Sorry for my heated response. I was very frustrated by this undocumented behavior and was annoyed that I couldn't fix it.
To build on @daniilyar's comments, I posted the following in the sig-node Slack channel but didn't get any response, so I'll add it here for posterity:
Looking through the referenced github PRs and tickets, I’m curious about some of the comments here: https://github.com/kubernetes/kubernetes/pull/58114#discussion_r160840393
In particular: “In non-cloud environments, hostnames cannot always be auto-detected.” This seems like a rather basic thing to do. How was it problematic?
and "Originally, the reported hostname and the nodeName were identical. That is no longer required to be the case." While this may be true from a code perspective, it's rather awkward for them not to be the same from a functional/operational point of view (not to mention that it downright breaks if you set the hostname and the associated client certificates first and then try to enable the cloud-provider option at a later point). If I specifically set a hostname, why is that not what the node registers as and what I see in `kubectl get nodes`? It seems like, if you're going to default to one or the other, it should be the one I made the effort to explicitly set, not the generic EC2 name.

The comments in https://github.com/kubernetes/kubernetes/pull/58114#discussion_r161320813 are relevant, hoisting them here:
In cloud environments, you need an authority on the following things (and the mapping between them):
The kubelets cannot be the authority on those things:
@kubernetes/sig-cluster-lifecycle-feature-requests is working to move the cloud provider integrations out of tree. As part of that, I’d expect the kubelet to be able to be told its hostname and nodename, and to know nothing about the cloud provider at all.
Once the kubelet was cloud-agnostic, no longer had to make cloud provider API calls, and just used whatever hostname/nodename it was told to, if you wanted to maintain a non-standard mapping between cloud provider node name, kubernetes node name, and hostname, that could be supported by the cloud provider’s implementation if they chose to (that would mean communicating and using that mapping consistently in components that talked to the cloud provider API… ensuring everyone is using the same mapping and translating back and forth can get pretty painful)
Mutating a node's configuration in pieces is not likely to end well, generally. Changing its identity is just one of the things that is likely to cause issues.
this should work if you conform to the cloud provider’s view of node names and hostnames when you create the nodes
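A minimal sketch of that "conform to the cloud provider's view" advice, assuming a wrapper script around the kubelet binary; the wrapper itself and its paths are hypothetical, not something described in this thread:

```bash
#!/bin/sh
# Hypothetical wrapper: feed the kubelet the same hostname the AWS cloud
# provider derives from instance metadata, so hostname, node name and the
# cloud provider's name for the node all agree.
EC2_HOSTNAME="$(curl -s http://169.254.169.254/latest/meta-data/hostname)"
exec /usr/bin/kubelet \
  --cloud-provider=aws \
  --hostname-override="${EC2_HOSTNAME}" \
  "$@"
```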
Here is a related issue that was closed by robot because nobody was on it for 90 days: https://github.com/kubernetes/kubernetes/issues/22984
I have a similar problem. There are multiple domains in the DHCP option set (like `"example.com test.com contoso.com"`). When the kubelet starts, it obtains the hostname from the EC2 metadata endpoint `http://169.254.169.254/latest/meta-data/hostname`. The hostname in the metadata then looks like `ip-10-11-39-89.example.com test.com contoso.com`. This string can't be used as a node name, of course, and I get an error.
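For illustration only: when setting the name yourself (outside of the in-tree cloud provider, which reads the metadata directly), the multi-domain value can be trimmed to its first token; whether that first label is actually the desired name depends on your DHCP option set:

```bash
# Take only the first space-separated token of the metadata hostname, e.g.
# "ip-10-11-39-89.example.com test.com contoso.com" -> "ip-10-11-39-89.example.com"
RAW="$(curl -s http://169.254.169.254/latest/meta-data/hostname)"
NODE_NAME="${RAW%% *}"
echo "${NODE_NAME}"
```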
We're having a hell of a time working around this in the Canonical Distribution of Kubernetes. Since `--cloud-provider=aws` changes the node name and we can't override it with `--hostname-override`, we're forced to make changes elsewhere in the cluster configuration, e.g. passing `--hostname-override` to kube-proxy with the correct node name so it can correctly identify local endpoints. We're able to work around this, but it seems unfortunate that `--cloud-provider=aws` changes the node name and doesn't let you override it. The cluster operator is forced to change configuration in places that aren't obvious at all.
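A sketch of the kube-proxy side of that workaround; the node name and kubeconfig path below are illustrative, not the Canonical distribution's actual configuration:

```bash
# Tell kube-proxy the node name the AWS cloud provider assigned, so it can
# match the Node object and identify local endpoints correctly.
kube-proxy \
  --hostname-override=ip-172-28-68-60.eu-west-1.compute.internal \
  --kubeconfig=/var/lib/kube-proxy/kubeconfig
```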
I fixed the issue for me with:

but that is just a workaround in my case. Using `--node-name` during cluster initialization did not fix "the kubelet ignores the `--hostname-override` option".
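For context, `--node-name` here refers to the kubeadm flag; a hypothetical invocation is shown below. Per the comment above, this names the Node object at init/join time but did not resolve the kubelet ignoring `--hostname-override`:

```bash
# Illustrative only: set the Kubernetes node name at cluster initialization.
kubeadm init --node-name=node05.staging
# The same flag exists when joining worker nodes:
#   kubeadm join <endpoint> --token <token> ... --node-name=node05.staging
```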
I found an interesting nuance. I see that the issue was created in 2017, but I only hit the hostname problem on EKS 1.19, when I removed `--cloud-provider=aws` from the kubelet and saw this message in the logs:

When I returned the argument, the node successfully joined the cluster even though its hostname is actually different from the EC2 private DNS name. Of course, I still see something like this in the node list in `kubectl`:

but at least in the EC2 console, in monitoring dashboards and in the node shell I get the name I actually configured.

In short, my bootstrapping process looks like this:

(And I never used `--hostname-override` with the kubelet.)

But I guess all of this will break soon, once I upgrade past 1.19, when `--cloud-provider` will be completely out of support. 😞

/assign @nckturner
/triage accepted
We have a similar issue with AWS VPC EC2 instances.

This is a problem for my customer as well. The VPC is not public-facing and is required to use customer DNS targets only.
@2rs2ts and I do not use `kubeadm` or `kubespray` to set up our test cluster. This appears to be an issue with `kubelet` itself.

Hi @dims,
Thank you for your reply! I just tried to disable the NodeRestriction plugin for the apiserver and I broke my cluster (no worries, it is a test cluster). First of all, node ip-172-28-68-60 disappeared from the node list, and now I see:

kube-dns cannot start with the following error:

It looks like nodes that were already joined to the cluster need to be rejoined.
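For reference, a sketch of the kind of apiserver change being described; the exact edit made here is not shown in the comment, and the flag name depends on the Kubernetes version:

```bash
# On kube-apiserver 1.10+, NodeRestriction can be switched off explicitly:
#   kube-apiserver --disable-admission-plugins=NodeRestriction ...
# On 1.8 (the version in this report), admission plugins are a single list,
# so NodeRestriction would instead be removed from:
#   kube-apiserver --admission-control=...,NodeRestriction,... ...
```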
At the same time, the kubelet continues to run with `--hostname-override=ip-172-28-68-60`; it looks like the kubelet simply ignores the `--hostname-override` option.
BR, Vasily.