kubernetes: Can't get --cloud-provider=aws to work (instance not found)
I’m trying to set up AWS ELB attach volume access and keep getting an error on the kubelet. I have a bad feeling it might be related to #9801, as these nodes are CoreOS machines that were brought up by Terraform with custom service files, but everything else is working fine (we’re running cluster monitoring, DNS, and a number of our own pods without issues). I’ve run awscli on that instance and the privileges definitely work. What am I doing wrong?
logs from kubelet service
I0719 02:22:59.549444 9662 factory.go:234] Registering Docker factory
I0719 02:22:59.549826 9662 factory.go:89] Registering Raw factory
I0719 02:22:59.638468 9662 manager.go:946] Started watching for new ooms in manager
I0719 02:22:59.639054 9662 oomparser.go:183] oomparser using systemd
I0719 02:22:59.640589 9662 manager.go:243] Starting recovery of all containers
E0719 02:22:59.679107 9662 kubelet.go:787] Unable to construct api.Node object for kubelet: failed to get external ID from cloud provider: instance not found
I0719 02:22:59.753816 9662 manager.go:248] Recovery completed
I0719 02:22:59.816369 9662 status_manager.go:76] Starting to sync pod status with apiserver
I0719 02:22:59.816426 9662 kubelet.go:1725] Starting kubelet main sync loop.
E0719 02:22:59.953337 9662 kubelet.go:787] Unable to construct api.Node object for kubelet: failed to get external ID from cloud provider: instance not found
E0719 02:23:01.047929 9662 kubelet.go:787] Unable to construct api.Node object for kubelet: failed to get external ID from cloud provider: instance not found
E0719 02:23:01.975409 9662 kubelet.go:787] Unable to construct api.Node object for kubelet: failed to get external ID from cloud provider: instance not found
E0719 02:23:03.645707 9662 kubelet.go:787] Unable to construct api.Node object for kubelet: failed to get external ID from cloud provider: instance not found
E0719 02:23:06.917503 9662 kubelet.go:787] Unable to construct api.Node object for kubelet: failed to get external ID from cloud provider: instance not found
I0719 02:23:07.790941 9662 server.go:635] POST /stats/container/: (46.254087ms) 0 [[Go 1.1 package http] 10.0.39.222:45497]
kube-kubelet.service
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
[Service]
Environment="KUBERNETES_BINARY_VERSION=1.0.0"
EnvironmentFile=/etc/environment
ExecStartPre=/usr/bin/curl -L -o /opt/bin/kubelet https://storage.googleapis.com/kubernetes-release/release/v${KUBERNETES_BINARY_VERSION}/bin/linux/amd64
ExecStartPre=/usr/bin/chmod +x /opt/bin/kubelet
ExecStart=/opt/bin/kubelet \
--address=0.0.0.0 \
--port=10250 \
--cloud-provider=aws \
--hostname-override=${COREOS_PRIVATE_IPV4} \
--api-servers=${KUBE_MASTER_IP}:8080 \
--allow-privileged=false \
--cluster_dns=10.2.0.2 \
--cluster_domain=cluster.local \
--cadvisor_port=4194 \
--healthz_bind_address=0.0.0.0 \
--healthz_port=10248 \
--v=2 \
--logtostderr=true
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
iam policy
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:Describe*"
],
"Resource": [
"*"
]
},
{
"Effect": "Allow",
"Action": [
"ec2:AttachVolume",
"ec2:DetachVolume"
],
"Resource": [
"arn:aws:ec2:*:*:instance/*"
]
}
]
}
About this issue
- State: closed
- Created 9 years ago
- Comments: 60 (33 by maintainers)
Commits related to this issue
- pass cloudProvider from Kubelet to volume plugins — committed to kubernetes/kubernetes by BugRoger 9 years ago
- Fix #11543 - use instance id instead of private DNS name for AWS cloud provider — committed to jtblin/kubernetes by jtblin 9 years ago
- Fix #11543 - use private ip address instead of private DNS name for AWS cloud provider — committed to jtblin/kubernetes by jtblin 9 years ago
- Fix #11543 - use private ip address instead of private DNS name for AWS cloud provider — committed to atlassian/kubernetes by jtblin 9 years ago
- Fix #11543 - use private ip address instead of private DNS name for AWS cloud provider — committed to atlassian/kubernetes by jtblin 9 years ago
- Fix https://github.com/kubernetes/kubernetes/issues/11543. instance-id is the ID AWS expects — committed to Nordstrom/kubernetes by deleted user 9 years ago
- Fix #11543 - use DescribeInstances API to retrieve the correct private DNS name — committed to jtblin/kubernetes by jtblin 8 years ago
- Quick and dirty workaround for #26 (kubernetes/kubernetes#11543) — committed to kubernetes-retired/kubernetes-anywhere by errordeveloper 8 years ago
- Use nodeutil.GetHostIP consistently when talking to nodes Most of our communications from apiserver -> nodes used nodutil.GetNodeHostIP, but a few places didn't - and this meant that the node name ne... — committed to justinsb/kubernetes by justinsb 8 years ago
Well I found the cause at least:
Kubelet attempts to get its ExternalID. To do this it uses the mis-named aws.findInstanceByNodeName, which actually looks up the node by private-dns-name, not node name (which is potentially a good thing, since in the case of autoscaling groups node names won’t be unique, but still, confusing name). This would probably work under a normal situation, but we’re using a VPC DHCP option set to use an internal management domain, not the standard <region>.compute.internal. When the kubelet is configured for AWS it uses the metadata service to get its local-hostname, which in our case doesn’t match the private DNS name; it matches our management domain.
I’m not really sure what the right fix here is. My instinct is that --hostname-override should actually do something even when you’re using --cloud-provider=aws (it currently seems to do nothing). I notice the comment // TODO: ExternalID is deprecated, we'll have to drop this code, but I couldn’t find out what that specifically means. Also interesting to me is that the kubernetes.io/hostname label is inconsistent when using a cloud provider unless the hostname exactly matches the “ExternalID”. I’m not actually sure this entire section of code (kubelet.go 739-761) even does anything useful vs just using kl.hostname directly for the ExternalID. If ExternalID is really deprecated I’m inclined to submit a patch to rip that out (well, replace it with just kl.hostname).
Just to add to this… we’ve set up K8S 1.0.6 inside our VPC and just added --cloud-provider=aws (and the accompanying IAM profiles to our nodes). While things like kubectl work fine, we noticed that the kubelet processes are now registering themselves in the kube-apiserver with their EC2 name rather than the hostnames we use internally.
Now, we “allow” use of the .internal domain name from Amazon in the sense that we can resolve those names just fine. The only real issue here is that the nodeName in Kubernetes is now wrong and misleading for our engineers.
I also tried adding --hostname-override=tools-k8s-node-uswest1-17-i-905b2250 to kubelet’s startup parameters and it had no impact on the name actually registered in the api server.
@thockin @justinsb
This also breaks the AWS cloud provider when using private hosted DNS zones.
Essentially, kubelet thinks the nodename is whatever the local hostname is (“ip-xx-xx-xx-xx.my.custom.domain.com”) and then tries to get the instance details by querying the EC2 API by private-dns-name (which actually is something like “ip-xx-xx-xx-xx.us-west-2.compute.internal”). This fails of course.
This makes it impossible to use the AWS cloud provider and therefore automatic ELB provisioning, etc.
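The failure described in these comments can be reproduced in miniature. The following is a minimal Python sketch, not the actual Go code from the AWS cloud provider: the instance record, the instance ID, the hostnames, and the helper names are all hypothetical, and the EC2 private-dns-name filter is simulated with an in-memory list rather than a real API call.

```python
from typing import Optional

# Hypothetical stand-in for what an EC2 instance query would return.
INSTANCES = [
    {
        "instance_id": "i-0123abcd",
        "private_dns_name": "ip-10-0-39-222.us-west-2.compute.internal",
        "private_ip": "10.0.39.222",
    },
]


def find_instance_by_node_name(node_name: str) -> Optional[dict]:
    """Simulate a lookup that filters on private-dns-name, as the thread describes."""
    for instance in INSTANCES:
        if instance["private_dns_name"] == node_name:
            return instance
    return None


def external_id(node_name: str) -> str:
    """Return the instance ID for a node name, or fail like the kubelet log above."""
    instance = find_instance_by_node_name(node_name)
    if instance is None:
        raise LookupError("instance not found")
    return instance["instance_id"]


if __name__ == "__main__":
    # The default EC2 name matches the filter.
    print(external_id("ip-10-0-39-222.us-west-2.compute.internal"))
    # A hostname from a custom DHCP domain does not, so the lookup fails.
    try:
        external_id("ip-10-0-39-222.my.custom.domain.com")
    except LookupError as err:
        print(err)
```

The point of the sketch is only that the node name and the value being filtered on come from two different naming schemes, so the equality check can never succeed when a custom domain is in use.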
I’m not familiar with GitHub, but how can this be re-opened? Having read through the entire thread, I don’t understand why this was closed. I’m using 1.13.1 and this is causing a problem, and it is actually a bigger problem than most of the comments here suggest.
Going back to the original post, the problem is that our AWS environment is confined to a VPC and our company’s domain, so hostnames have to be in “my_hostname.my_company.net” format. We can’t use ip-10-xx-xx-xx names, but with --cloud-provider=aws set, the kubelet looks for a hostname in ip-10-xx-xx-xx format. To create persistent volumes in AWS (EBS or EFS types) I need to set --cloud-provider=aws, but I cannot do that because node names then have to be in ip-10-xx-xx-xx format, and so it goes around in a circle. I say this is a bigger problem because I cannot create any persistent volume in my cluster as long as the node names are in my_host.my_company.net format, yet being able to create a persistent volume is almost an absolute necessity for any reasonably useful cluster.
I think this is still an issue with kubeadm v1.8.2. There needs to be much more documentation for AWS as a cloud provider, or maybe something out of the box with kubeadm. At the moment it seems badly under-documented, especially since many people have this use case, right?
I like the idea to switch to instanceId as the node-name, but I can see issues here. Ideally I would like to be able to set my node-names to be DNS resolvable like they currently are. What about having the findInstanceByNodeName method look for instanceId, private-dns-name OR a tag called node-name?
Currently I am looking at running OpenShift on Kubernetes and the only feature of --cloud-provider=aws that I really need is the EBS volume plugin. It seems a bit overkill to require all my node names to be instanceId just to get that plugin working. (This used to work without using --cloud-provider=aws)
Forgive me if I’m missing something obvious here, and for repeating much of what @philk and @bkeroackdsc have said already, but I’m confused by the above conclusions. Using an instance-id as a node name (as suggested in #11883) is not ideal: non-default hostnames are that way for a reason, and it’s going to be much clearer to users / administrators if they see the hostname rather than an instance id (imho). When securing a dynamic infrastructure it’s often a necessity to work with a private domain in order to supply certificates - the IP addresses of nodes are unpredictable. Cloud providers need not be aware of this.
Kubernetes internally should not conflate the node name (the hostname of a node in the cluster) with the instance-id (the identifier it uses to communicate with the AWS cloud provider).
The call to ExternalID eventually ends up calling getInstanceByNodeName (see below), with the implication that the node name is the private-dns-name - can’t this simply be changed to the more correct behavior of retrieving the instance id from the metadata service, and adjusting the query to use that? The change would entail fetching the instance-id from the metadata service, caching that, and changing the tag query to use instance-id instead of private-dns-name. This should always succeed as instance-id is unique to an instance.
https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws.go#L2083
The only potential complication I foresee is if another member of the cluster needs to use the Kubernetes node name in order to operate on the cloud provider, e.g. trying to attach a volume to an instance. As far as I can tell those operations are done locally, which is further implied by the IAM permissions applied to kubelet nodes (ec2:AttachVolume and ec2:DetachVolume), but I haven’t read the entire codebase. If this is a necessary use case, I’d suggest that a DNS lookup of the hostname, followed by finding the instance with the cloud provider based on its private IP, is more correct than assuming that the cloud provider has any knowledge of the instance’s hostname.
I’d be happy to put this change together but would like some feedback on the idea - @thockin @justinsb thoughts?
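The proposal above can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the cloud provider's Go code: the instance record, the helper names, and the injected `fetch_instance_id` and `resolve` callables are all hypothetical. In a real deployment `fetch_instance_id` would presumably read the EC2 metadata service and `resolve` could be something like `socket.gethostbyname`; here both are passed in so the logic can be exercised without network access.

```python
from typing import Callable, Optional

# Hypothetical stand-in for EC2 instance records.
INSTANCES = [
    {
        "instance_id": "i-0123abcd",
        "private_dns_name": "ip-10-0-39-222.us-west-2.compute.internal",
        "private_ip": "10.0.39.222",
    },
]


def find_by(field: str, value: str) -> Optional[dict]:
    """Return the first instance whose `field` equals `value`, else None."""
    for instance in INSTANCES:
        if instance[field] == value:
            return instance
    return None


def resolve_local_instance(fetch_instance_id: Callable[[], str]) -> Optional[dict]:
    """Local case: ask the metadata service for our own instance ID and query by it.

    The node's hostname plays no part in this lookup.
    """
    return find_by("instance_id", fetch_instance_id())


def resolve_remote_instance(node_name: str,
                            resolve: Callable[[str], str]) -> Optional[dict]:
    """Remote case: resolve the node's hostname via DNS, then match on private IP."""
    return find_by("private_ip", resolve(node_name))


if __name__ == "__main__":
    # Stubs stand in for the metadata service and for DNS.
    local = resolve_local_instance(lambda: "i-0123abcd")
    remote = resolve_remote_instance("node17.my.custom.domain.com",
                                     lambda host: "10.0.39.222")
    print(local["instance_id"], remote["instance_id"])
```

Neither path depends on the hostname matching the private-dns-name, which is the property the comment above is asking for.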