openshift-ansible: when installing on AWS, node hostname and checked pod names don't match
Description
openshift-ansible-3.11.0-0.16.0
ansible-2.6.2-1.el7.noarch (epel)
Steps To Reproduce
- run installer on aws-based hosts
Expected Results
successful installation
Observed Results
2018-08-17 12:37:20,201 p=19271 u=root | failed: [master1.ceijaug.internal] (item=etcd) => {"attempts": 60, "changed": false, "item": "etcd", "results": {"cmd": "/bin/oc get pod master-etcd-ip-192-199-0-7.ec2.internal -o json -n kube-system", "results": [{}], "returncode": 0, "stderr": "Error from server (NotFound): pods \"master-etcd-ip-192-199-0-7.ec2.internal\" not found\n", "stdout": ""}, "state": "list"}
Additional Information
[root@master1 ~]# oc get node
NAME STATUS ROLES AGE VERSION
master1.ceijaug.internal NotReady <none> 6m v1.11.0+d4cacc0
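The mismatch between the failing command and the node list can be made explicit with a small sketch. The naming pattern (`master-<item>-<hostname>`) is inferred only from the log line above, and `expected_static_pod_name` is a hypothetical helper, not a function from openshift-ansible.

```python
def expected_static_pod_name(component: str, node_hostname: str) -> str:
    """Build the pod name the installer appears to check for.

    Assumption: the pattern is "master-" + item + "-" + hostname, as seen
    in the failing `oc get pod` command in the log above.
    """
    return f"master-{component}-{node_hostname}"


# Hostname the installer used (taken from the failing command).
metadata_hostname = "ip-192-199-0-7.ec2.internal"
# Node name actually registered (taken from the `oc get node` output).
registered_node = "master1.ceijaug.internal"

checked = expected_static_pod_name("etcd", metadata_hostname)
actual = expected_static_pod_name("etcd", registered_node)

print("installer checks for:", checked)
print("node name would give:", actual)
print("names match:", checked == actual)
```

Under that assumption, the installer polls for a pod name built from the metadata hostname, while any pod on this node would carry the registered node name, so the lookup keeps returning NotFound until the retries run out.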
About this issue
- State: closed
- Created 6 years ago
- Comments: 30 (14 by maintainers)
I’m just now catching up on this issue. I’ve dug through the environment with @thoraxe.
What’s happening, and I’m sorry if this is already clear to everyone else, is that the hostname of the hosts was changed after provisioning and before the installer ran, so hostname != metadata/hostname. That would not be valid if we were configuring the cloud provider, but we’re not configuring cloud provider integration here, so it should be fine. However, even when the cloud provider is not configured, we override facts['common']['hostname'] with the name from the metadata API. I think we should disable this metadata inspection whenever we’re not configuring the provider.
I’m still trying to work through the implications of this for upgrades, however.
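A minimal sketch of the behaviour proposed in this comment, assuming the decision reduces to a single flag. `choose_hostname` and its parameters are hypothetical names for illustration; this is not the openshift-ansible facts code, and it says nothing about how the actual fix is implemented.

```python
from typing import Optional


def choose_hostname(
    system_hostname: str,
    metadata_hostname: Optional[str],
    cloudprovider_configured: bool,
) -> str:
    """Pick the hostname to record in the host facts.

    Proposed rule: consult the metadata API name only when cloud provider
    integration is being configured; otherwise keep the system hostname.
    """
    if cloudprovider_configured and metadata_hostname:
        return metadata_hostname
    return system_hostname


# The situation in this issue: hostname changed after provisioning,
# cloud provider integration not configured.
print(choose_hostname(
    "master1.ceijaug.internal",
    "ip-192-199-0-7.ec2.internal",
    cloudprovider_configured=False,
))
```

With the flag set to False the system hostname is kept, which is the outcome the comment argues for. With it set to True the metadata name still takes precedence, as it does today.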
#9956 should fix that
In short: an install on AWS uses the AWS metadata service to override hostnames. If the hostnames don’t match, the install fails, so once this fix lands in release-3.10 only the hostname would be used.
Have you looked through the AWS metadata to see if that hostname is there? Do any of the below commands return that name?
curl http://169.254.169.254/latest/meta-data/local-hostname
curl http://169.254.169.254/latest/meta-data/public-hostname
curl http://169.254.169.254/latest/meta-data/hostname
It’s possibly getting populated around here and here. 273 looks like where the values are being set.
Ansible playbooks are reaching out to AWS metadata to fetch the internal DNS name and replace whatever is set in ansible inventory if the host can be reached.
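The curl checks above can be scripted to compare each metadata value against the system hostname in one pass. This is a rough sketch that assumes it runs on the EC2 instance itself and that the metadata endpoint answers plain unauthenticated GET requests; `fetch_metadata` and `compare_hostnames` are hypothetical helpers.

```python
import socket
import urllib.request
from typing import Callable, Dict, Optional, Tuple

METADATA_BASE = "http://169.254.169.254/latest/meta-data/"
KEYS = ("local-hostname", "public-hostname", "hostname")


def fetch_metadata(key: str, timeout: float = 2.0) -> Optional[str]:
    """Return the metadata value for `key`, or None if it cannot be read."""
    try:
        with urllib.request.urlopen(METADATA_BASE + key, timeout=timeout) as resp:
            return resp.read().decode().strip()
    except OSError:
        # Covers URLError, HTTPError and timeouts (all OSError subclasses).
        return None


def compare_hostnames(
    system_hostname: str,
    fetch: Callable[[str], Optional[str]] = fetch_metadata,
) -> Dict[str, Tuple[Optional[str], bool]]:
    """Map each metadata key to (value, value == system_hostname)."""
    report = {}
    for key in KEYS:
        value = fetch(key)
        report[key] = (value, value == system_hostname)
    return report


def print_report(system_hostname: Optional[str] = None) -> None:
    """Print one line per metadata key for the given (or local) hostname."""
    name = system_hostname or socket.getfqdn()
    for key, (value, same) in compare_hostnames(name).items():
        print(f"{key}: {value!r} (matches {name}: {same})")
```

Calling `print_report()` on a master shows at a glance whether any metadata entry still carries the original name. The `fetch` parameter exists so the comparison can be exercised off-instance with a stub.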
How were the node1.ceijaug.internal etc. hostnames set?
Note that in order to override master-etcd-ip-192-199-0-7.ec2.internal you’d need Route53 config so that cloudprovider would know about this - a simple hostnamectl set-hostname won’t work.