openshift-ansible: Could not install cluster in AWS ap-southeast-2: failed to run Kubelet: could not init cloud provider "aws"
Description
Trying to spin up a fresh cluster on aws region ap-southeast-2.
$ ansible-playbook -v -i playbooks/aws/provisioning-inventory.yml playbooks/aws/openshift-cluster/prerequisites.yml -e @playbooks/aws/provisioning_vars.yml
"Succeeds"
$ ansible-playbook -v -i playbooks/aws/provisioning-inventory.yml playbooks/aws/openshift-cluster/build_ami.yml -e @playbooks/aws/provisioning_vars.yml
"Succeeds"
$ ansible-playbook -v -i playbooks/aws/provisioning-inventory.yml playbooks/aws/openshift-cluster/provision.yml -e @playbooks/aws/provisioning_vars.yml
"Succeeds"
$ ansible-playbook -v -i playbooks/aws/provisioning-inventory.yml playbooks/aws/openshift-cluster/install.yml -e @playbooks/aws/provisioning_vars.yml
"FAILS"
Version
$ ansible --version
ansible 2.6.2
config file = /home/josha/proj/openshift-ansible/ansible.cfg
configured module search path = ['/home/josha/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /home/josha/.local/lib/python3.6/site-packages/ansible
executable location = /usr/local/bin/ansible
python version = 3.6.5 (default, Apr 1 2018, 05:46:30) [GCC 7.3.0]
$ git describe
openshift-ansible-3.10.43-1-10-g189969f82
Steps To Reproduce
- ansible-playbook -v -i playbooks/aws/provisioning-inventory.yml playbooks/aws/openshift-cluster/prerequisites.yml -e @playbooks/aws/provisioning_vars.yml
- ansible-playbook -v -i playbooks/aws/provisioning-inventory.yml playbooks/aws/openshift-cluster/build_ami.yml -e @playbooks/aws/provisioning_vars.yml
- ansible-playbook -v -i playbooks/aws/provisioning-inventory.yml playbooks/aws/openshift-cluster/provision.yml -e @playbooks/aws/provisioning_vars.yml
- ansible-playbook -v -i playbooks/aws/provisioning-inventory.yml playbooks/aws/openshift-cluster/install.yml -e @playbooks/aws/provisioning_vars.yml
Expected Results
I expect playbooks/aws/openshift-cluster/install.yml to succeed
Observed Results
TASK [openshift_control_plane : fail] *********************************************************************************************************************************************************************************************************************************************************************************************************************
Friday 07 September 2018 09:45:06 +1000 (0:00:00.151) 0:03:05.139 ******
fatal: [ec2-xxx-xxx-xxx-172.ap-southeast-2.compute.amazonaws.com]: FAILED! => {"changed": false, "msg": "Node start failed."}
fatal: [ec2-xxx-xxx-xxx-85.ap-southeast-2.compute.amazonaws.com]: FAILED! => {"changed": false, "msg": "Node start failed."}
fatal: [ec2-xxx-xxx-xxx-238.ap-southeast-2.compute.amazonaws.com]: FAILED! => {"changed": false, "msg": "Node start failed."}
NO MORE HOSTS LEFT ****************************************************************************************************************************************************************************************************************************************************************************************************************************************
PLAY RECAP ************************************************************************************************************************************************************************************************************************************************************************************************************************************************
ec2-xxx-xxx-xxx-238.ap-southeast-2.compute.amazonaws.com : ok=154 changed=22 unreachable=0 failed=1
ec2-xxx-xxx-xxx-172.ap-southeast-2.compute.amazonaws.com : ok=207 changed=27 unreachable=0 failed=1
ec2-xxx-xxx-xxx-85.ap-southeast-2.compute.amazonaws.com : ok=154 changed=22 unreachable=0 failed=1
localhost : ok=18 changed=1 unreachable=0 failed=0
Running journalctl -xe on one of the nodes, the following error shows up:
"Sep 06 23:45:05 ip-xxx-xxx-xxx-160.ap-southeast-2.compute.internal origin-node[32115]: I0906 23:45:05.811802 32115 aws.go:1033] Building AWS cloudprovider",
"Sep 06 23:45:05 ip-xxx-xxx-xxx-160.ap-southeast-2.compute.internal systemd[1]: Failed to start OpenShift Node.",
"Sep 06 23:45:05 ip-xxx-xxx-xxx-160.ap-southeast-2.compute.internal origin-node[32115]: F0906 23:45:05.894033 32115 server.go:233] failed to run Kubelet: could not init cloud provider \"aws\": error finding instance i-085ddf3267cc5f2ce: \"error listing AWS instances: \\\"InvalidInstanceID.NotFound: The instance ID 'i-085ddf3267cc5f2ce' does not exist\\\\n\\\\tstatus code: 400, request id: 4cc327a9-45df-47da-9b81-2a38347a9ab9\\\"\"",
However, i-085ddf3267cc5f2ce definitely exists and is one of the nodes in this cluster.
From some searching, I suspect the kubelet is querying the wrong AWS region when it lists instances, but I'm not sure how to confirm or fix that.
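For reference, InvalidInstanceID.NotFound is exactly what the EC2 API returns when a valid instance ID is looked up in a different region, so the region theory is easy to test from a workstation with the AWS CLI (illustrative commands, not part of the playbooks):
$ aws ec2 describe-instances --instance-ids i-085ddf3267cc5f2ce --region ap-southeast-2   # should find the node
$ aws ec2 describe-instances --instance-ids i-085ddf3267cc5f2ce --region us-east-1        # reproduces the NotFound error
If the region really is wrong on the node, the place to look would be the cloud-provider config the kubelet reads; a minimal sketch of that file, assuming the default path openshift-ansible writes for openshift_cloudprovider_kind=aws (/etc/origin/cloudprovider/aws.conf) and this cluster's zone:
[Global]
Zone = ap-southeast-2a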
Additional Information
Inventory file: playbooks/aws/provisioning-inventory.yml
[OSEv3:children]
masters
nodes
etcd
[OSEv3:vars]
################################################################################
# Ensure these variables are set for bootstrap
################################################################################
ansible_ssh_common_args='-o StrictHostKeyChecking=no'
ansible_ssh_user=centos
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'}]
openshift_master_default_subdomain=apps.openshift.<my-domain>.com
ansible_become=true
openshift_cloudprovider_kind=aws
openshift_cloudprovider_aws_access_key=<my-access-key>
openshift_cloudprovider_aws_secret_key=<my-secret-key>
# openshift_deployment_type is required for installation
openshift_deployment_type=origin
openshift_master_api_port=443
openshift_hosted_router_wait=False
openshift_hosted_registry_wait=False
openshift_clusterid=testing
################################################################################
# cluster specific settings may be placed here
[masters]
[etcd]
[nodes]
Provisioning vars: playbooks/aws/provisioning_vars.yml
---
openshift_deployment_type: 'origin'
openshift_release: '3.10'
openshift_pkg_version: '-3.10.0'
openshift_aws_clusterid: 'oc-test'
openshift_aws_region: ap-southeast-2
openshift_aws_create_launch_config: true
openshift_aws_create_scale_group: true
openshift_aws_create_vpc: true
openshift_aws_vpc:
  name: "{{ openshift_aws_vpc_name }}"
  cidr: 172.31.0.0/16
  subnets:
    ap-southeast-2:
    - cidr: 172.31.48.0/20
      az: "ap-southeast-2a"
      default_az: true
    - cidr: 172.31.32.0/20
      az: "ap-southeast-2b"
    - cidr: 172.31.16.0/20
      az: "ap-southeast-2c"
openshift_aws_create_security_groups: true
openshift_aws_ssh_key_name: joshainglis_key
openshift_aws_users:
- key_name: joshainglis_key
  username: centos
  pub_key: |
    <my-pub-key>
openshift_aws_build_ami_ssh_user: centos
container_runtime_docker_storage_type: overlay2
container_runtime_docker_storage_setup_device: /dev/xvdb
# ap-southeast-2 Official Centos AMI
openshift_aws_base_ami: ami-d8c21dba
openshift_aws_create_s3: True
openshift_aws_elb_cert_arn: 'arn:aws:acm:ap-southeast-2:<my-aws-account>:certificate/<cert-id>'
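As one more sanity check (illustrative, not part of the playbooks, and it assumes the provisioned instances carry the kubernetes.io/cluster/<openshift_aws_clusterid> tag that the AWS plays normally apply), the nodes and their availability zones can be listed directly:
$ aws ec2 describe-instances --region ap-southeast-2 --filters "Name=tag-key,Values=kubernetes.io/cluster/oc-test" --query 'Reservations[].Instances[].[InstanceId,Placement.AvailabilityZone,State.Name]' --output table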
About this issue
- State: closed
- Created 6 years ago
- Comments: 15 (4 by maintainers)
I injected the credentials manually into /etc/sysconfig/origin-node and got a bit further in the process. No idea why the credentials were missing there; I believe it must be a bug in the Ansible scripts somewhere. I have no prior experience with Ansible, so unfortunately I'm not able to figure it out myself.
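For context, a minimal sketch of that workaround, using the standard AWS SDK variable names (the values shown are placeholders, not taken from this issue):
# appended to /etc/sysconfig/origin-node on each master/node; the node service exports
# these into the kubelet's environment, where the AWS SDK reads them before falling
# back to an instance profile
AWS_ACCESS_KEY_ID=<my-access-key>
AWS_SECRET_ACCESS_KEY=<my-secret-key>
followed by a restart of the node service, e.g. systemctl restart origin-node.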