openshift-ansible: Installation fails on origin-master-api restarting attempt
Description
Installation fails when the playbook attempts to restart origin-master-api.
Version
Ansible
ansible 2.4.2.0
config file = None
configured module search path = [u'/home/aizi/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /home/aizi/.local/lib/python2.7/site-packages/ansible
executable location = /home/aizi/.local/bin/ansible
python version = 2.7.13 (default, Nov 24 2017, 17:33:09) [GCC 6.3.0 20170516]
openshift-ansible-3.9.0-0.35.0-8-g1a58f7fc7
Steps To Reproduce
- ansible-playbook -i os-hosts openshift-ansible/playbooks/prerequisites.yml
- ansible-playbook -i os-hosts openshift-ansible/playbooks/deploy_cluster.yml
Failure summary:
1. Hosts: master.dom
Play: Configure masters
Task: restart master api
Message: Unable to restart service origin-master-api: Job for origin-master-api.service failed because the control process exited with error code. See "systemctl status origin-master-api.service" and "journalctl -xe" for details.
Inventory file
[OSEv3:children]
masters
nodes
etcd
[masters]
master.dom
[nodes]
master.dom
node1.dom openshift_node_labels="{'region': 'infra','zone': 'default'}"
node2.dom
#="{'region': 'primary', 'zone': 'default'}"
[etcd]
master.dom
#[masters:vars]
#ansible_become=true
#[nodes:vars]
#ansible_become=true
[OSEv3:vars]
ansible_user=vagrant
ansible_become=true
openshift_deployment_type=origin
openshift_enable_service_catalog=false
openshift_service_catalog_image_prefix=openshift/origin-
openshift_service_catalog_image_version=latest
# You must enable Network Time Protocol (NTP) to prevent masters and nodes in the cluster from going out of sync.
openshift_clock_enabled=true
# Let's change checks values for now
openshift_disable_check=memory_availability,disk_availability
#docker_storage
prerequisites.log gist
deploy_cluster.log gist
Additional Information
As the host I'm using Debian Stretch, but from a fresh CentOS host I get the same error. As a VM provider I'm using VirtualBox, where I have three boxes (the official CentOS box) with 2 GB RAM and 2 vCPUs each.
I've tried the release-3.7 branch, and the openshift_release=v3.7 variable on the master branch, but got the same error.
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Comments: 20 (5 by maintainers)
I had exactly the same issue as @vrutkovs during the installation of OpenShift Origin 3.9.
The problem was that I used the wrong IP in the /etc/hosts file. I wrote this after the first two default config lines:
127.0.0.1 hostname hostname.domain
The correct way would be to simply let the DNS give you the right IP, or to use the LAN IP:
192.168.x.x hostname hostname.domain
If you used 127.0.0.1 in /etc/hosts, the origin-master-api container tries to access itself on port 2379 and not the container host / master.
In the inventory file, in the [OSEv3] group

Hi, same problem here on CentOS 7. It seems etcd is configured to listen only on a specific interface. Wouldn't it be easiest to just listen on all interfaces, as is done for the other services? This could be done by setting the listen URL to 0.0.0.0.
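If that suggestion is right, the change would look something like the following fragment of /etc/etcd/etcd.conf. This is a sketch only: ETCD_LISTEN_CLIENT_URLS is a standard etcd setting, but check it against the file the playbook actually generated, and note that the advertised client URL must remain an address other hosts can reach.

```ini
# Sketch only: accept etcd client connections on every interface, port 2379.
ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"
```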
I found the issue yesterday. The official Vagrant CentOS box contains this line in /etc/hosts (it's the first line, by the way):
127.0.0.1 node2.dom node2.dom # When you change hostname in /etc/hosts, you should normally rename hostname here as well.
It should be removed or commented out. If CentOS is installed from scratch, this line doesn't exist and the installation works fine.
I think an additional check should be added to the playbook.
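Such a check could be sketched as a small shell helper (hypothetical, not part of the openshift-ansible playbooks; the function name and interface are made up for illustration):

```shell
# check_hosts FILE NAME
# Returns non-zero if NAME is mapped to 127.0.0.1 in FILE -- the condition
# that makes origin-master-api dial etcd on 127.0.0.1:2379 instead of the
# master's LAN address.
check_hosts() {
    # NAME is interpolated into the grep pattern unescaped; fine for a sketch.
    if grep -Eq "^127\.0\.0\.1[[:space:]].*$2" "$1"; then
        echo "WARNING: $2 resolves to 127.0.0.1 in $1" >&2
        return 1
    fi
}
```

For example, `check_hosts /etc/hosts node2.dom` on an unmodified Vagrant box would fail, flagging the offending line before the playbook runs.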
All release-* branches are considered stable; master would install 3.9, which is not yet released though.
Hmm, interesting.
So master fails to start as it can't connect to etcd:
F0201 21:39:38.430245 1030 start_api.go:67] [could not reach etcd(v2): client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 127.0.0.1:2379: getsockopt: connection refused]

The etcd service seems to be running, but I've noticed firewalld has opened 2380 instead of 2379. And iptables seems to allow 2379 and 2380 there.

Could you try rerunning this with os_firewall_enabled: false? I'm not really familiar with the Vagrant setup, but it might be something else blocking the connection.
Could you also attach the output of journalctl -b -el --unit=origin-master-api.service from the master?
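For reference, the suggested rerun corresponds to something like the following inventory fragment (inventory syntax uses `=` rather than `:`). This is a debugging aid only, under the assumption that os_firewall_enabled controls the firewall role as in this release; re-enable the firewall once the cause is found.

```ini
[OSEv3:vars]
# Debugging toggle: skip firewall configuration so firewalld/iptables
# can be ruled out as the cause of the refused etcd connection.
os_firewall_enabled=false
```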