openshift-ansible: Timed out accepting certificate signing requests.
Description
An OpenShift 3.10 cluster installation fails when attempting to accept certificate signing requests. The oc_adm_csr.py module times out after 60 seconds. Four certificates needed to be signed. They all passed, but the process took 67 seconds to complete.
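The timing described above can be illustrated with a small sketch. This is not the code in oc_adm_csr.py: the names `accept_csrs`, `FakeClock`, and `simulate` are hypothetical, the CSR names are placeholders, and the 16.75-second cost per request is simply an assumption that spreads the reported 67 seconds evenly over four requests.

```python
import time


def accept_csrs(names, approve, timeout=60.0, clock=time.monotonic):
    """Approve every CSR in `names`, then compare elapsed time to `timeout`.

    Returns (approved_names, elapsed_seconds, timed_out).
    Hypothetical helper; not the logic of the real oc_adm_csr module.
    """
    start = clock()
    approved = [name for name in names if approve(name)]
    elapsed = clock() - start
    return approved, elapsed, elapsed > timeout


class FakeClock:
    """Deterministic clock so the example runs instantly."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


def simulate(per_csr_seconds=16.75, count=4, timeout=60.0):
    """Simulate `count` approvals that each cost `per_csr_seconds`."""
    clock = FakeClock()

    def approve(_name):
        clock.advance(per_csr_seconds)  # assumed cost of one approval
        return True

    names = ["csr-%d" % i for i in range(1, count + 1)]
    return accept_csrs(names, approve, timeout=timeout, clock=clock)
```

Under these assumptions, `simulate()` reports all four requests approved and still flags a timeout, because the elapsed time exceeds the 60-second limit. `simulate(timeout=120.0)` approves the same four requests without timing out.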
Version
# ansible --version
ansible 2.4.4.0
config file = /etc/ansible/ansible.cfg
configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python2.7/site-packages/ansible
executable location = /usr/bin/ansible
python version = 2.7.5 (default, May 31 2018, 09:41:32) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]
If you’re operating from a git clone:
release-3.10 branch
$ git describe
openshift-ansible-3.10.27-2-69-gd96b19f2a
Steps To Reproduce
- Configure inventory using 1 master and 1 node on bare metal.
- Run the cluster install script.
Expected Results
The installation completes successfully.
Observed Results
"Timed out accepting certificate signing requests. Failing as requested."
INSTALLER STATUS ***************************************************************
Initialization : Complete (0:00:09)
Health Check : Complete (0:02:48)
Node Bootstrap Preparation : Complete (0:00:01)
etcd Install : Complete (0:00:22)
Master Install : Complete (0:01:29)
Master Additional Install : Complete (0:00:48)
Node Join : In Progress (0:01:10)
Failure summary:
1. Hosts: benchserver7.acme.com
Play: Approve any pending CSR requests from inventory nodes
Task: Report approval errors
Message: Node approval failed
Detailed -vvv logging of Ansible script. All the gory details are here.
Additional Information
# uname -a
Linux benchserver7 3.10.0-862.3.2.el7.x86_64 #1 SMP Tue May 15 18:22:15 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/ansible/hosts
# This is the default ansible 'hosts' file.
[OSEv3:children]
masters
nodes
etcd
[OSEv3:vars]
containerized=false
openshift_deployment_type=openshift-enterprise
debug_level=0
openshift_node_groups=[{'name': 'node-config-master', 'labels': ['node-role.kubernetes.io/master=true']}, {'name': 'node-config-infra', 'labels': ['node-role.kubernetes.io/infra=true',]}, {'name': 'node-config-compute', 'labels': ['node-role.kubernetes.io/compute=true'], 'edits': [{ 'key': 'kubeletArguments.pods-per-core','value': ['20']}]}]
openshift_master_cluster_hostname=benchserver7
ansible_ssh_user=root
openshift_enable_service_catalog=false
disk_availability=false
openshift_disable_check=memory_availability,disk_availability
[masters]
benchserver7.acme.com
[etcd]
benchserver7.acme.com
[nodes]
benchserver7.acme.com openshift_node_group_name='node-config-master'
#benchserver5.acme.com openshift_node_group_name='node-config-infra'
benchserver2.acme.com openshift_node_group_name='node-config-compute'
#
About this issue
- State: closed
- Created 6 years ago
- Comments: 21 (7 by maintainers)
All, thank you for the detailed failure reports. I am in the process of creating a custom module to deal with this csr signing issue here: https://github.com/openshift/openshift-ansible/pull/9711
We plan to backport to 3.10 as soon as it’s ready, hopefully in the next day or so.
@kmurthy1 thanks for sharing the workaround of disabling fail_on_timeout. It let me progress beyond this issue. @aland-zhang I suggest you try switching the boolean to false and having another go.
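For anyone wondering why flipping that boolean helps, here is a minimal sketch of how a flag like fail_on_timeout could gate the failure. It assumes the flag behaves as its name and the "Failing as requested." message suggest. The function `finish_approval` and the exception `CSRApprovalTimeout` are hypothetical and are not taken from the module's source.

```python
class CSRApprovalTimeout(Exception):
    """Raised when approval times out and failure was requested."""


def finish_approval(expected, approved, timed_out, fail_on_timeout=True):
    """Summarize an approval run; raise only if asked to fail on timeout.

    All names here are hypothetical stand-ins for illustration.
    """
    # Nodes that were expected but never had their CSRs approved.
    missing = sorted(set(expected) - set(approved))
    if timed_out and fail_on_timeout:
        raise CSRApprovalTimeout(
            "Timed out accepting certificate signing requests. "
            "Failing as requested."
        )
    return {
        "timed_out": timed_out,
        "approved": sorted(approved),
        "missing": missing,
    }
```

With `fail_on_timeout=False`, a slow but complete run returns a summary with an empty `missing` list instead of raising, which matches the case reported here where every request was approved but the run went past the limit.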