openshift-ansible: Timed out accepting certificate signing requests.

Description

An OpenShift 3.10 cluster installation fails when attempting to accept certificate signing requests (CSRs). The oc_adm_csr.py module times out after 60 seconds. Four certificates needed to be signed, and all four were approved, but the approvals took 67 seconds to complete, which exceeded the timeout.
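
To illustrate the failure mode, here is a minimal sketch of a timeout check that reports failure even though every approval succeeded. It is not the code of oc_adm_csr.py: the `approve_all` function, the `FakeClock` helper, the CSR names, and the even split of 16.75 seconds per certificate are assumptions made for illustration. Only the 60-second timeout, the 67-second total, the four certificates, and the `fail_on_timeout` name come from this issue.

```python
class FakeClock:
    """Deterministic stand-in for a wall clock so the example runs instantly."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


def approve_all(csrs, approve, clock, timeout=60, fail_on_timeout=True):
    """Approve every CSR, then compare the elapsed time against the timeout.

    Hypothetical model: the real module may structure its loop differently.
    """
    start = clock()
    approved = []
    for csr in csrs:
        approve(csr)  # each approval may take an arbitrary amount of time
        approved.append(csr)
    elapsed = clock() - start
    timed_out = elapsed > timeout
    return {
        "approved": approved,
        "elapsed": elapsed,
        "timed_out": timed_out,
        # The run is only reported as failed when the flag asks for it.
        "failed": timed_out and fail_on_timeout,
    }


def simulate(fail_on_timeout):
    """Replay the reported scenario: 4 approvals taking 67 seconds in total."""
    clock = FakeClock()
    csrs = ["csr-1", "csr-2", "csr-3", "csr-4"]  # placeholder names
    return approve_all(
        csrs,
        lambda _csr: clock.advance(16.75),  # assumed per-approval duration
        clock,
        timeout=60,
        fail_on_timeout=fail_on_timeout,
    )


if __name__ == "__main__":
    for flag in (True, False):
        print(flag, simulate(flag))
```

In this model all four approvals succeed in both runs; the only difference is whether exceeding the timeout is treated as a failure.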

Version
# ansible --version
ansible 2.4.4.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, May 31 2018, 09:41:32) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]

If you’re operating from a git clone:

release-3.10 branch

$ git describe 
openshift-ansible-3.10.27-2-69-gd96b19f2a
Steps To Reproduce
  1. Configure inventory using 1 master and 1 node on bare metal.
  2. Run the cluster install script.
Expected Results

The installation completes successfully.

Observed Results
"Timed out accepting certificate signing requests. Failing as requested."
INSTALLER STATUS ***************************************************************
Initialization              : Complete (0:00:09)
Health Check                : Complete (0:02:48)
Node Bootstrap Preparation  : Complete (0:00:01)
etcd Install                : Complete (0:00:22)
Master Install              : Complete (0:01:29)
Master Additional Install   : Complete (0:00:48)
Node Join                   : In Progress (0:01:10)

Failure summary:
  1. Hosts:    benchserver7.acme.com
     Play:     Approve any pending CSR requests from inventory nodes
     Task:     Report approval errors
     Message:  Node approval failed

Detailed -vvv logging of Ansible script. All the gory details are here.

Additional Information

# uname -a
Linux benchserver7 3.10.0-862.3.2.el7.x86_64 #1 SMP Tue May 15 18:22:15 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/ansible/hosts 
# This is the default ansible 'hosts' file.

[OSEv3:children]
masters
nodes
etcd

[OSEv3:vars]
containerized=false
openshift_deployment_type=openshift-enterprise
debug_level=0
openshift_node_groups=[{'name': 'node-config-master', 'labels': ['node-role.kubernetes.io/master=true']}, {'name': 'node-config-infra', 'labels': ['node-role.kubernetes.io/infra=true',]}, {'name': 'node-config-compute', 'labels': ['node-role.kubernetes.io/compute=true'], 'edits': [{ 'key': 'kubeletArguments.pods-per-core','value': ['20']}]}]
openshift_master_cluster_hostname=benchserver7
ansible_ssh_user=root
openshift_enable_service_catalog=false
disk_availability=false
openshift_disable_check=memory_availability,disk_availability

[masters]
benchserver7.acme.com

[etcd]
benchserver7.acme.com

[nodes]
benchserver7.acme.com openshift_node_group_name='node-config-master'
#benchserver5.acme.com openshift_node_group_name='node-config-infra'
benchserver2.acme.com openshift_node_group_name='node-config-compute'
#

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 21 (7 by maintainers)

Most upvoted comments

All, thank you for the detailed failure reports. I am creating a custom module to deal with this CSR signing issue: https://github.com/openshift/openshift-ansible/pull/9711

We plan to backport to 3.10 as soon as it’s ready, hopefully in the next day or so.

@kmurthy1 thanks for sharing the workaround of disabling fail_on_timeout. It let me progress beyond this issue. @aland-zhang I suggest you try switching the boolean to false and running the install again.