openshift-ansible: Install OpenShift 3.11 get error: Could not find csr for nodes

Description

Installed OpenShift 3.11 on Red Hat 7.6 and got the error: Could not find csr for nodes

On a multi-master install, if the first master goes down we can no longer scale up the cluster with new nodes or masters: N/A (only 1 master here)

Version

ansible --version:

```
ansible 2.6.14
  config file = /usr/share/ansible/openshift-ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Sep 12 2018, 05:31:16) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]
```

Playbooks installed via RPM; rpm -q output:

```
ansible-2.6.14-1.el7ae.noarch
```

Steps To Reproduce

Step 1: Prepare VMs. I have 4 Red Hat 7.6 VMs and followed the doc https://docs.openshift.com/container-platform/3.11/install/index.html to set up the hosts.

Inventory file (/etc/ansible/hosts):

```
# Create an OSEv3 group that contains the masters, nodes, and etcd groups
[OSEv3:children]
masters
nodes
etcd

# Set variables common for all OSEv3 hosts
[OSEv3:vars]
os_firewall_use_firewalld=True

# SSH user, this user should allow ssh based auth without requiring a password
ansible_ssh_user=root

# If ansible_ssh_user is not root, ansible_become must be set to true
ansible_become=false

openshift_master_default_subdomain=apps.fyre.ibm.com
openshift_deployment_type=openshift-enterprise
oreg_url=registry.redhat.io/openshift3/ose-${component}:${version}
oreg_auth_user=<my user name here>
oreg_auth_password=xxxxxxxxxxxxxxxxxxxxxx

# uncomment the following to enable htpasswd authentication; defaults to DenyAllPasswordIdentityProvider
#openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'}]

# host group for masters
[masters]
scaorh-master.fyre.ibm.com

# host group for etcd
[etcd]
scaorh-master.fyre.ibm.com

# host group for nodes, includes region info
[nodes]
scaorh-master.fyre.ibm.com openshift_node_group_name='node-config-master'
scaorh-worker1.fyre.ibm.com openshift_node_group_name='node-config-compute'
scaorh1-worker2.fyre.ibm.com openshift_node_group_name='node-config-compute'
scaorh2-infranode.fyre.ibm.com openshift_node_group_name='node-config-infra'
```
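A common cause of the CSR failure reported below is a node whose hostname or DNS entry does not line up with the inventory names. A tiny pre-check (an illustrative sketch, not part of the install; `check_dns` is a hypothetical helper, and `getent` is standard on RHEL 7) can confirm that every inventory name resolves from the install host before deploying:

```shell
#!/bin/sh
# Report whether a hostname resolves via the system resolver.
check_dns() {
  if getent hosts "$1" > /dev/null 2>&1; then
    echo "resolves: $1"
  else
    echo "NO DNS:   $1"
  fi
}

# Names copied from the [nodes] group above:
for h in scaorh-master.fyre.ibm.com scaorh-worker1.fyre.ibm.com \
         scaorh1-worker2.fyre.ibm.com scaorh2-infranode.fyre.ibm.com; do
  check_dns "$h"
done
```

Any `NO DNS` line is worth fixing before running the playbooks.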

Step 2: Deploy:

```
cd /usr/share/ansible/openshift-ansible
ansible-playbook -i /etc/ansible/hosts playbooks/prerequisites.yml
```

```
ansible-playbook -i /etc/ansible/hosts playbooks/deploy_cluster.yml
```

Got this error:

```
TASK [Approve node certificates when bootstrapping] ***********************************************************
Sunday 17 March 2019  12:36:15 -0700 (0:00:00.137)       0:30:15.928 **********
FAILED - RETRYING: Approve node certificates when bootstrapping (30 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (29 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (28 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (27 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (26 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (25 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (24 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (23 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (22 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (21 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (20 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (19 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (18 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (17 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (16 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (15 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (14 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (13 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (12 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (11 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (10 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (9 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (8 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (7 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (6 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (5 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (4 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (3 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (2 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (1 retries left).
fatal: [scaorh-master.fyre.ibm.com]: FAILED! => {"all_subjects_found": ["subject=/O=system:nodes/CN=system:node:scaorh-master.fyre.ibm.com\n", "subject=/O=system:nodes/CN=system:node:scaorh-master.fyre.ibm.com\n", "subject=/O=system:nodes/CN=system:node:scaorh-master.fyre.ibm.com\n", "subject=/O=system:nodes/CN=system:node:scaorh-master.fyre.ibm.com\n", "subject=/O=system:nodes/CN=system:node:scaorh1-worker2.fyre.ibm.com\n", "subject=/O=system:nodes/CN=system:node:scaorh-worker1.fyre.ibm.com\n"],
"attempts": 30, "changed": false, "client_approve_results": [],
"client_csrs": {"node-csr-8e-uSNcl4xSbMe02CoIcaelY5mjC1eqCIXaXEu4Vjco": "scaorh1-worker2.fyre.ibm.com", "node-csr-J-1_iIVS5-hgaQz5xGifBwWTf5l4CcXgvOzvKs7yufU": "scaorh-worker1.fyre.ibm.com"},
"msg": "Could not find csr for nodes: scaorh2-infranode.fyre.ibm.com",
"oc_get_nodes": {"apiVersion": "v1", "items": [{"apiVersion": "v1", "kind": "Node", "metadata": {"annotations": {"node.openshift.io/md5sum": "6ada87691866d0068b8c8cfe0df773b2", "volumes.kubernetes.io/controller-managed-attach-detach": "true"}, "creationTimestamp": "2019-03-17T19:26:30Z", "labels": {"beta.kubernetes.io/arch": "amd64", "beta.kubernetes.io/os": "linux", "kubernetes.io/hostname": "scaorh-master.fyre.ibm.com", "node-role.kubernetes.io/master": "true"}, "name": "scaorh-master.fyre.ibm.com", "namespace": "", "resourceVersion": "2860", "selfLink": "/api/v1/nodes/scaorh-master.fyre.ibm.com", "uid": "90c98d93-48ea-11e9-bf0d-00163e01f117"}, "spec": {}, "status": {"addresses": [{"address": "172.16.241.23", "type": "InternalIP"}, {"address": "scaorh-master.fyre.ibm.com", "type": "Hostname"}], "allocatable": {"cpu": "16", "hugepages-1Gi": "0", "hugepages-2Mi": "0", "memory": "32676344Ki", "pods": "250"}, "capacity": {"cpu": "16", "hugepages-1Gi": "0", "hugepages-2Mi": "0", "memory": "32778744Ki", "pods": "250"}, "conditions": [{"lastHeartbeatTime": "2019-03-17T19:39:14Z", "lastTransitionTime": "2019-03-17T19:26:30Z", "message": "kubelet has sufficient disk space available", "reason": "KubeletHasSufficientDisk", "status": "False", "type": "OutOfDisk"}, {"lastHeartbeatTime": "2019-03-17T19:39:14Z", "lastTransitionTime": "2019-03-17T19:26:30Z", "message": "kubelet has sufficient memory available", "reason": "KubeletHasSufficientMemory", "status": "False", "type": "MemoryPressure"}, {"lastHeartbeatTime": "2019-03-17T19:39:14Z", "lastTransitionTime": "2019-03-17T19:26:30Z", "message": "kubelet has no disk pressure", "reason": "KubeletHasNoDiskPressure", "status": "False", "type": "DiskPressure"}, {"lastHeartbeatTime": "2019-03-17T19:39:14Z", "lastTransitionTime": "2019-03-17T19:26:30Z", "message": "kubelet has sufficient PID available", "reason": "KubeletHasSufficientPID", "status": "False", "type": "PIDPressure"}, {"lastHeartbeatTime": "2019-03-17T19:39:14Z", "lastTransitionTime": "2019-03-17T19:26:30Z", "message": "runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized", "reason": "KubeletNotReady", "status": "False", "type": "Ready"}], "daemonEndpoints": {"kubeletEndpoint": {"Port": 10250}}, "images": [{"names": ["registry.redhat.io/openshift3/ose-node@sha256:8d28f961c74f033b3df9ed0d7a2a1bfb5e6ebb0611cb6b018f7e623961f7ea52", "registry.redhat.io/openshift3/ose-node:v3.11"], "sizeBytes": 1171108452}, {"names": ["registry.redhat.io/openshift3/ose-control-plane@sha256:200a14df0fdf3c467588f5067ab015cd316e49856114ba7602d4ca9e5f42b0f3", "registry.redhat.io/openshift3/ose-control-plane:v3.11"], "sizeBytes": 808610884}, {"names": ["registry.redhat.io/rhel7/etcd@sha256:be1c3e3f002ac41c35f2994f1c0cb3bd28a8ff59674941ca1a6223a8b72c2758", "registry.redhat.io/rhel7/etcd:3.2.22"], "sizeBytes": 259048769}, {"names": ["registry.redhat.io/openshift3/ose-pod@sha256:f27c68d225803ca3a97149083b5211ccc3def3230f8147fd017eef5b11d866d5", "registry.redhat.io/openshift3/ose-pod:v3.11", "registry.redhat.io/openshift3/ose-pod:v3.11.88"], "sizeBytes": 238366131}], "nodeInfo": {"architecture": "amd64", "bootID": "bdeaf185-56b0-4cff-b344-2fe95351d324", "containerRuntimeVersion": "docker://1.13.1", "kernelVersion": "3.10.0-957.5.1.el7.x86_64", "kubeProxyVersion": "v1.11.0+d4cacc0", "kubeletVersion": "v1.11.0+d4cacc0", "machineID": "cbb00030e5204543a0474ffff17ec26f", "operatingSystem": "linux", "osImage": "OpenShift Enterprise", "systemUUID": "E21E048B-6EB8-4685-A3EA-57F5CF1F2BF3"}}}], "kind": "List", "metadata": {"resourceVersion": "", "selfLink": ""}},
"raw_failures": [], "rc": 0, "server_approve_results": [], "server_csrs": null, "state": "unknown",
"unwanted_csrs": [{"apiVersion": "certificates.k8s.io/v1beta1", "kind": "CertificateSigningRequest", "metadata": {"creationTimestamp": "2019-03-17T19:36:13Z", "generateName": "csr-", "name": "csr-58dj9", "namespace": "", "resourceVersion": "2555", "selfLink": "/apis/certificates.k8s.io/v1beta1/certificatesigningrequests/csr-58dj9", "uid": "ecbad18b-48eb-11e9-bf0d-00163e01f117"}, "spec": {"groups": ["system:nodes", "system:authenticated"], "request": "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJQlR6Q0I5Z0lCQURCSU1SVXdFd1lEVlFRS0V3eHplWE4wWlcwNmJtOWtaWE14THpBdEJnTlZCQU1USm5ONQpjM1JsYlRwdWIyUmxPbk5qWVc5eWFDMXRZWE4wWlhJdVpubHlaUzVwWW0wdVkyOXRNRmt3RXdZSEtvWkl6ajBDCkFRWUlLb1pJemowREFRY0RRZ0FFS1VZbGZFai9WUlFQL09ETFpORDFMYXh4VnNGc0RaSllTeDBkOGdEUityWVcKaC9rUUhFL0QvVHE4SHIwOENRT2pQaGlkbHFGWkZjcExkQlpMSVdQcWdLQk1NRW9HQ1NxR1NJYjNEUUVKRGpFOQpNRHN3T1FZRFZSMFJCREl3TUlJYWMyTmhiM0pvTFcxaGMzUmxjaTVtZVhKbExtbGliUzVqYjIyQ0FJY0VyQkR4CkY0Y0VDUjdDbzRjRXJCRUFBVEFLQmdncWhrak9QUVFEQWdOSUFEQkZBaUJMRmVrbmRjVm4zSGlYNGVwN0ZOMi8KTi9WYm5VbXlINmhTb1VOUFowTWE1Z0loQU5zdGU4QUNSR1BnWGNIS3YzT0g3cnNEWk92N1FuVm5XOFNOUWZUTwpzMm9rCi0tLS0tRU5EIENFUlRJRklDQVRFIFJFUVVFU1QtLS0tLQo=", "usages": ["digital signature", "key encipherment", "server auth"], "username": "system:node:scaorh-master.fyre.ibm.com"}, "status": {}}, {"apiVersion": "certificates.k8s.io/v1beta1", "kind": "CertificateSigningRequest", "metadata": {"creationTimestamp": "2019-03-17T19:26:52Z", "generateName": "csr-", "name": "csr-lzvjj", "namespace": "", "resourceVersion": "949", "selfLink": "/apis/certificates.k8s.io/v1beta1/certificatesigningrequests/csr-lzvjj", "uid": "9e264342-48ea-11e9-bf0d-00163e01f117"}, "spec": {"groups": ["system:masters", "system:cluster-admins", "system:authenticated"], "request": "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJQkJEQ0JxZ0lCQURCSU1SVXdFd1lEVlFRS0V3eHplWE4wWlcwNmJtOWtaWE14THpBdEJnTlZCQU1USm5ONQpjM1JsYlRwdWIyUmxPbk5qWVc5eWFDMXRZWE4wWlhJdVpubHlaUzVwWW0wdVkyOXRNRmt3RXdZSEtvWkl6ajBDCkFRWUlLb1pJemowREFRY0RRZ0FFdm1CRmppdm9qMlBkWDJyRmM0eE5rVERSYjROclVWSGRCRDFNRk50OHV2L1AKdTZ3aUdVbTZpdTRqOVdrb2Y1TS9LOUE2eGRBdVRlUzU2WkRRaEdNSllxQUFNQW9HQ0NxR1NNNDlCQU1DQTBrQQpNRVlDSVFDS3o4dVBqcSt0ZzJwNkNxdC9NZks0OGQ2cjFFWUNEeHRhcmFjMlRpN3I1QUloQU4yeUY2QVlUcU5LCmhNVlJKSTJIMzIxVWN0R08zRi9wbTltL1IreDhYMTFuCi0tLS0tRU5EIENFUlRJRklDQVRFIFJFUVVFU1QtLS0tLQo=", "usages": ["digital signature", "key encipherment", "client auth"], "username": "system:admin"}, "status": {"certificate": "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUNoVENDQVcyZ0F3SUJBZ0lVSEFMc0FQQXNlYXFmUUhpUytMM2hIWHFEWDZzd0RRWUpLb1pJaHZjTkFRRUwKQlFBd

PLAY RECAP ****************************************************************************************************
localhost                      : ok=11   changed=0    unreachable=0    failed=0
scaorh-master.fyre.ibm.com     : ok=487  changed=238  unreachable=0    failed=1
scaorh-worker1.fyre.ibm.com    : ok=109  changed=66   unreachable=0    failed=0
scaorh1-worker2.fyre.ibm.com   : ok=109  changed=66   unreachable=0    failed=0
scaorh2-infranode.fyre.ibm.com : ok=101  changed=19   unreachable=0    failed=0
```

```
INSTALLER STATUS **********************************************************************************************
Initialization              : Complete (0:00:25)
Health Check                : Complete (0:00:55)
Node Bootstrap Preparation  : Complete (0:13:04)
etcd Install                : Complete (0:02:25)
Master Install              : Complete (0:07:07)
Master Additional Install   : Complete (0:06:11)
Node Join                   : In Progress (0:03:06)
        This phase can be restarted by running: playbooks/openshift-node/join.yml
Sunday 17 March 2019  12:39:16 -0700 (0:03:01.349)       0:33:17.277 **********
```

```
cockpit : Install cockpit-ws ------------------------------------------------------------------------- 316.13s
openshift_node : install needed rpm(s) --------------------------------------------------------------- 237.61s
Approve node certificates when bootstrapping --------------------------------------------------------- 181.35s
openshift_node : Install iSCSI storage plugin dependencies ------------------------------------------- 120.08s
openshift_node : Install node, clients, and conntrack packages --------------------------------------- 103.55s
etcd : Install etcd ----------------------------------------------------------------------------------- 83.24s
openshift_control_plane : Wait for all control plane pods to become ready ----------------------------- 70.09s
Run health checks (install) - EL ---------------------------------------------------------------------- 54.79s
openshift_control_plane : Wait for control plane pods to appear --------------------------------------- 54.14s
openshift_node : Install Ceph storage plugin dependencies --------------------------------------------- 47.59s
openshift_node : Install dnsmasq ---------------------------------------------------------------------- 46.75s
openshift_ca : Install the base package for admin tooling --------------------------------------------- 45.79s
openshift_node : Install GlusterFS storage plugin dependencies ---------------------------------------- 43.07s
openshift_excluder : Install openshift excluder - yum ------------------------------------------------- 39.41s
openshift_excluder : Install docker excluder - yum ---------------------------------------------------- 24.91s
openshift_cli : Install clients ----------------------------------------------------------------------- 24.76s
openshift_node_group : Wait for the sync daemonset to become ready and available ---------------------- 11.54s
openshift_manageiq : Configure role/user permissions -------------------------------------------------- 10.10s
nickhammond.logrotate : nickhammond.logrotate | Install logrotate -------------------------------------- 9.12s
openshift_node : Install NFS storage plugin dependencies ----------------------------------------------- 8.84s
```

Failure summary:

```
  1. Hosts:    scaorh-master.fyre.ibm.com
     Play:     Approve any pending CSR requests from inventory nodes
     Task:     Approve node certificates when bootstrapping
     Message:  Could not find csr for nodes: scaorh2-infranode.fyre.ibm.com
```
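A quick way to see what the installer saw is to inspect the bootstrap CSRs on the master with `oc get csr` and approve any pending ones by hand. The `oc` commands in the comments below are the standard OpenShift 3.11 manual-approval steps and need a cluster-admin login; the `list_pending` helper is only demonstrated on a hypothetical sample so this sketch runs without a cluster. Note that in this particular failure no CSR ever existed for scaorh2-infranode, so approving by hand cannot help until that node's kubelet actually submits one.

```shell
#!/bin/sh
# Print the names of Pending CSRs from `oc get csr`-style output on stdin.
list_pending() {
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# On the master you would run (not executed here):
#   oc get csr | list_pending | xargs -r oc adm certificate approve

# Hypothetical sample output for demonstration:
sample='NAME        AGE   REQUESTOR                                  CONDITION
node-csr-8e 5m    system:node:scaorh1-worker2.fyre.ibm.com   Pending
csr-lzvjj   15m   system:admin                               Approved,Issued'

echo "$sample" | list_pending
```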
Expected Results

The deploy_cluster.yml playbook completes successfully and all four nodes join the cluster.

Observed Results

The install fails on the "Approve node certificates when bootstrapping" task: CSRs are found for the master and both compute workers, but none ever appears for scaorh2-infranode.fyre.ibm.com, and the play gives up after 30 retries (full output above).

Additional Information

  • Operating system: Red Hat Enterprise Linux Server release 7.6 (Maipo) (from cat /etc/redhat-release)
  • Inventory file: see the Steps To Reproduce section above.

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 9
  • Comments: 15 (1 by maintainers)

Most upvoted comments

I had the same issue running on OpenStack, but fixed by making sure that my configured hostnames matched exactly the inventory file.

Before the change the DNS was pointing correctly to node1.example.com but hostname was something like node1.novalocal. Fixed the hostnames and rebooted the nodes and playbook went through ok.

This answer was the fix for the issue. In short, validate the following points to avoid this error:

  1. Check the hostnames in your OKD inventory.
  2. Check the hostnames on the VMs you are installing OKD on, e.g. with hostname -A.
  3. The names from steps 1 and 2 must match; if they don't, you will get this error. Note: every name reported by hostname -A should match a hostname defined in the inventory.
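The checklist above boils down to a string comparison per node; a small sketch (the `check_hostname` helper is hypothetical, and the sample names come from the OpenStack comment above) might look like:

```shell
#!/bin/sh
# Compare the name used in the inventory with the name a node reports.
check_hostname() {
  # $1 = name from the inventory, $2 = name the node reports (e.g. `hostname -A`)
  if [ "$1" = "$2" ]; then
    echo "OK: $1"
  else
    echo "MISMATCH: inventory '$1' vs host '$2'"
  fi
}

# On each node you would compare against the real value, e.g.:
#   check_hostname scaorh2-infranode.fyre.ibm.com "$(hostname -f)"
# Demonstrated on the commenter's OpenStack example:
check_hostname node1.example.com node1.example.com
check_hostname node1.example.com node1.novalocal
```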

Hope it helps 😉

This happens to us if there is a failed install later in the deploy_cluster.yml playbook, due to some other issue. The CSRs are approved initially and, if we re-run the deploy quickly enough, it's fine. But if we wait too long the approved CSRs disappear and the deploy won't get past "Approve node certificates when bootstrapping".

WORKAROUND: edit whichever playbook is running this task (in my case, it was openshift-ansible/playbooks/openshift-node/private/join.yml) and add `tags: csr` to the "Approve node certificates when bootstrapping" task. Then re-run the deploy with `--skip-tags=csr`.
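Concretely, the edit described above would look roughly like this (a sketch only: the task body is abbreviated, and the exact file layout can differ between openshift-ansible releases):

```yaml
# openshift-ansible/playbooks/openshift-node/private/join.yml (abbreviated sketch)
- name: Approve node certificates when bootstrapping
  # ... existing task body left unchanged ...
  tags: csr   # added so the task can be skipped
```

The deploy is then re-run with `ansible-playbook -i /etc/ansible/hosts playbooks/deploy_cluster.yml --skip-tags=csr`.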

I'm thinking a redeploy of the certificates might also be a workaround.