openshift-ansible: Installing OpenShift 3.11 fails with error: Could not find csr for nodes
Description
Installed OpenShift 3.11 on Red Hat Enterprise Linux 7.6 and got the error: Could not find csr for nodes.
On a multi-master install, if the first master goes down we can no longer scale up the cluster with new nodes or masters: N/A (only 1 master).
Version
- Ansible version per `ansible --version`:

```
ansible 2.6.14
  config file = /usr/share/ansible/openshift-ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Sep 12 2018, 05:31:16) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]
```
Running from playbooks installed via RPM.
- The output of `rpm -q openshift-ansible`:

```
ansible-2.6.14-1.el7ae.noarch
```
Steps To Reproduce
Step 1: Prepare VMs. I have 4 Red Hat 7.6 VMs and followed the doc https://docs.openshift.com/container-platform/3.11/install/index.html to set up the hosts.

Inventory file (/etc/ansible/hosts):
```ini
# Create an OSEv3 group that contains the masters, nodes, etcd
[OSEv3:children]
masters
nodes
etcd

# Set variables common for all OSEv3 hosts
[OSEv3:vars]
os_firewall_use_firewalld=True

# SSH user, this user should allow ssh based auth without requiring a password
ansible_ssh_user=root

# If ansible_ssh_user is not root, ansible_become must be set to true
ansible_become=false
openshift_master_default_subdomain=apps.fyre.ibm.com
openshift_deployment_type=openshift-enterprise
oreg_url=registry.redhat.io/openshift3/ose-${component}:${version}
oreg_auth_user=<my user name here>
oreg_auth_password=xxxxxxxxxxxxxxxxxxxxxx

# uncomment the following to enable htpasswd authentication; defaults to DenyAllPasswordIdentityProvider
#openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'}]

# host group for masters
[masters]
scaorh-master.fyre.ibm.com

# host group for etcd
[etcd]
scaorh-master.fyre.ibm.com

# host group for nodes, includes region info
[nodes]
scaorh-master.fyre.ibm.com openshift_node_group_name='node-config-master'
scaorh-worker1.fyre.ibm.com openshift_node_group_name='node-config-compute'
scaorh1-worker2.fyre.ibm.com openshift_node_group_name='node-config-compute'
scaorh2-infranode.fyre.ibm.com openshift_node_group_name='node-config-infra'
```
Step 2: Deploy:

```
cd /usr/share/ansible/openshift-ansible
ansible-playbook -i /etc/ansible/hosts playbooks/prerequisites.yml
```
```
ansible-playbook -i /etc/ansible/hosts playbooks/deploy_cluster.yml
```

Got error:

```
TASK [Approve node certificates when bootstrapping] ***********************************************************
Sunday 17 March 2019  12:36:15 -0700 (0:00:00.137)       0:30:15.928 **********
FAILED - RETRYING: Approve node certificates when bootstrapping (30 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (29 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (28 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (27 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (26 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (25 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (24 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (23 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (22 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (21 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (20 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (19 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (18 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (17 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (16 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (15 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (14 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (13 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (12 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (11 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (10 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (9 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (8 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (7 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (6 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (5 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (4 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (3 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (2 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (1 retries left).
fatal: [scaorh-master.fyre.ibm.com]: FAILED! => {
"all_subjects_found": ["subject=/O=system:nodes/CN=system:node:scaorh-master.fyre.ibm.com\n", "subject=/O=system:nodes/CN=system:node:scaorh-master.fyre.ibm.com\n", "subject=/O=system:nodes/CN=system:node:scaorh-master.fyre.ibm.com\n", "subject=/O=system:nodes/CN=system:node:scaorh-master.fyre.ibm.com\n", "subject=/O=system:nodes/CN=system:node:scaorh1-worker2.fyre.ibm.com\n", "subject=/O=system:nodes/CN=system:node:scaorh-worker1.fyre.ibm.com\n"],
"attempts": 30, "changed": false, "client_approve_results": [],
"client_csrs": {"node-csr-8e-uSNcl4xSbMe02CoIcaelY5mjC1eqCIXaXEu4Vjco": "scaorh1-worker2.fyre.ibm.com", "node-csr-J-1_iIVS5-hgaQz5xGifBwWTf5l4CcXgvOzvKs7yufU": "scaorh-worker1.fyre.ibm.com"},
"msg": "Could not find csr for nodes: scaorh2-infranode.fyre.ibm.com",
"oc_get_nodes": {"apiVersion": "v1", "items": [{"apiVersion": "v1", "kind": "Node",
"metadata": {"annotations": {"node.openshift.io/md5sum": "6ada87691866d0068b8c8cfe0df773b2", "volumes.kubernetes.io/controller-managed-attach-detach": "true"}, "creationTimestamp": "2019-03-17T19:26:30Z", "labels": {"beta.kubernetes.io/arch": "amd64", "beta.kubernetes.io/os": "linux", "kubernetes.io/hostname": "scaorh-master.fyre.ibm.com", "node-role.kubernetes.io/master": "true"}, "name": "scaorh-master.fyre.ibm.com", "namespace": "", "resourceVersion": "2860", "selfLink": "/api/v1/nodes/scaorh-master.fyre.ibm.com", "uid": "90c98d93-48ea-11e9-bf0d-00163e01f117"},
"spec": {},
"status": {"addresses": [{"address": "172.16.241.23", "type": "InternalIP"}, {"address": "scaorh-master.fyre.ibm.com", "type": "Hostname"}],
"allocatable": {"cpu": "16", "hugepages-1Gi": "0", "hugepages-2Mi": "0", "memory": "32676344Ki", "pods": "250"},
"capacity": {"cpu": "16", "hugepages-1Gi": "0", "hugepages-2Mi": "0", "memory": "32778744Ki", "pods": "250"},
"conditions": [{"lastHeartbeatTime": "2019-03-17T19:39:14Z", "lastTransitionTime": "2019-03-17T19:26:30Z", "message": "kubelet has sufficient disk space available", "reason": "KubeletHasSufficientDisk", "status": "False", "type": "OutOfDisk"},
{"lastHeartbeatTime": "2019-03-17T19:39:14Z", "lastTransitionTime": "2019-03-17T19:26:30Z", "message": "kubelet has sufficient memory available", "reason": "KubeletHasSufficientMemory", "status": "False", "type": "MemoryPressure"},
{"lastHeartbeatTime": "2019-03-17T19:39:14Z", "lastTransitionTime": "2019-03-17T19:26:30Z", "message": "kubelet has no disk pressure", "reason": "KubeletHasNoDiskPressure", "status": "False", "type": "DiskPressure"},
{"lastHeartbeatTime": "2019-03-17T19:39:14Z", "lastTransitionTime": "2019-03-17T19:26:30Z", "message": "kubelet has sufficient PID available", "reason": "KubeletHasSufficientPID", "status": "False", "type": "PIDPressure"},
{"lastHeartbeatTime": "2019-03-17T19:39:14Z", "lastTransitionTime": "2019-03-17T19:26:30Z", "message": "runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized", "reason": "KubeletNotReady", "status": "False", "type": "Ready"}],
"daemonEndpoints": {"kubeletEndpoint": {"Port": 10250}},
"images": [{"names": ["registry.redhat.io/openshift3/ose-node@sha256:8d28f961c74f033b3df9ed0d7a2a1bfb5e6ebb0611cb6b018f7e623961f7ea52", "registry.redhat.io/openshift3/ose-node:v3.11"], "sizeBytes": 1171108452},
{"names": ["registry.redhat.io/openshift3/ose-control-plane@sha256:200a14df0fdf3c467588f5067ab015cd316e49856114ba7602d4ca9e5f42b0f3", "registry.redhat.io/openshift3/ose-control-plane:v3.11"], "sizeBytes": 808610884},
{"names": ["registry.redhat.io/rhel7/etcd@sha256:be1c3e3f002ac41c35f2994f1c0cb3bd28a8ff59674941ca1a6223a8b72c2758", "registry.redhat.io/rhel7/etcd:3.2.22"], "sizeBytes": 259048769},
{"names": ["registry.redhat.io/openshift3/ose-pod@sha256:f27c68d225803ca3a97149083b5211ccc3def3230f8147fd017eef5b11d866d5", "registry.redhat.io/openshift3/ose-pod:v3.11", "registry.redhat.io/openshift3/ose-pod:v3.11.88"], "sizeBytes": 238366131}],
"nodeInfo": {"architecture": "amd64", "bootID": "bdeaf185-56b0-4cff-b344-2fe95351d324", "containerRuntimeVersion": "docker://1.13.1", "kernelVersion": "3.10.0-957.5.1.el7.x86_64", "kubeProxyVersion": "v1.11.0+d4cacc0", "kubeletVersion": "v1.11.0+d4cacc0", "machineID": "cbb00030e5204543a0474ffff17ec26f", "operatingSystem": "linux", "osImage": "OpenShift Enterprise", "systemUUID": "E21E048B-6EB8-4685-A3EA-57F5CF1F2BF3"}}}],
"kind": "List", "metadata": {"resourceVersion": "", "selfLink": ""}},
"raw_failures": [], "rc": 0, "server_approve_results": [], "server_csrs": null, "state": "unknown",
"unwanted_csrs": [{"apiVersion": "certificates.k8s.io/v1beta1", "kind": "CertificateSigningRequest",
"metadata": {"creationTimestamp": "2019-03-17T19:36:13Z", "generateName": "csr-", "name": "csr-58dj9", "namespace": "", "resourceVersion": "2555", "selfLink": "/apis/certificates.k8s.io/v1beta1/certificatesigningrequests/csr-58dj9", "uid": "ecbad18b-48eb-11e9-bf0d-00163e01f117"},
"spec": {"groups": ["system:nodes", "system:authenticated"],
"request": "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJQlR6Q0I5Z0lCQURCSU1SVXdFd1lEVlFRS0V3eHplWE4wWlcwNmJtOWtaWE14THpBdEJnTlZCQU1USm5ONQpjM1JsYlRwdWIyUmxPbk5qWVc5eWFDMXRZWE4wWlhJdVpubHlaUzVwWW0wdVkyOXRNRmt3RXdZSEtvWkl6ajBDCkFRWUlLb1pJemowREFRY0RRZ0FFS1VZbGZFai9WUlFQL09ETFpORDFMYXh4VnNGc0RaSllTeDBkOGdEUityWVcKaC9rUUhFL0QvVHE4SHIwOENRT2pQaGlkbHFGWkZjcExkQlpMSVdQcWdLQk1NRW9HQ1NxR1NJYjNEUUVKRGpFOQpNRHN3T1FZRFZSMFJCREl3TUlJYWMyTmhiM0pvTFcxaGMzUmxjaTVtZVhKbExtbGliUzVqYjIyQ0FJY0VyQkR4CkY0Y0VDUjdDbzRjRXJCRUFBVEFLQmdncWhrak9QUVFEQWdOSUFEQkZBaUJMRmVrbmRjVm4zSGlYNGVwN0ZOMi8KTi9WYm5VbXlINmhTb1VOUFowTWE1Z0loQU5zdGU4QUNSR1BnWGNIS3YzT0g3cnNEWk92N1FuVm5XOFNOUWZUTwpzMm9rCi0tLS0tRU5EIENFUlRJRklDQVRFIFJFUVVFU1QtLS0tLQo=",
"usages": ["digital signature", "key encipherment", "server auth"], "username": "system:node:scaorh-master.fyre.ibm.com"}, "status": {}},
{"apiVersion": "certificates.k8s.io/v1beta1", "kind": "CertificateSigningRequest",
"metadata": {"creationTimestamp": "2019-03-17T19:26:52Z", "generateName": "csr-", "name": "csr-lzvjj", "namespace": "", "resourceVersion": "949", "selfLink": "/apis/certificates.k8s.io/v1beta1/certificatesigningrequests/csr-lzvjj", "uid": "9e264342-48ea-11e9-bf0d-00163e01f117"},
"spec": {"groups": ["system:masters", "system:cluster-admins", "system:authenticated"],
"request": "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJQkJEQ0JxZ0lCQURCSU1SVXdFd1lEVlFRS0V3eHplWE4wWlcwNmJtOWtaWE14THpBdEJnTlZCQU1USm5ONQpjM1JsYlRwdWIyUmxPbk5qWVc5eWFDMXRZWE4wWlhJdVpubHlaUzVwWW0wdVkyOXRNRmt3RXdZSEtvWkl6ajBDCkFRWUlLb1pJemowREFRY0RRZ0FFdm1CRmppdm9qMlBkWDJyRmM0eE5rVERSYjROclVWSGRCRDFNRk50OHV2L1AKdTZ3aUdVbTZpdTRqOVdrb2Y1TS9LOUE2eGRBdVRlUzU2WkRRaEdNSllxQUFNQW9HQ0NxR1NNNDlCQU1DQTBrQQpNRVlDSVFDS3o4dVBqcSt0ZzJwNkNxdC9NZks0OGQ2cjFFWUNEeHRhcmFjMlRpN3I1QUloQU4yeUY2QVlUcU5LCmhNVlJKSTJIMzIxVWN0R08zRi9wbTltL1IreDhYMTFuCi0tLS0tRU5EIENFUlRJRklDQVRFIFJFUVVFU1QtLS0tLQo=",
"usages": ["digital signature", "key encipherment", "client auth"], "username": "system:admin"},
"status": {"certificate": "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUNoVENDQVcyZ0F3SUJBZ0lVSEFMc0FQQXNlYXFmUUhpUytMM2hIWHFEWDZzd0RRWUpLb1pJaHZjTkFRRUwKQlFBd
```

```
PLAY RECAP ****************************************************************************************************
localhost                      : ok=11   changed=0    unreachable=0    failed=0
scaorh-master.fyre.ibm.com     : ok=487  changed=238  unreachable=0    failed=1
scaorh-worker1.fyre.ibm.com    : ok=109  changed=66   unreachable=0    failed=0
scaorh1-worker2.fyre.ibm.com   : ok=109  changed=66   unreachable=0    failed=0
scaorh2-infranode.fyre.ibm.com : ok=101  changed=19   unreachable=0    failed=0
```
```
INSTALLER STATUS **********************************************************************************************
Initialization              : Complete (0:00:25)
Health Check                : Complete (0:00:55)
Node Bootstrap Preparation  : Complete (0:13:04)
etcd Install                : Complete (0:02:25)
Master Install              : Complete (0:07:07)
Master Additional Install   : Complete (0:06:11)
Node Join                   : In Progress (0:03:06)
        This phase can be restarted by running: playbooks/openshift-node/join.yml
Sunday 17 March 2019  12:39:16 -0700 (0:03:01.349)       0:33:17.277 **********
```
```
cockpit : Install cockpit-ws ------------------------------------------------------------------------- 316.13s
openshift_node : install needed rpm(s) --------------------------------------------------------------- 237.61s
Approve node certificates when bootstrapping --------------------------------------------------------- 181.35s
openshift_node : Install iSCSI storage plugin dependencies ------------------------------------------- 120.08s
openshift_node : Install node, clients, and conntrack packages --------------------------------------- 103.55s
etcd : Install etcd ----------------------------------------------------------------------------------- 83.24s
openshift_control_plane : Wait for all control plane pods to become ready ----------------------------- 70.09s
Run health checks (install) - EL ---------------------------------------------------------------------- 54.79s
openshift_control_plane : Wait for control plane pods to appear --------------------------------------- 54.14s
openshift_node : Install Ceph storage plugin dependencies --------------------------------------------- 47.59s
openshift_node : Install dnsmasq ---------------------------------------------------------------------- 46.75s
openshift_ca : Install the base package for admin tooling --------------------------------------------- 45.79s
openshift_node : Install GlusterFS storage plugin dependencies ---------------------------------------- 43.07s
openshift_excluder : Install openshift excluder - yum ------------------------------------------------- 39.41s
openshift_excluder : Install docker excluder - yum ---------------------------------------------------- 24.91s
openshift_cli : Install clients ----------------------------------------------------------------------- 24.76s
openshift_node_group : Wait for the sync daemonset to become ready and available ---------------------- 11.54s
openshift_manageiq : Configure role/user permissions -------------------------------------------------- 10.10s
nickhammond.logrotate : nickhammond.logrotate | Install logrotate -------------------------------------- 9.12s
openshift_node : Install NFS storage plugin dependencies ----------------------------------------------- 8.84s
```
Failure summary:

```
Hosts:   scaorh-master.fyre.ibm.com
Play:    Approve any pending CSR requests from inventory nodes
Task:    Approve node certificates when bootstrapping
Message: Could not find csr for nodes: scaorh2-infranode.fyre.ibm.com
```
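For context on the failure message: the approval task compares the node names in the inventory against the subjects of the CSRs the kubelets have submitted, and fails when an inventory node (here scaorh2-infranode.fyre.ibm.com) has no matching CSR, typically because the node's actual hostname differs from its inventory name. Below is a minimal illustrative sketch of that matching logic. It is not the real `oc_csr_approve` module code; the function name and the simplified CSR records are assumptions for illustration, shaped like `oc get csr -o json` output.

```python
# Illustrative sketch (not openshift-ansible source): find inventory nodes
# that have no CSR whose subject matches their name.

def find_missing_nodes(csr_list, inventory_nodes):
    """Return inventory nodes for which no node CSR was found."""
    nodes_with_csrs = set()
    for csr in csr_list:
        username = csr["spec"].get("username", "")
        # Node CSR subjects look like system:node:<hostname>; the hostname
        # the kubelet reports must match the inventory name exactly.
        if username.startswith("system:node:"):
            nodes_with_csrs.add(username[len("system:node:"):])
    return sorted(n for n in inventory_nodes if n not in nodes_with_csrs)

inventory = [
    "scaorh-master.fyre.ibm.com",
    "scaorh-worker1.fyre.ibm.com",
    "scaorh1-worker2.fyre.ibm.com",
    "scaorh2-infranode.fyre.ibm.com",
]
# The infra node never registered under its inventory name, so no CSR
# subject matches it:
csrs = [
    {"spec": {"username": "system:node:scaorh-master.fyre.ibm.com"}},
    {"spec": {"username": "system:node:scaorh-worker1.fyre.ibm.com"}},
    {"spec": {"username": "system:node:scaorh1-worker2.fyre.ibm.com"}},
]
print(find_missing_nodes(csrs, inventory))
# -> ['scaorh2-infranode.fyre.ibm.com']
```

In practice the pending CSRs can be inspected on the master with `oc get csr` and, where appropriate, approved manually with `oc adm certificate approve <csr-name>`.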
Expected Results
The deploy_cluster.yml playbook completes without errors and all four nodes join the cluster.
Observed Results
The install fails at the "Approve node certificates when bootstrapping" task with "Could not find csr for nodes: scaorh2-infranode.fyre.ibm.com" (full output above).
Additional Information
- Operating system (`cat /etc/redhat-release`): Red Hat Enterprise Linux Server release 7.6 (Maipo)
- Inventory file: see above
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 9
- Comments: 15 (1 by maintainers)
I had the same issue running on OpenStack, but fixed it by making sure that my configured hostnames exactly matched the names in the inventory file.
Before the change, DNS pointed correctly to node1.example.com but the hostname was something like node1.novalocal. I fixed the hostnames and rebooted the nodes, and the playbook went through OK.
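The check the comment describes can be scripted before running the installer. A minimal sketch, assuming the example hostnames above (the helper function and names are illustrative, not part of openshift-ansible):

```shell
# Compare a node's actual FQDN with the name used for it in the Ansible
# inventory; the kubelet registers under the node's hostname, so the two
# must be identical for the CSR to be matched.
check_hostname() {
    inventory_name="$1"   # name as written in /etc/ansible/hosts
    actual_fqdn="$2"      # what the node reports, e.g. "$(hostname -f)"
    if [ "$inventory_name" = "$actual_fqdn" ]; then
        echo "OK: $inventory_name"
    else
        echo "MISMATCH: inventory=$inventory_name actual=$actual_fqdn"
    fi
}

# On the node itself you would run something like:
#   check_hostname node1.example.com "$(hostname -f)"
# and, on a mismatch, fix the hostname and reboot:
#   hostnamectl set-hostname node1.example.com
check_hostname node1.example.com node1.novalocal
# prints: MISMATCH: inventory=node1.example.com actual=node1.novalocal
```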
This answer was the fix for the issue for us as well: in short, make sure each node's hostname matches its inventory entry exactly to avoid this problem.
Hope it helps 😉
This happens to us if there is a failed install later in the deploy_cluster.yml playbook, due to some other issue. The CSRs are approved initially and, if we re-run the deploy quickly enough, it's fine. But if we wait too long, the approved CSRs disappear and the deploy no longer gets past "Approve node certificates when bootstrapping".
WORKAROUND: edit whichever playbook is running this task (in my case it was openshift-ansible/playbooks/openshift-node/private/join.yml) and add "tags: csr" to the "Approve node…" task. Then re-run the deploy with --skip-tags=csr.
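A sketch of that workaround, assuming the playbook path named in the comment above; the task's existing body is left untouched (only the tag is added), since the exact module invocation varies by openshift-ansible release:

```yaml
# In playbooks/openshift-node/private/join.yml (path as reported above):
- name: Approve node certificates when bootstrapping
  # ... existing module invocation unchanged ...
  tags: csr   # added so the task can be skipped
```

Then re-run the deploy with `ansible-playbook -i /etc/ansible/hosts playbooks/deploy_cluster.yml --skip-tags=csr`, and approve any still-pending CSRs manually with `oc get csr` and `oc adm certificate approve` if needed.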
I’m thinking a redeploy of the certificates might also be a workaround.