openshift-ansible: FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created

Description

On a single-master setup, I am getting the error below while installing through Ansible.

Version
  • Your ansible version per ansible --version

$ ansible --version
ansible 2.6.5
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Jul 13 2018, 13:06:57) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]

If you’re operating from a git clone:

  • The output of git describe

$ git describe
openshift-ansible-3.11.23-1-2-g2892025

Steps To Reproduce
  1. ansible-playbook -i inventory.ini openshift-ansible/playbooks/deploy_cluster.yml
Expected Results

The OpenShift cluster should install successfully.

Observed Results

The installation fails while waiting for the ServiceMonitor CRD to be created:

FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (2 retries left).Result was: {
    "attempts": 29,
    "changed": true,
    "cmd": [
        "oc",
        "get",
        "crd",
        "servicemonitors.monitoring.coreos.com",
        "-n",
        "openshift-monitoring",
        "--config=/tmp/openshift-cluster-monitoring-ansible-AOOcA3/admin.kubeconfig"
    ],
    "delta": "0:00:00.251324",
    "end": "2018-10-17 12:43:50.411317",
    "invocation": {
        "module_args": {
            "_raw_params": "oc get crd servicemonitors.monitoring.coreos.com -n openshift-monitoring --config=/tmp/openshift-cluster-monitoring-ansible-AOOcA3/admin.kubeconfig",
            "_uses_shell": false,
            "argv": null,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "warn": true
        }
    },
    "msg": "non-zero return code",
    "rc": 1,
    "retries": 31,
    "start": "2018-10-17 12:43:50.159993",
    "stderr": "No resources found.\nError from server (NotFound): customresourcedefinitions.apiextensions.k8s.io \"servicemonitors.monitoring.coreos.com\" not found",
    "stderr_lines": [
        "No resources found.",
        "Error from server (NotFound): customresourcedefinitions.apiextensions.k8s.io \"servicemonitors.monitoring.coreos.com\" not found"
    ],
    "stdout": "",
    "stdout_lines": []
}
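
For reference, the failing task is simply polling oc get crd until the CRD exists. A minimal shell equivalent of that check, assuming oc is on the PATH and an admin kubeconfig is already active (the /tmp/...admin.kubeconfig path above is generated per run), would be:

    # Poll for the ServiceMonitor CRD, roughly as the playbook's wait task does.
    # CRDs are cluster-scoped, so no namespace is needed for this check.
    for i in $(seq 1 30); do
        if oc get crd servicemonitors.monitoring.coreos.com >/dev/null 2>&1; then
            echo "CRD is present"; break
        fi
        echo "attempt $i: CRD not created yet"; sleep 10
    done

If this never succeeds, the cluster-monitoring-operator pod in the openshift-monitoring namespace is the first thing to inspect, as the comments below describe.
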
Additional Information


  • Operating system and version ($ cat /etc/redhat-release): CentOS Linux release 7.5.1804 (Core)

  • Your inventory file (especially any non-standard configuration parameters)

[OSEv3:children]
masters
nodes
etcd

[OSEv3:vars]
ansible_ssh_user=root
openshift_deployment_type=origin

# localhost likely doesn't meet the minimum requirements
openshift_disable_check=disk_availability,memory_availability
openshift_additional_repos=[{'id': 'centos-okd-ci', 'name': 'centos-okd-ci', 'baseurl': 'https://rpms.svc.ci.openshift.org/openshift-origin-v3.11', 'gpgcheck': '0', 'enabled': '1'}]

openshift_public_hostname=console.10.0.2.15.nip.io
openshift_master_default_subdomain=apps.10.0.2.15.nip.io
openshift_master_api_port=8443
openshift_master_console_port=8443

[masters]
c1-ocp openshift_ip=10.0.2.15 openshift_schedulable=true

[etcd]
c1-ocp openshift_ip=10.0.2.15

[nodes]
c1-ocp openshift_ip=10.0.2.15 openshift_schedulable=true openshift_node_group_name="node-config-all-in-one"


The output of tail -f /var/log/messages is here:

https://gist.github.com/imranrazakhan/fa69035bdad111a27dc354e2fc44ec50

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 25 (1 by maintainers)

Most upvoted comments

Thanks @ArturoArreola

Deleting everything in /etc/cni/net.d on all the masters/nodes, rebooting, and re-running the installation worked for me. Note: I had three Calico files in there from a previous installation attempt using Calico, and the directory did not contain 80-openshift-network.conf.
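
A minimal sketch of that workaround, assuming leftover CNI configs from the earlier Calico attempt are the culprit (the backup directory name is arbitrary; run this on every master and node):

    # Move any stale CNI configs aside rather than deleting them outright,
    # then reboot so the node comes up with a clean CNI state.
    mkdir -p /root/cni-net.d-backup
    mv /etc/cni/net.d/* /root/cni-net.d-backup/ 2>/dev/null || true
    reboot

After the nodes are back up, re-run the installation (ansible-playbook -i inventory.ini openshift-ansible/playbooks/deploy_cluster.yml).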

I have the same issue, but what I did was…

  • Add NM_CONTROLLED=yes to ifcfg-eth0 on all my nodes
  • Verify my pods with $ oc get pods --all-namespaces
    • $ oc describe pod cluster-monitoring-operator-WXYZ-ASDF -n openshift-monitoring ==> With this command, the last part of the output showed the reason my pod didn't initiate. I had this message:

      Warning FailedCreatePodSandBox 1h kubelet, infra-openshift-nuuptech Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "70719b9ee2bb9c54fc1d866a6134b229b3c1c151148c9558ea0a4ef8cb66526a" network for pod "cluster-monitoring-operator-67579f5cb5-gxmwc": NetworkPlugin cni failed to set up pod "cluster-monitoring-operator-67579f5cb5-gxmwc_openshift-monitoring" network: failed to find plugin "bridge" in path [/opt/cni/bin], failed to clean up sandbox container "70719b9ee2bb9c54fc1d866a6134b229b3c1c151148c9558ea0a4ef8cb66526a" network for pod "cluster-monitoring-operator-67579f5cb5-gxmwc": NetworkPlugin cni failed to teardown pod "cluster-monitoring-operator-67579f5cb5-gxmwc_openshift-monitoring" network: failed to find plugin "bridge" in path [/opt/cni/bin]]
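
Since the sandbox error says the kubelet could not find the "bridge" plugin in /opt/cni/bin, a quick sanity check (my addition, not part of the original comment; assumes shell access on the affected node) is to list that directory:

    # The kubelet looks in /opt/cni/bin for the plugins named in the active
    # /etc/cni/net.d config; a missing "bridge" binary explains the error.
    ls -l /opt/cni/bin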

I searched for the key part of that message (failed to find plugin "bridge" in path [/opt/cni/bin]) and found the following solution…

  • $ ls -l /etc/cni/net.d ==> Normally the only file here should be 80-openshift-network.conf, but I had 3 files:

      -rw-r--r--. 1 root root 294 Mar 12 16:46 100-crio-bridge.conf
      -rw-r--r--. 1 root root  54 Mar 12 16:46 200-loopback.conf
      -rw-r--r--. 1 root root  83 May 15 16:15 80-openshift-network.conf

Red Hat suggests deleting the extra files and keeping only 80-openshift-network.conf, but I only moved 100-crio-bridge.conf and 200-loopback.conf to another directory. After doing that, I rebooted all my nodes, and on the master node I ran playbooks/openshift-monitoring/config.yml again and it worked.
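
A short sketch of those steps, assuming the same three files as above (/root/cni-extra is an arbitrary backup destination; run the mv and reboot on every node):

    # Keep only 80-openshift-network.conf; park the CRI-O configs elsewhere.
    mkdir -p /root/cni-extra
    mv /etc/cni/net.d/100-crio-bridge.conf /etc/cni/net.d/200-loopback.conf /root/cni-extra/
    reboot

    # After all nodes are back, on the master re-run just the monitoring playbook:
    ansible-playbook -i inventory.ini openshift-ansible/playbooks/openshift-monitoring/config.yml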

Try restarting the Docker service: sudo systemctl restart docker