openshift-ansible: OpenShift 3.10: deploy_cluster.yml failed with Message: Control plane pods didn't come up

Description

I am trying to install OpenShift 3.10 on three VMs (master, worker node, and infra node).

Cluster deployment fails with the following error:

  1. Hosts:    master.rkdomain.test
     Play:     Configure masters
     Task:     Report control plane errors
     Message:  Control plane pods didn't come up
Version
- openshift v3.10.34

- ansible --version
ansible 2.4.6.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Feb 20 2018, 09:19:12) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]

- openshift-ansible
[root@master log]# rpm -qa | grep openshift-ansible
openshift-ansible-3.10.47-1.git.0.95bc2d2.el7_5.noarch
openshift-ansible-roles-3.10.47-1.git.0.95bc2d2.el7_5.noarch
openshift-ansible-playbooks-3.10.47-1.git.0.95bc2d2.el7_5.noarch
openshift-ansible-docs-3.10.47-1.git.0.95bc2d2.el7_5.noarch

Steps To Reproduce

Setup details: three VMs (master, worker node, infra node), all running RHEL 7.5.

  1. Use the following inventory file:
# Create an OSEv3 group that contains the masters, nodes, and etcd groups
[OSEv3:children]
masters
nodes
etcd

# Set variables common for all OSEv3 hosts
[OSEv3:vars]
# SSH user, this user should allow ssh based auth without requiring a password
ansible_ssh_user=root
#os_firewall_use_firewalld=True
openshift_disable_check=docker_image_availability

# If ansible_ssh_user is not root, ansible_become must be set to true
#ansible_become=true

openshift_deployment_type=openshift-enterprise
#oreg_url=rkdomain.test/openshift3/ose-${component}:${version}
#openshift_myworks_modify_imagestreams=true

# uncomment the following to enable htpasswd authentication; defaults to DenyAllPasswordIdentityProvider
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'}]

# host group for masters
[masters]
master.rkdomain.test

# host group for etcd
[etcd]
master.rkdomain.test

# host group for nodes, includes region info
[nodes]
master.rkdomain.test openshift_node_group_name='node-config-master'
nodeone.rkdomain.test openshift_node_group_name='node-config-compute'
infranode.rkdomain.test openshift_node_group_name='node-config-infra'
  2. Run the prerequisites playbook (this step completed successfully):

[root@master playbooks]# ansible-playbook -vvv prerequisites.yml

  3. Run the deploy playbook, which fails with the error above:

[root@master playbooks]# ansible-playbook -vvv deploy_cluster.yml
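
As a quick sanity check before running the playbooks, it can help to confirm Ansible reaches all three hosts with this inventory (the inventory path below is an assumption; substitute whatever file the playbooks are run against):

# Hypothetical pre-flight check: every host in the OSEv3 group should answer "pong"
[root@master playbooks]# ansible -i /etc/ansible/hosts OSEv3 -m ping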

Expected Results

The installation should complete without errors.

Observed Results
[root@master playbooks]# ansible-playbook -vvv deploy_cluster.yml

TASK [openshift_control_plane : Report control plane errors msg=Control plane pods didn't come up] ***********************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_control_plane/tasks/main.yml:215
fatal: [master.rkdomain.test]: FAILED! => {
    "changed": false,
    "failed": true,
    "msg": "Control plane pods didn't come up"
}

NO MORE HOSTS LEFT *******************************************************************************************************************
        to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.retry

PLAY RECAP ***************************************************************************************************************************
infranode.rkdomain.test    : ok=24   changed=2    unreachable=0    failed=0
localhost                  : ok=13   changed=0    unreachable=0    failed=0
master.rkdomain.test       : ok=226  changed=34   unreachable=0    failed=1
nodeone.rkdomain.test      : ok=24   changed=2    unreachable=0    failed=0


INSTALLER STATUS *********************************************************************************************************************
Initialization              : Complete (0:00:36)
Health Check                : Complete (0:02:44)
Node Bootstrap Preparation  : Complete (0:00:02)
etcd Install                : Complete (0:01:36)
Master Install              : In Progress (0:25:27)
        This phase can be restarted by running: playbooks/openshift-master/config.yml


Failure summary:


  1. Hosts:    master.rkdomain.test
     Play:     Configure masters
     Task:     Report control plane errors
     Message:  Control plane pods didn't come up
[root@master playbooks]#
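
Every failure above boils down to the master API refusing connections on port 8443, so the control plane static pods on the master are the first thing to inspect. A sketch of the usual 3.10 diagnostics (in 3.10 the control plane runs as static pods managed by the node service):

# Are the api/controllers/etcd static pods running at all?
[root@master ~]# docker ps -a | grep -E 'k8s_api|k8s_controllers|k8s_etcd'

# Control plane component logs (helper shipped on 3.10 masters)
[root@master ~]# /usr/local/bin/master-logs api api
[root@master ~]# /usr/local/bin/master-logs controllers controllers

# The node service is what launches the static pods
[root@master ~]# journalctl -u atomic-openshift-node --since "10 min ago"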


Additional Information

Excerpts from /var/log/ansible.log:

2018-09-22 08:15:23,114 p=30768 u=root |  Using module file /usr/share/ansible/openshift-ansible/roles/lib_openshift/library/oc_obj.py
2018-09-22 08:15:24,001 p=30768 u=root |  FAILED - RETRYING: Wait for control plane pods to appear (5 retries left).Result was: {
    "attempts": 56, 
    "changed": false, 
    "failed": true, 
    "invocation": {
        "module_args": {
            "all_namespaces": null, 
            "content": null, 
            "debug": false, 
            "delete_after": false, 
            "field_selector": null, 
            "files": null, 
            "force": false, 
            "kind": "pod", 
            "kubeconfig": "/etc/origin/master/admin.kubeconfig", 
            "name": "master-controllers-master.rkdomain.test", 
            "namespace": "kube-system", 
            "selector": null, 
            "state": "list"
        }
    }, 
    "msg": {
        "cmd": "/usr/bin/oc get pod master-controllers-master.rkdomain.test -o json -n kube-system", 
        "results": [
            {}
        ], 
        "returncode": 1, 
        "stderr": "The connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?\n", 
        "stdout": ""
    }, 
    "retries": 61
}
[... attempts 57 through 60 repeat the identical "Wait for control plane pods to appear" entry, each failing with the same "The connection to the server master.rkdomain.test:8443 was refused" error ...]
2018-09-22 08:15:52,676 p=30768 u=root |  Using module file /usr/share/ansible/openshift-ansible/roles/lib_openshift/library/oc_obj.py
2018-09-22 08:15:53,625 p=30768 u=root |  The full traceback is:
  File "/tmp/ansible_LcKeXE/ansible_module_oc_obj.py", line 47, in <module>
    import ruamel.yaml as yaml

2018-09-22 08:15:53,626 p=30768 u=root |  failed: [master.rkdomain.test] (item=controllers) => {
    "attempts": 60, 
    "changed": false, 
    "failed": true, 
    "invocation": {
        "module_args": {
            "all_namespaces": null, 
            "content": null, 
            "debug": false, 
            "delete_after": false, 
            "field_selector": null, 
            "files": null, 
            "force": false, 
            "kind": "pod", 
            "kubeconfig": "/etc/origin/master/admin.kubeconfig", 
            "name": "master-controllers-master.rkdomain.test", 
            "namespace": "kube-system", 
            "selector": null, 
            "state": "list"
        }
    }, 
    "item": "controllers", 
    "msg": {
        "cmd": "/usr/bin/oc get pod master-controllers-master.rkdomain.test -o json -n kube-system", 
        "results": [
            {}
        ], 
        "returncode": 1, 
        "stderr": "The connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?\n", 
        "stdout": ""
    }
}
2018-09-22 08:15:53,630 p=30768 u=root |  ...ignoring
2018-09-22 08:15:53,654 p=30768 u=root |  TASK [openshift_control_plane : Check status in the kube-system namespace _raw_params={{ openshift_client_binary }} status --config={{ openshift.common.config_base }}/master/admin.kubeconfig -n kube-system] ***
2018-09-22 08:15:53,654 p=30768 u=root |  task path: /usr/share/ansible/openshift-ansible/roles/openshift_control_plane/tasks/main.yml:188
2018-09-22 08:15:53,743 p=30768 u=root |  Using module file /usr/lib/python2.7/site-packages/ansible/modules/commands/command.py
2018-09-22 08:15:54,578 p=30768 u=root |  fatal: [master.rkdomain.test]: FAILED! => {
    "changed": true, 
    "cmd": [
        "oc", 
        "status", 
        "--config=/etc/origin/master/admin.kubeconfig", 
        "-n", 
        "kube-system"
    ], 
    "delta": "0:00:00.256543", 
    "end": "2018-09-22 08:15:54.537255", 
    "failed": true, 
    "invocation": {
        "module_args": {
            "_raw_params": "oc status --config=/etc/origin/master/admin.kubeconfig -n kube-system", 
            "_uses_shell": false, 
            "chdir": null, 
            "creates": null, 
            "executable": null, 
            "removes": null, 
            "stdin": null, 
            "warn": true
        }
    }, 
    "msg": "non-zero return code", 
    "rc": 1, 
    "start": "2018-09-22 08:15:54.280712", 
    "stderr": "The connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?\nThe connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?\nThe connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?\nThe connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?\nThe connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?\nThe connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?\nThe connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?\nThe connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?\nThe connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?\nThe connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?\nThe connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?\nThe connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?\nThe connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?\nThe connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?\nThe connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?\nThe connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?", 
    "stderr_lines": [
        "The connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?", 
        "The connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?", 
        "The connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?", 
        "The connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?", 
        "The connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?", 
        "The connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?", 
        "The connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?", 
        "The connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?", 
        "The connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?", 
        "The connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?", 
        "The connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?", 
        "The connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?", 
        "The connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?", 
        "The connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?", 
        "The connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?", 
        "The connection to the server master.rkdomain.test:8443 was refused - did you specify the right host or port?"
    ], 
    "stdout": "", 
    "stdout_lines": []
}
2018-09-22 08:15:54,578 p=30768 u=root |  ...ignoring
(Truncated excerpt from the atomic-openshift-node journal, as captured in the installer output; the first line is cut off:)

... github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/kubelet.go:461: Failed to list *v1.Node: Get https://master.rkdomain.test:8443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster.rkdomain.test&limit=500&resourceVersion=0: dial tcp 192.168.151.6:8443: getsockopt: connection refused
Sep 22 08:15:55 master.rkdomain.test atomic-openshift-node[18119]: E0922 08:15:55.614538   18119 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/kubelet.go:452: Failed to list *v1.Service: Get https://master.rkdomain.test:8443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.151.6:8443: getsockopt: connection refused
Sep 22 08:15:56 master.rkdomain.test atomic-openshift-node[18119]: W0922 08:15:56.201702   18119 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Sep 22 08:15:56 master.rkdomain.test atomic-openshift-node[18119]: E0922 08:15:56.201909   18119 kubelet.go:2146] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Sep 22 08:15:56 master.rkdomain.test atomic-openshift-node[18119]: E0922 08:15:56.594113   18119 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://master.rkdomain.test:8443/api/v1/pods?fieldSelector=spec.nodeName%3Dmaster.rkdomain.test&limit=500&resourceVersion=0: dial tcp 192.168.151.6:8443: getsockopt: connection refused
Sep 22 08:15:56 master.rkdomain.test atomic-openshift-node[18119]: E0922 08:15:56.611648   18119 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/kubelet.go:461: Failed to list *v1.Node: Get https://master.rkdomain.test:8443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster.rkdomain.test&limit=500&resourceVersion=0: dial tcp 192.168.151.6:8443: getsockopt: connection refused
Sep 22 08:15:56 master.rkdomain.test atomic-openshift-node[18119]: E0922 08:15:56.616173   18119 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/kubelet.go:452: Failed to list *v1.Service: Get https://master.rkdomain.test:8443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.151.6:8443: getsockopt: connection refused
2018-09-22 08:15:58,022 p=30768 u=root |  TASK [openshift_control_plane : Report control plane errors msg=Control plane pods didn't come up] ***********************************
2018-09-22 08:15:58,022 p=30768 u=root |  task path: /usr/share/ansible/openshift-ansible/roles/openshift_control_plane/tasks/main.yml:215
2018-09-22 08:15:58,112 p=30768 u=root |  fatal: [master.rkdomain.test]: FAILED! => {
    "changed": false, 
    "failed": true, 
    "msg": "Control plane pods didn't come up"
}
2018-09-22 08:15:58,115 p=30768 u=root |  NO MORE HOSTS LEFT *******************************************************************************************************************
2018-09-22 08:15:58,117 p=30768 u=root |  	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.retry

2018-09-22 08:15:58,117 p=30768 u=root |  PLAY RECAP ***************************************************************************************************************************
2018-09-22 08:15:58,118 p=30768 u=root |  infranode.rkdomain.test    : ok=24   changed=2    unreachable=0    failed=0   
2018-09-22 08:15:58,118 p=30768 u=root |  localhost                  : ok=13   changed=0    unreachable=0    failed=0   
2018-09-22 08:15:58,119 p=30768 u=root |  master.rkdomain.test       : ok=226  changed=34   unreachable=0    failed=1   
2018-09-22 08:15:58,119 p=30768 u=root |  nodeone.rkdomain.test      : ok=24   changed=2    unreachable=0    failed=0   
2018-09-22 08:15:58,119 p=30768 u=root |  INSTALLER STATUS *********************************************************************************************************************
2018-09-22 08:15:58,125 p=30768 u=root |  Initialization              : Complete (0:00:36)
2018-09-22 08:15:58,126 p=30768 u=root |  Health Check                : Complete (0:02:44)
2018-09-22 08:15:58,127 p=30768 u=root |  Node Bootstrap Preparation  : Complete (0:00:02)
2018-09-22 08:15:58,127 p=30768 u=root |  etcd Install                : Complete (0:01:36)
2018-09-22 08:15:58,128 p=30768 u=root |  Master Install              : In Progress (0:25:27)
2018-09-22 08:15:58,128 p=30768 u=root |  	This phase can be restarted by running: playbooks/openshift-master/config.yml
2018-09-22 08:15:58,129 p=30768 u=root |  Failure summary:


  1. Hosts:    master.rkdomain.test
     Play:     Configure masters
     Task:     Report control plane errors
     Message:  Control plane pods didn't come up
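
The node journal lines captured above show the kubelet dialing 192.168.151.6:8443 and being refused, alongside an uninitialized CNI config; both symptoms fit a node that registered under an IP the API server is not actually bound to (common on multi-NIC VMs such as Vagrant boxes). A few standard checks, offered here as a suggestion rather than taken from the original report:

[root@master ~]# getent hosts master.rkdomain.test               # IP the hostname resolves to
[root@master ~]# ip -4 addr show                                 # IPs the host actually holds
[root@master ~]# ss -tlnp | grep 8443                            # anything listening on 8443?
[root@master ~]# grep nodeIP /etc/origin/node/node-config.yaml   # IP the node advertises, if set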

OS Version

[root@master playbooks]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.5 (Maipo)
[root@master playbooks]#

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 24 (1 by maintainers)

Most upvoted comments

@raj4linux it is hard to chase everyone, but we keep repeating that an open issue should not be hijacked by other issues…

Initially you reported that the control plane pods were not coming up, then a solution was provided by @nagonzalez, and then the conversation moved on to a different issue.

Quite frankly, it is hard and time-consuming to follow a GitHub issue that mixes several sub-issues.

Out of 10 random issues, 8 turn into "oh, I have this issue too, and this, and that", and you lose track of what the initial problem was.

Please don't take it personally; I'm only trying to bring some sanity here. Hopefully, with more discipline from you and others, we'll succeed together.

@DanyC97 You are right, I should not have mixed two issues. My apologies.

I will be more careful in the future.

Ahh, that’s helpful. I use Vagrant to test and have to explicitly set the IP as well.

Add this to [OSEv3:vars]:

# Configure nodeIP in the node config
# This is needed in cases where node traffic is desired to go over an
# interface other than the default network interface.
openshift_set_node_ip=true

Then add openshift_ip=x.x.x.x to each node:

[nodes]
master.rkdomain.test openshift_ip=x.x.x.x openshift_node_group_name='node-config-master'
nodeone.rkdomain.test openshift_ip=x.x.x.x openshift_node_group_name='node-config-compute'
infranode.rkdomain.test openshift_ip=x.x.x.x openshift_node_group_name='node-config-infra'
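
After adding these variables, the failed phase can be rerun (the installer status points at playbooks/openshift-master/config.yml, or rerun deploy_cluster.yml entirely). Assuming the API comes up, a quick way to confirm the fix took:

[root@master playbooks]# ansible-playbook -vvv deploy_cluster.yml

# INTERNAL-IP column should now match the openshift_ip values
[root@master ~]# oc get nodes -o wide

# The master API should answer on 8443
[root@master ~]# curl -k https://master.rkdomain.test:8443/healthz
ok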