openshift-ansible: service catalog install failed, maybe there are some prerequisites?

Description

I have installed openshift-origin v3.7, v3.8, v3.9, and v3.10, but every install hits the following issue. Are there some prerequisites for the service catalog?

fatal: [dev.cefcfco.com]: FAILED! => {
    "attempts": 120,
    "changed": false,
    "cmd": [
        "curl",
        "-k",
        "https://apiserver.kube-service-catalog.svc/healthz"
    ],
    "delta": "0:00:01.188682",
    "end": "2018-03-22 02:32:27.933614",
    "invocation": {
        "module_args": {
            "_raw_params": "curl -k https://apiserver.kube-service-catalog.svc/healthz",
            "_uses_shell": false,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "warn": false
        }
    },
    "rc": 0,
    "start": "2018-03-22 02:32:26.744932",
    "stderr": "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent
    Left  Speed\n\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\r  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0\r100   180  100   180    0     0    153      0  0:00:01  0:00:01 --:--:--   153",
    "stderr_lines": [
        "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current",
        "                                 Dload  Upload   Total   Spent    Left  Speed",
        "",
        "  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0",
        "  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0",
        "  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0",
        "100   180  100   180    0     0    153      0  0:00:01  0:00:01 --:--:--   153"
    ],
    "stdout": "[+]ping ok\n[+]poststarthook/generic-apiserver-start-informers ok\n[+]poststarthook/start-service-catalog-apiserver-informers ok\n[-]etcd failed: reason withheld\nhealthz check failed",
    "stdout_lines": [
        "[+]ping ok",
        "[+]poststarthook/generic-apiserver-start-informers ok",
        "[+]poststarthook/start-service-catalog-apiserver-informers ok",
        "[-]etcd failed: reason withheld",
        "healthz check failed"
    ]
}
        to retry, use: --limit @/root/openshift-ansible/playbooks/byo/config.retry
INSTALLER STATUS ***********************************************************************************************************************************
Initialization             : Complete
Health Check               : Complete
etcd Install               : Complete
Master Install             : Complete
Master Additional Install  : Complete
Node Install               : Complete
Hosted Install             : Complete
Service Catalog Install    : In Progress
        This phase can be restarted by running: playbooks/byo/openshift-cluster/service-catalog.yml



Failure summary:


  1. Hosts:    dev.cefcfco.com
     Play:     Service Catalog
     Task:     wait for api server to be ready
     Message:  Failed without returning a message.
[root@feng ~]# curl -k https://apiserver.kube-service-catalog.svc/healthz
[+]ping ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-service-catalog-apiserver-informers ok
[-]etcd failed: reason withheld
healthz check failed
[root@dev ~]# oc get pods -n kube-service-catalog
NAME                       READY     STATUS    RESTARTS   AGE
apiserver-qbjj7            1/1       Running   0          9m
controller-manager-ptz7v   1/1       Running   1          9m
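
The pods report Running even though /healthz still complains about etcd, so it helps to look at the apiserver pod directly and, once the cause is fixed, restart only the failing phase. A rough sketch (the pod name is the one shown above; the inventory path is a placeholder):

oc logs apiserver-qbjj7 -n kube-service-catalog | grep -i etcd     # why the etcd check fails
oc describe pod apiserver-qbjj7 -n kube-service-catalog            # etcd flags, DNS setup, events

ansible-playbook -i /path/to/inventory playbooks/byo/openshift-cluster/service-catalog.yml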

My inventory hosts:

[OSEv3:children]
masters
nodes
etcd

[OSEv3:vars]
ansible_ssh_user=root
enable_excluders=False
enable_docker_excluder=False
ansible_service_broker_install=False

containerized=True
os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant'
openshift_disable_check=disk_availability,docker_storage,memory_availability,docker_image_availability,package_version

deployment_type=origin
openshift_deployment_type=origin

openshift_release=v3.7.2
openshift_pkg_version=v3.7.2
openshift_image_tag=v3.7.2
openshift_service_catalog_image_version=v3.7.2
template_service_broker_image_version=v3.7.2
openshift_metrics_image_version=v3.7.2

osm_use_cockpit=true

openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]

openshift_public_hostname=dev.cefcfco.com
openshift_master_default_subdomain=apps.dev.cefcfco.com

[masters]
dev.cefcfco.com openshift_schedulable=true

[etcd]
dev.cefcfco.com

[nodes]
dev.cefcfco.com openshift_schedulable=true openshift_node_labels="{'region': 'infra', 'zone': 'default'}"

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 4
  • Comments: 23 (1 by maintainers)

Most upvoted comments

I think I found the reason why it's not working:

Some of the apiserver pods do not work:

apiserver-8n5g5   @node1   curl -k https://10.128.0.4:6443  healthz check failed
apiserver-cdbfh   @node3   curl -k https://10.129.0.4:6443  healthz check failed
apiserver-n4qm7   @node2   curl -k https://10.130.0.6:6443  ok
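
Those per-pod checks can be scripted roughly like this (a sketch; it assumes the apiserver pods carry an app=apiserver label, so adjust the selector to whatever your pods actually use):

for ip in $(oc get pods -n kube-service-catalog -l app=apiserver -o jsonpath='{.items[*].status.podIP}'); do
  echo "== $ip =="                   # pod IP of one service catalog apiserver
  curl -sk "https://$ip:6443/healthz"   # same healthz endpoint the installer polls
  echo
done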

A quick look with oc describe showed me that they try to resolve the etcd servers:

    Command:
      /usr/bin/service-catalog
    Args:
      apiserver
      --storage-type
      etcd
      --secure-port
      6443
      --etcd-servers
      https://node1.k8s.unigs.de:2379,https://node2.k8s.unigs.de:2379,https://node3.k8s.unigs.de:2379
      --etcd-cafile
      /etc/origin/master/master.etcd-ca.crt
      --etcd-certfile
      /etc/origin/master/master.etcd-client.crt
      --etcd-keyfile
      /etc/origin/master/master.etcd-client.key
      -v
      3
      --cors-allowed-origins
      localhost
      --admission-control
      KubernetesNamespaceLifecycle,DefaultServicePlan,ServiceBindingsLifecycle,ServicePlanChangeValidator,BrokerAuthSarCheck
      --feature-gates
      OriginatingIdentity=true

I exec'd into the container and ran the following commands:

sh-4.2# ping node1.k8s.unigs.de
PING node1.k8s.unigs.de.k8s.unigs.de (10.18.255.99) 56(84) bytes of data.
64 bytes from lb.k8s.unigs.de (10.18.255.99): icmp_seq=1 ttl=63 time=0.213 ms

That is clearly wrong. Notice the trailing dot at the end of the hostname in the next command.

sh-4.2# ping node1.k8s.unigs.de.
PING node1.k8s.unigs.de (10.18.255.1) 56(84) bytes of data.
64 bytes from node1.k8s.unigs.de (10.18.255.1): icmp_seq=1 ttl=63 time=0.730 ms

Oh, interesting!

sh-4.2# cat /etc/resolv.conf  
nameserver 10.18.255.2
search kube-service-catalog.svc.cluster.local svc.cluster.local cluster.local k8s.unigs.de
options ndots:5

As far as I understand it, the ndots:5 option makes the resolver append the search domains first for any hostname with fewer than 5 dots. node1.k8s.unigs.de has only 3, so it ends up being resolved as node1.k8s.unigs.de.k8s.unigs.de.
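
The effect is easy to reproduce from inside the pod (a sketch; it assumes getent is available in the image):

getent hosts node1.k8s.unigs.de     # fewer than 5 dots, so the search domains are appended first
getent hosts node1.k8s.unigs.de.    # trailing dot = absolute name, resolved exactly as written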

Does this ndots option make sense? And how can I force it to use the domain name I provided?

I tried adding openshift_ip= to all of my hosts, but that did not change the result.

@flipkill1985 I think this may be related to https://github.com/openshift/origin/issues/17316

Do you have a wildcard entry for *.dev.cefcfco.com configured in your DNS?

I’ve recently experienced a similar issue where the apiserver pod failed to resolve the etcd hosts correctly because the DNS lookup was matching a wildcard DNS entry, due to the search and ndots configuration in /etc/resolv.conf inside the apiserver pod.
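
One quick way to confirm that from any host with dig installed (a sketch using the names from the comment above; substitute your own domain):

dig +short '*.k8s.unigs.de'                  # does a wildcard record answer for the node domain?
dig +short node1.k8s.unigs.de.k8s.unigs.de   # the search-domain variant the pod ends up querying

If the second query returns the wildcard target instead of nothing, the apiserver inside the pod will connect to that address instead of the real etcd host.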