rancher: [controlPlane] Failed to bring up Control Plane: Failed to verify healthcheck: Failed to check https://localhost:6443/healthz for service [xxx.xxx.xxx.xxx] on host [ip address]: Get https://localhost:6443/healthz: can not build dialer to cluster-z4rdx:m-snndn
When trying to bring up a cluster on Amazon EC2, it continually fails the health check. If you perform the check manually, substituting the node's IP address into the URL, everything is fine (it returns ok).
It seems to be a timing issue, but I have tried different zones (eu-central, eu-west), renamed the machines in the cluster each time (as there is another issue with keypairs already being created if you re-use the same name) and deleted the security group. I've also changed the machine config to something more powerful, in case the t2.micro is too small for etcd or the controller.
Ironically, the first time I used Rancher to do this it worked perfectly. The only difference since then is that I put an nginx reverse proxy in front of the Docker container to use our own SSL certificate and domain name. I will revert this as well, but I can no longer tear down and recreate.
This problem is related to the reverse proxy. If you deactivate nginx and fall back to the Docker container directly (with warnings over the SSL certificate), it provisions OK.
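Since the failure only appears behind the reverse proxy, the usual culprit is nginx stripping the WebSocket upgrade headers that Rancher's agent tunnel needs. Below is a hedged sketch of the relevant nginx settings (the upstream address and the file name are placeholder assumptions, not taken from this thread), written out via a heredoc so it can be inspected:

```shell
# Sketch of the nginx proxy settings Rancher needs (WebSocket upgrade
# headers); 127.0.0.1:8443 is a placeholder for the rancher/rancher port.
cat > rancher-proxy.conf <<'EOF'
location / {
    proxy_pass https://127.0.0.1:8443;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header Upgrade $http_upgrade;     # pass WebSocket upgrade
    proxy_set_header Connection "upgrade";
    proxy_set_header X-Forwarded-Proto $scheme;
}
EOF
grep 'Upgrade' rancher-proxy.conf
```

Without `proxy_http_version 1.1` and the `Upgrade`/`Connection` headers, the agent tunnel (and therefore the "build dialer" step behind the health check) cannot be established through the proxy.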
| Useful | Info |
|---|---|
| Versions | Rancher v2.0.0-beta2, UI v2.0.34 |
| Access | local admin |
| Route | global-admin.clusters.index |
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Comments: 47 (7 by maintainers)
I am also getting the same issue with the GA release of v2.0.0: [controlPlane] Failed to bring up Control Plane: Failed to verify healthcheck: Failed to check https://localhost:6443/healthz for service [kube-apiserver] on host [52.56.178.162]: Get https://localhost:6443/healthz: can not build dialer to c-bh6kw:m-rxq72
Any resolution to this? I had a previous setup in rc1 which did work.
Cool, for me, as superseb stated the following did the job:
Same issue here on AWS. I just followed the v2 quick start guide.
@govinda-attal I am having the same problem. Did you resolve this or did you find a workaround? This is preventing me from getting started with Rancher.
Got the same issue. Isn't it a problem with Rancher trying to access the health endpoint of the kube-apiserver, which has a self-signed cert?
The cert of the kube-apiserver looks like this:
I was checking the healthz endpoint manually from the machine rancher is running on.
I am getting an "ok" back when using curl […] -k -v, but Rancher is saying: [controlPlane] Failed to bring up Control Plane: Failed to verify healthcheck: Failed to check https://localhost:6443/healthz for service [kube-apiserver] on host [xx.xx.xx.xx]: Get https://localhost:6443/healthz: can not build dialer to c-sqtkd:m-vzxr6, log: I0807 09:51:16.914469 1 logs.go:49] http: TLS handshake error from xx.xx.xx.xx:38453: EOF
Looks like rke or something is checking locally and does not have the CA cert with which the Kubernetes apiserver created its self-signed cert.
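The self-signed-cert situation described above can be illustrated locally (the certificate below is a throwaway generated on the spot, not the real kube-apiserver cert): a self-signed certificate has identical subject and issuer, which is exactly why a client verifies it only when given that cert as its CA (or when verification is skipped with `curl -k`).

```shell
# Generate a throwaway self-signed cert, as the kube-apiserver does for
# itself, and show that subject and issuer are the same entity.
openssl req -x509 -newkey rsa:2048 -nodes -keyout key.pem -out cert.pem \
  -subj "/CN=kube-apiserver" -days 1 2>/dev/null
openssl x509 -in cert.pem -noout -subject -issuer
```

A verifier that trusts this cert as a CA accepts it; one that only has the system trust store rejects it, matching the `TLS handshake error` in the log above.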
Issue 1: Nodes created/added cannot reach the configured `server-url` (usually the IP/name of the host running the `rancher/rancher` container or LB/proxy). This can be tested by running `curl -k https://configured_server-url` on the node and seeing if you get a response; if so, network connectivity is not an issue. This is not something Rancher can automatically configure, as you created the node running `rancher/rancher` yourself, so you need to configure appropriate inbound access from the created node IP/subnet (HTTPS, TCP/443).

Issue 2: Certificates are not configured correctly. Usually this occurs when a recognized CA certificate needs intermediates added to the certificate to function. It can work well in the browser, but that won't work in the Go agent. Intermediates need to be added in order: your certificate first, followed by the intermediates in the chain. This can be checked in the logging of `rancher/rancher-agent`, where it will say `x509: certificate signed by unknown authority`.

Issue 3: Proxy/loadbalancer does not meet the prerequisites listed in the docs, i.e. it does not support websockets, does not pass the correct headers, or has no HTTP/2 support.

Issue 4: Nodes are being re-used and aren't cleaned properly before re-use. See https://rancher.com/docs/rancher/v2.x/en/installation/removing-rancher/cleaning-cluster-nodes/#cleaning-a-node-manually for how to clean nodes so old state won't interfere with adding the node to a new cluster.
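The chain ordering requirement from Issue 2 can be sketched locally. Everything below is a throwaway example generated on the spot (a pretend intermediate CA and a pretend server cert, with assumed names like `rancher.example.com`), not material from this thread:

```shell
# Create a pretend intermediate CA and a server cert signed by it.
openssl req -x509 -newkey rsa:2048 -nodes -keyout ca.key -out ca.pem \
  -subj "/CN=Example Intermediate CA" -days 1 2>/dev/null
openssl req -newkey rsa:2048 -nodes -keyout server.key -out server.csr \
  -subj "/CN=rancher.example.com" 2>/dev/null
openssl x509 -req -in server.csr -CA ca.pem -CAkey ca.key -CAcreateserial \
  -out server.pem -days 1 2>/dev/null

# Correct order for the file served to clients: leaf first, then the
# intermediate(s). A Go client walks the chain in this order.
cat server.pem ca.pem > fullchain.pem

# Verify the leaf against the intermediate, as the agent effectively does.
openssl verify -CAfile ca.pem server.pem
```

If the intermediate is missing (or the order is reversed so the client cannot walk from the leaf upward), the Go agent fails with exactly the `x509: certificate signed by unknown authority` message mentioned in Issue 2.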