rancher: [controlPlane] Failed to bring up Control Plane: Failed to verify healthcheck: Failed to check https://localhost:6443/healthz for service [xxx.xxx.xxx.xxx] on host [ip address]: Get https://localhost:6443/healthz: can not build dialer to cluster-z4rdx:m-snndn
When trying to bring up a cluster on Amazon EC2, it continually fails the health check. If you perform the check manually, substituting the node's IP address into the URL, everything is fine (it returns ok).
It seems to be a timing issue, but I have tried different zones (eu-central, eu-west), renamed the machines in the cluster each time (as there is another issue with keypairs already being created if you re-use the same name) and deleted the security group. I've also changed the machine config to something more powerful, in case the t2.micro is too small for etcd or the controller.
Ironically, the first time I used Rancher to do this it worked perfectly. The only difference since then is that I put an nginx reverse proxy in front of the Docker container to use our own SSL certificate and domain name. I will revert this as well, but I can no longer tear down and recreate.
This problem is related to the reverse proxy. If you deactivate nginx and fall back to the Docker container directly (with warnings over the SSL certificate), it provisions OK.
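Since the failure only appears behind the reverse proxy, the usual culprit is nginx stripping the WebSocket upgrade headers that Rancher's agent tunnel needs. Below is a hedged sketch of the relevant nginx settings (the upstream address and the file name are placeholder assumptions, not taken from this thread), written out via a heredoc so it can be inspected:

```shell
# Sketch of the nginx proxy settings Rancher needs (WebSocket upgrade
# headers); 127.0.0.1:8443 is a placeholder for the rancher/rancher port.
cat > rancher-proxy.conf <<'EOF'
location / {
    proxy_pass https://127.0.0.1:8443;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header Upgrade $http_upgrade;     # pass WebSocket upgrade
    proxy_set_header Connection "upgrade";
    proxy_set_header X-Forwarded-Proto $scheme;
}
EOF
grep 'Upgrade' rancher-proxy.conf
```

Without `proxy_http_version 1.1` and the `Upgrade`/`Connection` headers, the agent tunnel (and therefore the "build dialer" step behind the health check) cannot be established through the proxy.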
| Useful | Info |
|---|---|
| Versions | Rancher v2.0.0-beta2, UI v2.0.34 |
| Access | local admin |
| Route | global-admin.clusters.index |
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Comments: 47 (7 by maintainers)
I am also getting the same issue with the GA release of v2.0.0: [controlPlane] Failed to bring up Control Plane: Failed to verify healthcheck: Failed to check https://localhost:6443/healthz for service [kube-apiserver] on host [52.56.178.162]: Get https://localhost:6443/healthz: can not build dialer to c-bh6kw:m-rxq72
Any resolution to this? I had a previous setup in rc1 which did work.
Cool, for me, as superseb stated the following did the job:
Same issue here on AWS. I just followed the v2 quick start guide.
@govinda-attal I am having the same problem. Did you resolve this or did you find a workaround? This is preventing me from getting started with Rancher.
Got the same issue. Isn't it a problem with Rancher trying to access the health endpoint of the kube-apiserver, which has a self-signed cert?
The cert of the kube-apiserver looks like this:
I was checking the healthz endpoint manually from the machine rancher is running on.
I am getting an "ok" back when using curl […] -k -v, but Rancher is saying: [controlPlane] Failed to bring up Control Plane: Failed to verify healthcheck: Failed to check https://localhost:6443/healthz for service [kube-apiserver] on host [xx.xx.xx.xx]: Get https://localhost:6443/healthz: can not build dialer to c-sqtkd:m-vzxr6, log: I0807 09:51:16.914469 1 logs.go:49] http: TLS handshake error from xx.xx.xx.xx:38453: EOF
Looks like rke or something is checking locally and does not have the CA cert with which the Kubernetes apiserver created its self-signed cert.
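The self-signed-cert situation described above can be illustrated locally (the certificate below is a throwaway generated on the spot, not the real kube-apiserver cert): a self-signed certificate has identical subject and issuer, which is exactly why a client verifies it only when given that cert as its CA (or when verification is skipped with `curl -k`).

```shell
# Generate a throwaway self-signed cert, as the kube-apiserver does for
# itself, and show that subject and issuer are the same entity.
openssl req -x509 -newkey rsa:2048 -nodes -keyout key.pem -out cert.pem \
  -subj "/CN=kube-apiserver" -days 1 2>/dev/null
openssl x509 -in cert.pem -noout -subject -issuer
```

A verifier that trusts this cert as a CA accepts it; one that only has the system trust store rejects it, matching the `TLS handshake error` in the log above.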
Issue 1: Nodes created/added cannot reach the configured `server-url` (usually the IP/name of the host running the `rancher/rancher` container or LB/proxy). This can be tested by running `curl -k https://configured_server-url` on the node and seeing if you get a response; if so, network connectivity is not an issue. This is not something Rancher can automatically configure, as you created the node running `rancher/rancher` yourself, so you need to configure appropriate inbound access from the created node IP/subnet (HTTPS, TCP/443).

Issue 2: Certificates are not configured correctly. Usually this occurs when a recognized CA certificate needs intermediates added to the certificate to function. It can work well in the browser, but that won't work in the Go agent. Intermediates need to be added in order: your certificate first, followed by the intermediates in the chain. This can be checked in the logging of `rancher/rancher-agent`, where it will say `x509: certificate signed by unknown authority`.

Issue 3: Proxy/loadbalancer does not meet the prerequisites listed in the docs, i.e. it does not support websockets, does not pass the correct headers, or has no HTTP/2 support.

Issue 4: Nodes are being re-used and aren't cleaned properly before re-use. See https://rancher.com/docs/rancher/v2.x/en/installation/removing-rancher/cleaning-cluster-nodes/#cleaning-a-node-manually for how to clean nodes so old state won't interfere with adding the node to a new cluster.
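The chain ordering requirement from Issue 2 can be sketched locally. Everything below is a throwaway example generated on the spot (a pretend intermediate CA and a pretend server cert, with assumed names like `rancher.example.com`), not material from this thread:

```shell
# Create a pretend intermediate CA and a server cert signed by it.
openssl req -x509 -newkey rsa:2048 -nodes -keyout ca.key -out ca.pem \
  -subj "/CN=Example Intermediate CA" -days 1 2>/dev/null
openssl req -newkey rsa:2048 -nodes -keyout server.key -out server.csr \
  -subj "/CN=rancher.example.com" 2>/dev/null
openssl x509 -req -in server.csr -CA ca.pem -CAkey ca.key -CAcreateserial \
  -out server.pem -days 1 2>/dev/null

# Correct order for the file served to clients: leaf first, then the
# intermediate(s). A Go client walks the chain in this order.
cat server.pem ca.pem > fullchain.pem

# Verify the leaf against the intermediate, as the agent effectively does.
openssl verify -CAfile ca.pem server.pem
```

If the intermediate is missing (or the order is reversed so the client cannot walk from the leaf upward), the Go agent fails with exactly the `x509: certificate signed by unknown authority` message mentioned in Issue 2.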