rancher: Clusters imported stuck at "Waiting for API server", with k8s-mode=external or embedded

What kind of request is this (question/bug/enhancement/feature request): Bug

Summary: Clusters imported via Rancher server are forever stuck in the “Waiting for API server” state, when Rancher Server is run in either “external” mode (pointing to a separate k8s cluster) or “embedded” mode.

This regressed between Docker tags 2.0.16 and 2.1.0: the functionality works with the former, fails with the latter, and has remained broken in newer versions.

Steps to reproduce (least amount of steps as possible):

  • Easiest way to reproduce
    • Deploy Rancher from rancher-embedded-mode.yaml
    • Front end it with an ingress, certs, DNS, and log in.
    • Import a cluster, then run the generated YAML on the target cluster.
    • You’ll see the cluster go from Pending -> Waiting, and logs in both the cattle-agent (imported cluster) side and Rancher side indicating the Websocket connection has been established.
    • Nothing else happens after this point, and no more log messages. The cluster is stuck in “Waiting for API server”.
  • Reproducer in external mode - just to show that the problem is not limited to embedded mode:
    • Drop the base64-encoded KUBECONFIG for some external cluster in k8s-secret.yaml.
    • Deploy Rancher from rancher-external-mode.yaml. Rancher will be persisting to the above external cluster.
    • Follow same steps as above (the problem is the same).

Result: The imported cluster is stuck in Waiting for API server.
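For anyone who wants to confirm the stuck state from a script rather than the UI, here is a minimal sketch. It assumes the Rancher v3 API exposes the cluster at /v3/clusters/<id> with `state` and `transitioningMessage` fields and accepts an API token as a Bearer header; the base URL, token, and cluster ID are placeholders, and the message match is only a guess based on the wording quoted in this thread.

```python
import json
import urllib.request


def fetch_cluster(base_url: str, token: str, cluster_id: str) -> dict:
    """Fetch one cluster object from Rancher (assumed endpoint: /v3/clusters/<id>)."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/v3/clusters/{cluster_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def looks_stuck(cluster: dict) -> bool:
    """True if the cluster is not active and reports a 'Waiting for API ...' message.

    The field names are assumptions about the v3 API response, not confirmed
    by the original report.
    """
    message = (cluster.get("transitioningMessage") or "").lower()
    return cluster.get("state") != "active" and "waiting for api" in message
```

Calling `looks_stuck(fetch_cluster(...))` a few minutes apart should show whether the cluster ever leaves the waiting state. A single-node install with a self-signed certificate would need extra TLS handling that is not shown here.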

Other details that may be helpful:

  • To make the problem go away:
    • Rerun either of the above examples using the v2.0.16 image instead of v2.2.7.
    • Rerun either of the above examples using k8s-mode=auto instead of embedded or external.

Environment information

  • Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): rancher/rancher v2.1.0 and above.
  • Installation option (single install/HA): single install

rancher/rancher#11362

About this issue

Most upvoted comments

I eventually got K8s 1.20 working using RKE and imported it into the Rancher UI. Thanks.

@anandr781 I just wanted to say thank you for this hint. If anybody arrives at this thread via a Google search, here’s what I did as a workaround to get a single node K8S cluster up and running, then connect it to Rancher:

  • I used RKE to install the single node cluster, using RKE 1.2.13 and K8S 1.20
  • Docker version for K8S 1.20 is 19.03 (as per release notes of K8S) so I installed it via curl https://releases.rancher.com/install-docker/19.03.sh | sudo bash prior to RKE install
  • I imported this cluster as a custom cluster to my Rancher deployment (which is a single node deployment running under docker) by following the instructions Rancher provides from its UI

Result: Rancher 2.6.3 is working with this K8S cluster.
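In case it helps others following the same workaround, below is a rough sketch of the RKE step. It renders a minimal single-node cluster.yml using the standard `nodes` keys (`address`, `user`, `role`). The address and SSH user are made-up placeholders, and no Kubernetes version is pinned because the exact version string used above was not given, so RKE would fall back to its default.

```python
def render_cluster_yml(address: str, ssh_user: str) -> str:
    """Render a minimal single-node RKE cluster.yml as text.

    One node carries all three RKE roles, which is what a single-node
    cluster needs. The address and user are supplied by the caller.
    """
    roles = ["controlplane", "etcd", "worker"]
    lines = [
        "nodes:",
        f"  - address: {address}",
        f"    user: {ssh_user}",
        "    role:",
    ]
    lines += [f"      - {role}" for role in roles]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values: replace with the real node address and SSH user.
    print(render_cluster_yml("203.0.113.10", "ubuntu"), end="")
```

Saving the output as cluster.yml and running `rke up --config cluster.yml` should build the cluster, after which it can be imported from the Rancher UI as described above.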

I just got bitten by this while following the manual/quick start instructions from Rancher documentation.

As per the documentation, I am running Rancher (2.6.3) using Docker on a fairly capable machine (10 cores, 64 GB RAM, SSD, etc.). I provisioned an EC2 node on AWS and ran the command generated by Rancher’s custom cluster wizard.

The logs for the cluster report success as [INFO ] Finished building Kubernetes cluster successfully

The cluster’s state is stuck in Waiting for API to be available

Is there any way I can work around this while staying with the approach I followed? This is the scenario that helps me most: getting a cluster up and running quickly so that we can experiment with it in the company. I suspect many others like me will go for the lowest-hanging fruit. A workaround would help a lot.
