Rancher: after importing a cluster, the Rancher server shows this error: [Ready False 38 mins ago [Disconnected] Cluster agent is not connected]

Rancher Server Setup

  • Rancher version: v2.6.3
  • Installation option (Docker install/Helm Chart): Helm Chart
    • If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc): k3s
  • Proxy/Cert Details: k3s's CA

Information about the Cluster

  • Kubernetes version: v1.21.7+k3s1
  • Cluster Type (Local/Downstream): Downstream
    • If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider): Imported

User Information

  • What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom): Admin

Describe the bug

After importing the downstream cluster, Rancher shows it as [Ready False 38 mins ago [Disconnected] Cluster agent is not connected], even though the cattle-cluster-agent pods are Running and their logs show them connecting (see Result below).

[Screenshot: the cluster status error in the Rancher UI]

To Reproduce

A-Server
0. Add to /etc/profile: export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
1. curl -sfL http://rancher-mirror.cnrancher.com/k3s/k3s-install.sh | INSTALL_K3S_MIRROR=cn INSTALL_K3S_VERSION=v1.21.7+k3s1 sh -s - server
2. cat /var/lib/rancher/k3s/server/node-token

B-Server
0. Add to /etc/profile: export K3S_TOKEN, export K3S_DATASTORE_ENDPOINT, export K3S_URL="https://10.0.2.15:6443", export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
1. curl -sfL http://rancher-mirror.cnrancher.com/k3s/k3s-install.sh | INSTALL_K3S_MIRROR=cn INSTALL_K3S_VERSION=v1.21.7+k3s1 sh -s - server

A-Agent
1. Add to /etc/profile (see the sketch after this list): export K3S_TOKEN, export K3S_URL="https://10.0.2.15:6443", export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
2. curl -sfL http://rancher-mirror.cnrancher.com/k3s/k3s-install.sh | INSTALL_K3S_MIRROR=cn INSTALL_K3S_VERSION=v1.21.7+k3s1 sh -
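
For reference, a consolidated sketch of the /etc/profile additions from the steps above (values are placeholders; B-Server additionally exports K3S_DATASTORE_ENDPOINT for its external datastore):

  # shared by B-Server and A-Agent; A-Server only needs KUBECONFIG
  export K3S_TOKEN="<contents of /var/lib/rancher/k3s/server/node-token on A-Server>"
  export K3S_URL="https://10.0.2.15:6443"
  export KUBECONFIG=/etc/rancher/k3s/k3s.yaml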

Install Rancher
1. https://docs.rancher.cn/docs/rancher2/installation/resources/advanced/self-signed-ssl/_index/
2. sh create_self-signed-cert.sh --ssl-domain=rancher.k3s.cn \
--ssl-trusted-ip=192.168.56.100,10.0.2.15 --ssl-size=2048 --ssl-date=3650
3. kubectl create namespace cattle-system
4. kubectl -n cattle-system create secret generic tls-ca \
  --from-file=cacerts.pem=./cacerts.pem

cp cacerts.pem ca-additional.pem
kubectl -n cattle-system create secret generic tls-ca-additional --from-file=ca-additional.pem

kubectl -n cattle-system create secret tls tls-rancher-ingress \
  --cert=tls.crt \
  --key=tls.key
5. helm install rancher ./rancher \
 --namespace cattle-system \
 --set hostname=rancher.k3s.cn \
 --set replicas=-1 \
 --set ingress.tls.source=secret \
 --set additionalTrustedCAs=true \
 --set useBundledSystemChart=true \
 --set privateCA=true
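
A quick sanity check for steps 2–5 above (a sketch; the grep format depends on the local openssl version):

  # the self-signed cert should list the hostname and trusted IPs as SANs
  openssl x509 -noout -text -in tls.crt | grep -A1 "Subject Alternative Name"
  # the three secrets created above should all exist
  kubectl -n cattle-system get secret tls-ca tls-ca-additional tls-rancher-ingress
  # wait for the rancher deployment to finish rolling out
  kubectl -n cattle-system rollout status deploy/rancher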

Result

[root@k3s-node1 rancher]# kubectl get svc -n cattle-system
NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
cattle-cluster-agent   ClusterIP   10.43.243.246   <none>        80/TCP,443/TCP   38m
rancher                ClusterIP   10.43.239.164   <none>        80/TCP,443/TCP   58m
rancher-webhook        ClusterIP   10.43.74.101    <none>        443/TCP          56m
webhook-service        ClusterIP   10.43.70.214    <none>        443/TCP          56m
[root@k3s-node1 rancher]# kubectl get deployments -n cattle-system
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
cattle-cluster-agent   2/2     2            2           39m
rancher                1/1     1            1           58m
rancher-webhook        1/1     1            1           56m
[root@k3s-node1 rancher]# clear
[root@k3s-node1 rancher]# kubectl get pods -n cattle-system
NAME                                    READY   STATUS      RESTARTS   AGE
cattle-cluster-agent-7bcb4b8dd9-2th4m   1/1     Running     0          43m
cattle-cluster-agent-7bcb4b8dd9-s65kh   1/1     Running     0          43m
helm-operation-27g5m                    0/2     Completed   0          46m
rancher-584b98cd56-ftnlx                1/1     Running     1          66m
rancher-webhook-5d4f5b7f6d-285zn        1/1     Running     0          63m
[root@k3s-node1 rancher]# kubectl logs helm-operation-27g5m -n cattle-system
error: a container name must be specified for pod helm-operation-27g5m, choose one of: [helm proxy]
[root@k3s-node1 rancher]# kubectl logs helm-operation-27g5m -n cattle-system heml
error: container heml is not valid for pod helm-operation-27g5m
[root@k3s-node1 rancher]# kubectl logs helm-operation-27g5m -n cattle-system helm
helm upgrade --force-adopt=true --history-max=5 --install=true --namespace=cattle-fleet-system --reset-values=true --timeout=5m0s --values=/home/shell/helm/values-fleet-100.0.2-up0.3.8.yaml --version=100.0.2+up0.3.8 --wait=true fleet /home/shell/helm/fleet-100.0.2-up0.3.8.tgz
checking 15 resources for changes
Looks like there are no changes for ServiceAccount "gitjob"
Looks like there are no changes for ServiceAccount "fleet-controller"
Looks like there are no changes for ServiceAccount "fleet-controller-bootstrap"
Looks like there are no changes for ClusterRole "gitjob"
Looks like there are no changes for ClusterRole "fleet-controller"
Looks like there are no changes for ClusterRole "fleet-controller-bootstrap"
Looks like there are no changes for ClusterRoleBinding "gitjob-binding"
Looks like there are no changes for ClusterRoleBinding "fleet-controller"
Looks like there are no changes for ClusterRoleBinding "fleet-controller-bootstrap"
Looks like there are no changes for Role "fleet-controller"
Looks like there are no changes for RoleBinding "fleet-controller"
Looks like there are no changes for Service "gitjob"
Looks like there are no changes for Deployment "gitjob"
Looks like there are no changes for Deployment "fleet-controller"
beginning wait for 15 resources with timeout of 5m0s
Release "fleet" has been upgraded. Happy Helming!
NAME: fleet
LAST DEPLOYED: Mon Feb 21 08:36:56 2022
NAMESPACE: cattle-fleet-system
STATUS: deployed
REVISION: 2
TEST SUITE: None

---------------------------------------------------------------------
SUCCESS: helm upgrade --force-adopt=true --history-max=5 --install=true --namespace=cattle-fleet-system --reset-values=true --timeout=5m0s --values=/home/shell/helm/values-fleet-100.0.2-up0.3.8.yaml --version=100.0.2+up0.3.8 --wait=true fleet /home/shell/helm/fleet-100.0.2-up0.3.8.tgz
---------------------------------------------------------------------
[root@k3s-node1 rancher]# kubectl logs cattle-cluster-agent-7bcb4b8dd9-2th4m -n cattle-system
INFO: Environment: CATTLE_ADDRESS=10.42.0.19 CATTLE_CA_CHECKSUM=80753de35f3c48c44e6b1a15906e6a5d078f3d2913faab27f4a9f37a180fad7c CATTLE_CLUSTER=true CATTLE_CLUSTER_AGENT_PORT=tcp://10.43.243.246:80 CATTLE_CLUSTER_AGENT_PORT_443_TCP=tcp://10.43.243.246:443 CATTLE_CLUSTER_AGENT_PORT_443_TCP_ADDR=10.43.243.246 CATTLE_CLUSTER_AGENT_PORT_443_TCP_PORT=443 CATTLE_CLUSTER_AGENT_PORT_443_TCP_PROTO=tcp CATTLE_CLUSTER_AGENT_PORT_80_TCP=tcp://10.43.243.246:80 CATTLE_CLUSTER_AGENT_PORT_80_TCP_ADDR=10.43.243.246 CATTLE_CLUSTER_AGENT_PORT_80_TCP_PORT=80 CATTLE_CLUSTER_AGENT_PORT_80_TCP_PROTO=tcp CATTLE_CLUSTER_AGENT_SERVICE_HOST=10.43.243.246 CATTLE_CLUSTER_AGENT_SERVICE_PORT=80 CATTLE_CLUSTER_AGENT_SERVICE_PORT_HTTP=80 CATTLE_CLUSTER_AGENT_SERVICE_PORT_HTTPS_INTERNAL=443 CATTLE_CLUSTER_REGISTRY= CATTLE_FEATURES=embedded-cluster-api=false,fleet=false,monitoringv1=false,multi-cluster-management=false,multi-cluster-management-agent=true,provisioningv2=false,rke2=false CATTLE_INGRESS_IP_DOMAIN=sslip.io CATTLE_INSTALL_UUID=6bb0a1b9-1590-4914-be72-877a080c56ec CATTLE_INTERNAL_ADDRESS= CATTLE_IS_RKE=false CATTLE_K8S_MANAGED=true CATTLE_NODE_NAME=cattle-cluster-agent-7bcb4b8dd9-2th4m CATTLE_SERVER=https://rancher.k3s.cn CATTLE_SERVER_VERSION=v2.6.3
INFO: Using resolv.conf: search cattle-system.svc.cluster.local svc.cluster.local cluster.local nameserver 10.43.0.10 options ndots:5
INFO: https://rancher.k3s.cn/ping is accessible
INFO: rancher.k3s.cn resolves to 10.0.2.15
INFO: Value from https://rancher.k3s.cn/v3/settings/cacerts is an x509 certificate
time="2022-02-21T08:39:00Z" level=info msg="Listening on /tmp/log.sock"
time="2022-02-21T08:39:00Z" level=info msg="Rancher agent version v2.6.3 is starting"
time="2022-02-21T08:39:00Z" level=info msg="Connecting to wss://rancher.k3s.cn/v3/connect/register with token starting with gfk2xj2cw7hkm78c82xjqxs4cgk"
time="2022-02-21T08:39:00Z" level=info msg="Connecting to proxy" url="wss://rancher.k3s.cn/v3/connect/register"
time="2022-02-21T08:39:00Z" level=info msg="Starting /v1, Kind=Service controller"
[root@k3s-node1 rancher]# kubectl logs cattle-cluster-agent-7bcb4b8dd9-2th4m -n cattle-system^C
[root@k3s-node1 rancher]# kubectl get pods -n cattle-system
NAME                                    READY   STATUS      RESTARTS   AGE
cattle-cluster-agent-7bcb4b8dd9-2th4m   1/1     Running     0          45m
cattle-cluster-agent-7bcb4b8dd9-s65kh   1/1     Running     0          44m
helm-operation-27g5m                    0/2     Completed   0          47m
rancher-584b98cd56-ftnlx                1/1     Running     1          67m
rancher-webhook-5d4f5b7f6d-285zn        1/1     Running     0          65m
[root@k3s-node1 rancher]# kubectl logs cattle-cluster-agent-7bcb4b8dd9-s65kh -n cattle-system
INFO: Environment: CATTLE_ADDRESS=10.42.2.5 CATTLE_CA_CHECKSUM=80753de35f3c48c44e6b1a15906e6a5d078f3d2913faab27f4a9f37a180fad7c CATTLE_CLUSTER=true CATTLE_CLUSTER_AGENT_PORT=tcp://10.43.243.246:80 CATTLE_CLUSTER_AGENT_PORT_443_TCP=tcp://10.43.243.246:443 CATTLE_CLUSTER_AGENT_PORT_443_TCP_ADDR=10.43.243.246 CATTLE_CLUSTER_AGENT_PORT_443_TCP_PORT=443 CATTLE_CLUSTER_AGENT_PORT_443_TCP_PROTO=tcp CATTLE_CLUSTER_AGENT_PORT_80_TCP=tcp://10.43.243.246:80 CATTLE_CLUSTER_AGENT_PORT_80_TCP_ADDR=10.43.243.246 CATTLE_CLUSTER_AGENT_PORT_80_TCP_PORT=80 CATTLE_CLUSTER_AGENT_PORT_80_TCP_PROTO=tcp CATTLE_CLUSTER_AGENT_SERVICE_HOST=10.43.243.246 CATTLE_CLUSTER_AGENT_SERVICE_PORT=80 CATTLE_CLUSTER_AGENT_SERVICE_PORT_HTTP=80 CATTLE_CLUSTER_AGENT_SERVICE_PORT_HTTPS_INTERNAL=443 CATTLE_CLUSTER_REGISTRY= CATTLE_FEATURES=embedded-cluster-api=false,fleet=false,monitoringv1=false,multi-cluster-management=false,multi-cluster-management-agent=true,provisioningv2=false,rke2=false CATTLE_INGRESS_IP_DOMAIN=sslip.io CATTLE_INSTALL_UUID=6bb0a1b9-1590-4914-be72-877a080c56ec CATTLE_INTERNAL_ADDRESS= CATTLE_IS_RKE=false CATTLE_K8S_MANAGED=true CATTLE_NODE_NAME=cattle-cluster-agent-7bcb4b8dd9-s65kh CATTLE_SERVER=https://rancher.k3s.cn CATTLE_SERVER_VERSION=v2.6.3
INFO: Using resolv.conf: search cattle-system.svc.cluster.local svc.cluster.local cluster.local nameserver 10.43.0.10 options ndots:5
INFO: https://rancher.k3s.cn/ping is accessible
INFO: rancher.k3s.cn resolves to 10.0.2.15
INFO: Value from https://rancher.k3s.cn/v3/settings/cacerts is an x509 certificate
time="2022-02-21T08:39:19Z" level=info msg="Listening on /tmp/log.sock"
time="2022-02-21T08:39:19Z" level=info msg="Rancher agent version v2.6.3 is starting"
time="2022-02-21T08:39:19Z" level=info msg="Connecting to wss://rancher.k3s.cn/v3/connect/register with token starting with gfk2xj2cw7hkm78c82xjqxs4cgk"
time="2022-02-21T08:39:19Z" level=info msg="Connecting to proxy" url="wss://rancher.k3s.cn/v3/connect/register"
time="2022-02-21T08:39:19Z" level=info msg="Starting /v1, Kind=Service controller"
[root@k3s-node1 rancher]# kubectl get pods -n cattle-system
NAME                                    READY   STATUS      RESTARTS   AGE
cattle-cluster-agent-7bcb4b8dd9-2th4m   1/1     Running     0          45m
cattle-cluster-agent-7bcb4b8dd9-s65kh   1/1     Running     0          45m
helm-operation-27g5m                    0/2     Completed   0          48m
rancher-584b98cd56-ftnlx                1/1     Running     1          68m
rancher-webhook-5d4f5b7f6d-285zn        1/1     Running     0          66m
(repeated kubectl get pods -n cattle-system checks at 48 and 54 minutes showed the same pods, all still Running with no restarts)
(re-running kubectl logs helm-operation-27g5m -n cattle-system helm printed the same SUCCESS output as above)
[root@k3s-node1 rancher]# 
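
One detail worth noticing in the agent logs above: inside the pod, rancher.k3s.cn resolves to 10.0.2.15, while the workaround in the comments below pins it to 192.168.56.100 via hostAliases. To check the resolution from inside the agent (a sketch; assumes getent is available in the agent image):

  kubectl -n cattle-system exec deploy/cattle-cluster-agent -- getent hosts rancher.k3s.cn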

Expected Result

The imported cluster should show as Active in the Rancher UI, with the cluster agent connected.

Screenshots


Additional context

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 38

Most upvoted comments

Running this on the Rancher local cluster might fix your problem:

kubectl patch clusters.management.cattle.io <REPLACE_WITH_CLUSTERID> -p '{"status":{"agentImage":"dummy"}}' --type merge

This command triggers Rancher to redeploy the agent in the downstream cluster.
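
For example, looking the cluster ID up first (c-xxxxx is a placeholder for whatever ID the first command prints):

  # list clusters known to the local Rancher cluster; the NAME column is the cluster ID
  kubectl get clusters.management.cattle.io
  # patch the status to make Rancher redeploy the downstream cluster agent
  kubectl patch clusters.management.cattle.io c-xxxxx -p '{"status":{"agentImage":"dummy"}}' --type merge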

Any update on this? I'm having the same issue on v2.6.5.

kubectl patch clusters.management.cattle.io <REPLACE_WITH_CLUSTERID> -p '{"status":{"agentImage":"dummy"}}' --type merge

Yes, it's working. Follow this; here are all my operations (PS: this assumes k3s and Helm are already configured):

0. helm repo add rancher-stable http://rancher-mirror.oss-cn-beijing.aliyuncs.com/server-charts/stable
1. kubectl create namespace cattle-system
2. kubectl -n cattle-system create secret tls tls-rancher-ingress \
  --cert=tls.crt \
  --key=tls.key
3. cp cacerts.pem ca-additional.pem
4. kubectl -n cattle-system create secret generic tls-ca-additional --from-file=ca-additional.pem=./ca-additional.pem
5. kubectl -n cattle-system create secret generic tls-ca \
--from-file=cacerts.pem=./cacerts.pem
6. helm install rancher rancher-stable/rancher \
 --version v2.6.3 \
 --namespace cattle-system \
 --set hostname=rancher.k3s.cn \
 --set bootstrapPassword=admin \
 --set ingress.tls.source=secret \
 --set privateCA=true \
 --set additionalTrustedCAs=true
7. kubectl -n cattle-system rollout status deploy/rancher
8. (Because the cert is self-made, I need to configure host aliases; otherwise the pods can't resolve my cert's domain name. The two patches below add the alias to the rancher and cattle-cluster-agent deployments; a verification sketch follows the list.)
kubectl -n cattle-system patch  deployments rancher --patch '{
    "spec": {
        "template": {
            "spec": {
                "hostAliases": [
                    {
                      "hostnames":
                      [
                        "rancher.k3s.cn"
                      ],
                      "ip": "192.168.56.100"
                    }
                ]
            }
        }
    }
}'
kubectl -n cattle-system patch  deployments cattle-cluster-agent --patch '{
    "spec": {
        "template": {
            "spec": {
                "hostAliases": [
                    {
                      "hostnames":
                      [
                        "rancher.k3s.cn"
                      ],
                      "ip": "192.168.56.100"
                    }
                ]
            }
        }
    }
}'
9. kubectl get clusters.management.cattle.io   (note the cluster ID for the next step)
10. kubectl patch clusters.management.cattle.io <REPLACE_WITH_CLUSTERID> -p '{"status":{"agentImage":"dummy"}}' --type merge
11. It's done.
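
To confirm the patches in step 8 took effect and the agent reconnected (a sketch; assumes the agent image ships basic shell tools and the standard app=cattle-cluster-agent label):

  # the host alias should now show up in the agent pod's /etc/hosts
  kubectl -n cattle-system exec deploy/cattle-cluster-agent -- cat /etc/hosts
  # the agent log should end with a successful connection to the Rancher server
  kubectl -n cattle-system logs -l app=cattle-cluster-agent --tail=20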


Hi, we also have the exact same problem with Rancher 2.6.3 when we provision a new cluster using RKE.