rke: Update from RKE v1.2.11 to v1.3.1 leads to unreachable Rancher UI

I recently tried to update to Rancher version 2.6 and, since I was at it, I also updated RKE from version 1.2.11 to version 1.3.1. I’m running a single-node setup on a bare-metal server with self-signed certificates behind a layer 4 load balancer (a multi-node setup is planned for the future). After the update I read in the documentation on the network configuration options that I need to add network_mode: none to my ingress configuration, since there was a change in Kubernetes. I did this, as seen in my cluster configuration below. Nevertheless, after deploying the cluster, creating the namespace, adding the certificates and deploying Rancher as shown in the steps below, I’m unable to curl or browse the Rancher UI, even though kubectl -n cattle-system rollout status deploy/rancher tells me the deployment was successful and kubectl -n cattle-system get pods shows that it is running.
I tried to gather as much information as possible, including the ingress logs, the Rancher logs and the log output from my load balancer when trying to curl the Rancher server. All I can see is that no endpoint seems to be reachable. The same configuration (without the ingress addition) worked fine with RKE 1.2.11 and Rancher 2.5. I have no idea where to look next or what might be causing the issue, so any help in that direction is much appreciated.
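
In case it helps with debugging, these are the kind of kubectl checks I can run to gather more output (just a sketch; the pod name matches the Rancher pod from the logs further below):

# overview of the Rancher deployment and pods
kubectl -n cattle-system get deploy,pods -o wide
# does the rancher Service have any endpoints behind it?
kubectl -n cattle-system get endpoints rancher
# events and status of the running Rancher pod
kubectl -n cattle-system describe pod rancher-75b8bc6df6-k8vcs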

RKE version:

v1.3.1

Docker version: (docker version)

Client:
 Version:           20.10.6-ce
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        8728dd246c3a
 Built:             Thu Apr 15 12:00:00 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.6-ce
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       8728dd246c3a
  Built:            Thu Apr 15 12:00:00 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.4.4
  GitCommit:        05f951a3781f4f2c1911b05e61c160e9c30eaa8e
 runc:
  Version:          1.0.0-rc93
  GitCommit:        12644e614e25b05da6fd08a38ffa0cfe1903fdec
 docker-init:
  Version:          0.1.5_catatonit
  GitCommit:

Operating system and kernel: (cat /etc/os-release, uname -r preferred) os-release:

NAME="SLES"
VERSION="15-SP3"
VERSION_ID="15.3"
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP3"
ID="sles"
ID_LIKE="suse"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:15:sp3"
DOCUMENTATION_URL="https://documentation.suse.com/"

uname -r: 5.3.18-59.19-default

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO) Bare-metal

cluster.yml file:

nodes:
  - address: lff-l3vsrv103.internal.de
    user: cowherder
    role:
      - controlplane
      - worker
      - etcd

services:
  etcd:
    backup_config:
      interval_hours: 12
      retention: 6

# Cluster level SSH private key
# Used if no ssh information is set for the node
ssh_key_path: /home/cowherder/.ssh/rancher_admin_id_rsa

# Set the name of the Kubernetes cluster
cluster_name: lff_adm_cluster

prefix_path: /opt/rke

ignore_docker_version: false

ingress:
  provider: nginx
  network_mode: none
  options:
    use-proxy-protocol: true
  extra_args:
    http-port: 80
    https-port: 443
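
For context, the load balancer in front of the node is plain nginx running in stream (layer 4) mode. A simplified sketch of that kind of configuration (for illustration only, not my exact config; the upstream IP is the node address from the load balancer log below, and since use-proxy-protocol is set to true on the ingress, the upstream connections need proxy_protocol enabled):

stream {
  upstream rancher_https {
    server 10.48.90.148:443;
  }
  server {
    listen 443;
    proxy_pass rancher_https;
    # required because the ingress is configured with use-proxy-protocol: true
    proxy_protocol on;
  }
}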

Steps to Reproduce:

  • use rke to deploy cluster
  • kubectl create namespace cattle-system
  • kubectl -n cattle-system create secret tls tls-rancher-ingress --cert=/opt/rancher_inst/ssl_rancheradm.internal.de/tls.crt --key=/opt/rancher_inst/ssl_rancheradm.internal.de/tls.key
  • kubectl -n cattle-system create secret generic tls-ca --from-file=cacerts.pem=/opt/rancher_inst/lff_root_ca/cacerts.pem
  • helm install rancher rancher-latest/rancher --namespace cattle-system --set hostname=rancheradm.internal.de --set ingress.tls.source=secret --set privateCA=true --set bootstrapPassword=startPass --set replicas=3 --set proxy=http://www.proxy.internal.de:80 --set noProxy=127.0.0.1\\,localhost\\,0.0.0.0\\,10.0.0.0/8\\,cattle-system.svc\\,.svc\\,.cluster.local\\,.internal.de
  • Wait for rancher to be deployed
  • curl -k -vvv https://rancheradm.internal.de

Results:
* Uses proxy env variable no_proxy == '127.0.0.1,localhost,0.0.0.0,10.0.0.0/8,cattle-system.svc,.svc,.cluster.local,.internal.de,10.48.90.148,10.48.90.144,10.48.90.145,10.48.90.137'
*   Trying 10.48.90.137:443...
* TCP_NODELAY set
* Connected to rancheradm.internal.de (10.48.90.137) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to rancheradm.internal.de:443
* Closing connection 0
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to rancheradm.internal.de:443

Ingress log

-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       nginx-0.48.1-rancher1
  Build:         git-4bf680f6c
  Repository:    https://github.com/rancher/ingress-nginx.git
  nginx version: nginx/1.20.1

-------------------------------------------------------------------------------

I0922 07:13:34.857467       7 flags.go:211] "Watching for Ingress" class="nginx"
W0922 07:13:34.857632       7 flags.go:216] Ingresses with an empty class will also be processed by this Ingress controller
W0922 07:13:34.858709       7 client_config.go:614] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0922 07:13:34.858966       7 main.go:241] "Creating API client" host="https://10.43.0.1:443"
I0922 07:13:34.871961       7 main.go:285] "Running in Kubernetes cluster" major="1" minor="21" git="v1.21.5" state="clean" commit="aea7bbadd2fc0cd689de94a54e5b7b758869d691" platform="linux/amd64"
I0922 07:13:35.540937       7 main.go:105] "SSL fake certificate created" file="/etc/ingress-controller/ssl/default-fake-certificate.pem"
I0922 07:13:35.543437       7 main.go:115] "Enabling new Ingress features available since Kubernetes v1.18"
W0922 07:13:35.545694       7 main.go:127] No IngressClass resource with name nginx found. Only annotation will be used.
I0922 07:13:35.642104       7 ssl.go:532] "loading tls certificate" path="/usr/local/certificates/cert" key="/usr/local/certificates/key"
I0922 07:13:35.689372       7 nginx.go:254] "Starting NGINX Ingress controller"
I0922 07:13:35.740182       7 event.go:282] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress-nginx", Name:"nginx-configuration", UID:"f9766ac2-382b-4116-820f-993befcdb30b", APIVersion:"v1", ResourceVersion:"721", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap ingress-nginx/nginx-configuration
I0922 07:13:36.889935       7 nginx.go:296] "Starting NGINX process"
I0922 07:13:36.890275       7 nginx.go:316] "Starting validation webhook" address=":8443" certPath="/usr/local/certificates/cert" keyPath="/usr/local/certificates/key"
I0922 07:13:36.890337       7 leaderelection.go:243] attempting to acquire leader lease ingress-nginx/ingress-controller-leader-nginx...
I0922 07:13:36.891207       7 controller.go:148] "Configuration changes detected, backend reload required"
I0922 07:13:36.905403       7 leaderelection.go:253] successfully acquired lease ingress-nginx/ingress-controller-leader-nginx
I0922 07:13:36.905746       7 status.go:84] "New leader elected" identity="nginx-ingress-controller-ldjzf"
I0922 07:13:37.065143       7 status.go:204] "POD is not ready" pod="ingress-nginx/nginx-ingress-controller-ldjzf" node="lff-l3vsrv103.bfd.bayern.de"
I0922 07:13:37.141787       7 controller.go:165] "Backend successfully reloaded"
I0922 07:13:37.142037       7 controller.go:176] "Initial sync, sleeping for 1 second"
I0922 07:13:37.143115       7 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"nginx-ingress-controller-ldjzf", UID:"5fc10fe4-9be2-4605-b710-70e969fdaac5", APIVersion:"v1", ResourceVersion:"811", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration

W0922 07:19:47.329540       7 controller.go:992] Service "cattle-system/rancher" does not have any active Endpoint.
W0922 07:19:47.329597       7 controller.go:1207] Error getting SSL certificate "cattle-system/tls-rancher-ingress": local SSL certificate cattle-system/tls-rancher-ingress was not found. Using default certificate
I0922 07:19:47.398349       7 main.go:112] "successfully validated configuration, accepting" ingress="rancher/cattle-system"
I0922 07:19:47.425950       7 backend_ssl.go:66] "Adding secret to local store" name="cattle-system/tls-rancher-ingress"
I0922 07:19:47.426203       7 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"cattle-system", Name:"rancher", UID:"d6e5e5ca-9a99-41df-b4cf-36c6db250461", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"1370", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
W0922 07:19:50.578546       7 controller.go:992] Service "cattle-system/rancher" does not have any active Endpoint.
I0922 07:19:50.579051       7 controller.go:148] "Configuration changes detected, backend reload required"
I0922 07:19:50.771484       7 controller.go:165] "Backend successfully reloaded"
I0922 07:19:50.776575       7 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"nginx-ingress-controller-ldjzf", UID:"5fc10fe4-9be2-4605-b710-70e969fdaac5", APIVersion:"v1", ResourceVersion:"811", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
W0922 07:19:53.911917       7 controller.go:992] Service "cattle-system/rancher" does not have any active Endpoint.
I0922 07:20:37.199375       7 status.go:284] "updating Ingress status" namespace="cattle-system" ingress="rancher" currentValue=[] newValue=[{IP: Hostname:lff-l3vsrv103.bfd.bayern.de Ports:[]}]
W0922 07:20:37.292629       7 controller.go:992] Service "cattle-system/rancher" does not have any active Endpoint.
I0922 07:20:37.292722       7 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"cattle-system", Name:"rancher", UID:"d6e5e5ca-9a99-41df-b4cf-36c6db250461", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"1902", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
I0922 07:22:26.280388       7 main.go:112] "successfully validated configuration, accepting" ingress="rancher/cattle-system"
I0922 07:22:26.542352       7 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"cattle-system", Name:"rancher", UID:"d6e5e5ca-9a99-41df-b4cf-36c6db250461", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"2901", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
I0922 07:22:47.285329       7 store.go:453] "secret was updated and it is used in ingress annotations. Parsing" secret="cattle-system/tls-rancher-ingress"

Log output from kubectl -n cattle-system logs -f rancher-75b8bc6df6-k8vcs: see the attached rancher_log.txt

Log output from nginx load balancer

2021/09/22 08:26:11 [notice] 38139#38139: built by gcc 7.5.0 (SUSE Linux)
2021/09/22 08:26:11 [notice] 38139#38139: OS: Linux 5.3.18-59.19-default
2021/09/22 08:26:11 [notice] 38139#38139: getrlimit(RLIMIT_NOFILE): 1024:524288
2021/09/22 08:26:11 [notice] 38139#38139: start worker processes
2021/09/22 08:26:11 [notice] 38139#38139: start worker process 38150
2021/09/22 08:26:11 [notice] 38139#38139: start worker process 38151
2021/09/22 08:26:11 [notice] 38139#38139: start worker process 38152
2021/09/22 08:26:11 [notice] 38139#38139: start worker process 38153
2021/09/22 08:27:22 [info] 38150#38150: *1 client 10.48.90.148:42518 connected to 0.0.0.0:443
2021/09/22 08:27:22 [error] 38150#38150: *1 connect() failed (111: Connection refused) while connecting to upstream, client: 10.48.90.148, server: 0.0.0.0:443, upstream: "10.48.90.148:443", bytes from/to client:0/0, bytes from/to upstream:0/0
2021/09/22 08:56:53 [info] 38150#38150: *3 client 10.48.90.137:51176 connected to 0.0.0.0:443
2021/09/22 08:56:53 [error] 38150#38150: *3 connect() failed (111: Connection refused) while connecting to upstream, client: 10.48.90.137, server: 0.0.0.0:443, upstream: "10.48.90.148:443", bytes from/to client:0/0, bytes from/to upstream:0/0
2021/09/22 08:58:13 [info] 38150#38150: *5 client 10.48.90.137:51182 connected to 0.0.0.0:443
2021/09/22 08:58:13 [error] 38150#38150: *5 connect() failed (111: Connection refused) while connecting to upstream, client: 10.48.90.137, server: 0.0.0.0:443, upstream: "10.48.90.148:443", bytes from/to client:0/0, bytes from/to upstream:0/0
2021/09/22 09:04:43 [info] 38150#38150: *7 client 10.48.90.137:51186 connected to 0.0.0.0:443
2021/09/22 09:04:43 [error] 38150#38150: *7 connect() failed (111: Connection refused) while connecting to upstream, client: 10.48.90.137, server: 0.0.0.0:443, upstream: "10.48.90.148:443", bytes from/to client:0/0, bytes from/to upstream:0/0
2021/09/22 09:05:12 [info] 38150#38150: *9 client 10.48.90.137:51192 connected to 0.0.0.0:443
2021/09/22 09:05:12 [error] 38150#38150: *9 connect() failed (111: Connection refused) while connecting to upstream, client: 10.48.90.137, server: 0.0.0.0:443, upstream: "10.48.90.148:443", bytes from/to client:0/0, bytes from/to upstream:0/0
2021/09/22 09:05:28 [info] 38150#38150: *11 client 10.48.90.137:55166 connected to 0.0.0.0:80
2021/09/22 09:05:28 [error] 38150#38150: *11 connect() failed (111: Connection refused) while connecting to upstream, client: 10.48.90.137, server: 0.0.0.0:80, upstream: "10.48.90.148:80", bytes from/to client:0/0, bytes from/to upstream:0/0
2021/09/22 09:32:25 [info] 38150#38150: *13 client 10.48.90.137:51206 connected to 0.0.0.0:443
2021/09/22 09:32:25 [error] 38150#38150: *13 connect() failed (111: Connection refused) while connecting to upstream, client: 10.48.90.137, server: 0.0.0.0:443, upstream: "10.48.90.148:443", bytes from/to client:0/0, bytes from/to upstream:0/0
2021/09/22 10:29:19 [info] 38150#38150: *15 client 10.48.25.34:56601 connected to 0.0.0.0:443
2021/09/22 10:29:19 [error] 38150#38150: *15 connect() failed (111: Connection refused) while connecting to upstream, client: 10.48.25.34, server: 0.0.0.0:443, upstream: "10.48.90.148:443", bytes from/to client:0/0, bytes from/to upstream:0/0

Most upvoted comments

The path is client -> load balancer -> host’s ingress controller -> pod.

As before, you can bypass the load balancer to rule that out and connect to the host’s ingress controller directly. Tailing the ingress controller log on the host that you are connecting to should give you info on what happens when the connection is made. You can also raise the logging verbosity on the ingress controller. The first step is to see if you can properly connect to the host’s ingress controller; if that works, you can check whether the ingress controller can connect to any of the Rancher pods. I assume they are all active and ready.
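
For example, something along these lines (a rough sketch; the node IP is taken from the load balancer log above, adjust as needed):

# connect to the host's ingress controller directly, bypassing the load balancer
curl -vk --resolve rancheradm.internal.de:443:10.48.90.148 https://rancheradm.internal.de

# check whether the rancher Service has any endpoints behind it
kubectl -n cattle-system get endpoints rancher
kubectl -n cattle-system get pods -o wide

To raise the verbosity, something like this in cluster.yml should work, assuming extra_args are passed through to the controller as flags (v: 3 would map to --v=3):

ingress:
  extra_args:
    v: 3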

Because of the addition of the admission webhook to the new nginx ingress controller, the network mode was changed from hostNetwork to hostPort. This might also cause some problems (although it shouldn’t, especially with everything disabled firewall-wise), but as a last resort you can force the ingress controller back to the old mode (although this exposes the admission webhook to the outside, which is why we changed it to hostPort):

ingress:
  provider: nginx
  network_mode: hostNetwork
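
To verify which mode the controller actually ended up in, you can inspect the controller pod spec, e.g. (rough sketch; pod name taken from the ingress log above):

# true if the controller is using the host network
kubectl -n ingress-nginx get pod nginx-ingress-controller-ldjzf -o jsonpath='{.spec.hostNetwork}'
# shows the container ports, including any hostPort mappings
kubectl -n ingress-nginx get pod nginx-ingress-controller-ldjzf -o jsonpath='{.spec.containers[0].ports}'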