rke: RKE network plugin containers don't start

RKE version: v1.3.1

Docker version: (`docker version`, `docker info` preferred)

```
Client: Docker Engine - Community
 Version:           20.10.11
 API version:       1.40
 Go version:        go1.16.9
 Git commit:        dea9396
 Built:             Thu Nov 18 00:37:08 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          19.03.6
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.16
  Git commit:       369ce74a3c
  Built:            Thu Feb 13 01:26:21 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.12
  GitCommit:        7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc:
  Version:          1.0.2
  GitCommit:        v1.0.2-0-g52b36a2
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
```

Operating system and kernel: (`cat /etc/os-release`, `uname -r` preferred)

```
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.4 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.4 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

$ uname -r
4.15.0-88-generic
```

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO) VMware VM

cluster.yml file:

```yaml
nodes:
  - address: 10.x.x.1
    user: svc-config
    role: [controlplane, worker, etcd]
    ssh_key_path: terraform_pk
  - address: 10.x.x.2
    user: svc-config
    role: [controlplane, worker, etcd]
    ssh_key_path: terraform_pk
  - address: 10.x.x.3
    user: svc-config
    role: [controlplane, worker, etcd]
    ssh_key_path: terraform_pk

services:
  etcd:
    snapshot: true
    creation: 6h
    retention: 24h

addon_job_timeout: 100

system_images:
  etcd: "rancher/mirrored-coreos-etcd:v3.4.15-rancher1"
  kubernetes: "rancher/hyperkube:v1.18.20-rancher1"
  alpine: "rancher/rke-tools:v0.1.75"
  nginx_proxy: "rancher/rke-tools:v0.1.75"
  cert_downloader: "rancher/rke-tools:v0.1.75"
  kubernetes_services_sidecar: "rancher/rke-tools:v0.1.75"
  kubedns: "rancher/mirrored-k8s-dns-kube-dns:1.15.2"
  dnsmasq: "rancher/mirrored-k8s-dns-dnsmasq-nanny:1.15.2"
  kubedns_sidecar: "rancher/mirrored-k8s-dns-sidecar:1.15.2"
  kubedns_autoscaler: "rancher/mirrored-cluster-proportional-autoscaler:1.7.1"
  flannel: "rancher/mirrored-coreos-flannel:v0.12.0"
  flannel_cni: "rancher/flannel-cni:v0.3.0-rancher6"
  calico_node: "rancher/mirrored-calico-node:v3.13.4"
  calico_cni: "rancher/mirrored-calico-cni:v3.13.4"
  calico_controllers: "rancher/mirrored-calico-kube-controllers:v3.13.4"
  calico_ctl: "rancher/mirrored-calico-ctl:v3.13.4"
  calico_flexvol: "rancher/mirrored-calico-pod2daemon-flexvol:v3.13.4"
  canal_node: "rancher/mirrored-calico-node:v3.13.4"
  canal_cni: "rancher/mirrored-calico-cni:v3.13.4"
  canal_flannel: "rancher/mirrored-coreos-flannel:v0.12.0"
  canal_flexvol: "rancher/mirrored-calico-pod2daemon-flexvol:v3.13.4"
  weave_node: "weaveworks/weave-kube:2.6.4"
  weave_cni: "weaveworks/weave-npc:2.6.4"
  aci_cni_deploy_container: "noiro/cnideploy:5.1.1.0.1ae238a"
  aci_host_container: "noiro/aci-containers-host:5.1.1.0.1ae238a"
  aci_opflex_container: "noiro/opflex:5.1.1.0.1ae238a"
  aci_mcast_container: "noiro/opflex:5.1.1.0.1ae238a"
  aci_openvswitch_container: "noiro/openvswitch:5.1.1.0.1ae238a"
  aci_controller_container: "noiro/aci-containers-controller:5.1.1.0.1ae238a"
  aci_gbp_server_container: "noiro/gbp-server:5.1.1.0.1ae238a"
  aci_opflex_server_container: "noiro/opflex-server:5.1.1.0.1ae238a"
  pod_infra_container: "rancher/mirrored-pause:3.1"
  ingress: "rancher/nginx-ingress-controller:nginx-0.35.0-rancher2"
  ingress_backend: "rancher/mirrored-nginx-ingress-controller-defaultbackend:1.5-rancher1"
  metrics_server: "rancher/mirrored-metrics-server:v0.3.6"
  coredns: "rancher/mirrored-coredns-coredns:1.6.9"
  coredns_autoscaler: "rancher/mirrored-cluster-proportional-autoscaler:1.7.1"
  windows_pod_infra_container: "rancher/kubelet-pause:v0.1.6"
  nodelocal: "rancher/mirrored-k8s-dns-node-cache:1.15.7"

network:
  plugin: canal
```

Steps to Reproduce: rke up --config config.yaml

Results: Network containers (k8s_autoscaler_coredns-autoscaler, k8s_kube-flannel_canal, k8s_calico-node_canal-, k8s_POD_coredns-autoscaler, k8s_flexvol-driver_canal, k8s_install-cni_canal, k8s_POD_canal) don't start, which results in cluster creation failure.
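For anyone debugging the same symptom, the exited CNI containers can be inspected directly on an affected node with plain Docker commands. This is a hedged diagnostic sketch, not part of the original report; the container names come from the list above, everything else is generic:

```bash
# Show all canal-related containers on this node, including ones that exited
docker ps -a --filter "name=canal" --format "table {{.Names}}\t{{.Status}}"

# Read the logs of one failing container from the list above,
# e.g. the install-cni container of the canal pod
docker logs "$(docker ps -aq --filter 'name=k8s_install-cni_canal' | head -n1)"
```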

Snippet of rke up output log:

```
Now checking status of node , try #1"
time="2021-11-29T15:12:10Z" level=error msg="Host 10.53.5.62 failed to report Ready status with error: [controlplane] Error getting node : \"10.x.x.x.\" not found"
time="2021-11-29T15:12:10Z" level=info msg="[controlplane] Processing controlplane hosts for upgrade 1 at a time"
time="2021-11-29T15:12:10Z" level=info msg="Processing controlplane host 10.x.x.x"
time="2021-11-29T15:12:10Z" level=info msg="[controlplane] Now checking status of node 10.x.x.x, try #1"
time="2021-11-29T15:12:35Z" level=error msg="Failed to upgrade hosts: 10.53.5.60 with error [[controlplane] Error getting node 10.x.x.x: \"10.x.x.x\" not found]"
time="2021-11-29T15:12:35Z" level=fatal msg="[controlPlane] Failed to upgrade Control Plane: [[[controlplane] Error getting node 10.53.5.60: \"10.x.x.x\" not found]]"
```
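The "Error getting node ... not found" messages indicate the control plane host never showed up as a registered node. On an RKE node the Kubernetes components run as ordinary Docker containers, so their logs can be read directly; a diagnostic sketch (not from the original report), assuming the standard RKE container names kubelet and kube-apiserver:

```bash
# Check whether the kubelet container started on the failing host
# and what it logged while trying to register the node
docker ps -a --filter "name=kubelet"
docker logs --tail 100 kubelet

# The API server logs show whether a Node object for this host was ever created
docker logs --tail 100 kube-apiserver
```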

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 16 (8 by maintainers)

Most upvoted comments

The docs were changed to move away from system_images and exclusively use kubernetes_version (also described on https://rke.docs.rancher.com/upgrades).

`rke config` has not yet been updated to reflect this change; until that is corrected, manually creating cluster.yml is recommended.
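As a sketch of what that looks like in a hand-written cluster.yml (the version string below is only an illustrative example, not taken from this issue), the whole system_images block is dropped in favour of a single kubernetes_version field and RKE resolves the matching images itself:

```yaml
# cluster.yml relying on kubernetes_version instead of system_images
kubernetes_version: v1.18.20-rancher1-3   # example value; use a version supported by your RKE binary
nodes:
  - address: 10.x.x.1
    user: svc-config
    role: [controlplane, worker, etcd]
    ssh_key_path: terraform_pk
network:
  plugin: canal
```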

Listing the images needed for a Kubernetes version can be done using `rke config -s -a`.
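A usage sketch of that command, assuming the short flags stand for the documented --system-images and --all options:

```bash
# System images for the default Kubernetes version of the installed RKE binary
rke config --system-images

# System images for every Kubernetes version this RKE binary supports
rke config --system-images --all
```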

Wow, fantastic, thank you very much indeed!

I did another rke remove and re-ran rke up, and now the network pods are coming up. I'm not sure how the RKE cluster state file was cached: rke up is run from a Docker container, the cluster file is created inside that container, and the container is deleted after the task completes, so I don't see where the state file could have been cached. The VMs were also recreated to make sure we started from a clean slate.
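For reference on where that state can hide: recent RKE versions keep the cluster state in a cluster.rkestate file next to cluster.yml (plus a full-cluster-state resource inside the cluster itself). Below is a sketch of clearing stale local state before retrying; the file layout is the RKE default, and the exact paths in a containerised setup are an assumption, not taken from this issue:

```bash
# State and kubeconfig files RKE writes next to cluster.yml
ls -l cluster.yml cluster.rkestate kube_config_cluster.yml

# Tear the cluster down, drop stale state, then retry
rke remove --config cluster.yml
rm -f cluster.rkestate kube_config_cluster.yml
rke up --config cluster.yml
```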

RKE up succeeds but the etcd server times out. I will raise another issue for it. Closing this issue. Thank you @superseb