rancher: RKE2/K3s upgrades from Rancher do not work on imported RKE2/K3s clusters when the master nodes are tainted
What kind of request is this (question/bug/enhancement/feature request): Bug
Steps to reproduce (least amount of steps as possible):
- Create a highly available RKE2 cluster with an older version (e.g. v1.20.4+rke2r1) and configure the control-plane/master nodes with the CriticalAddonsOnly=true:NoExecute taint, as documented at https://docs.rke2.io/install/ha/#2a-optional-consider-server-node-taints. This is a best practice that prevents user workloads from running on the control-plane nodes.
- Import the RKE2 cluster into Rancher
- Upgrade the RKE2 cluster from the Rancher UI (Edit Cluster, Change version, Save)
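The taint in the first step uses the key=value:Effect notation that both kubectl taint and the RKE2/K3s node-taint server option accept. As a minimal sketch of how that string breaks down (the Taint class and parse_taint helper are hypothetical illustrations, not part of any Rancher or Kubernetes API):

```python
from dataclasses import dataclass

# Effects Kubernetes accepts for a node taint.
VALID_EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


def parse_taint(spec: str) -> Taint:
    """Parse a taint written as key=value:Effect (or key:Effect)."""
    # The effect follows the last ':'; the key and optional value precede it.
    key_value, sep, effect = spec.rpartition(":")
    if not sep or not key_value:
        raise ValueError(f"missing ':Effect' in taint {spec!r}")
    if effect not in VALID_EFFECTS:
        raise ValueError(f"unknown taint effect {effect!r}")
    key, _, value = key_value.partition("=")
    if not key:
        raise ValueError(f"missing key in taint {spec!r}")
    return Taint(key=key, value=value, effect=effect)


if __name__ == "__main__":
    print(parse_taint("CriticalAddonsOnly=true:NoExecute"))
```

Because the effect is NoExecute rather than NoSchedule, the taint both blocks new Pods without a matching toleration and evicts running ones.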
Result:
The system-upgrade-controller Pod, as well as the Pods of the RKE2 master upgrade plan, cannot be scheduled because they lack a toleration for the CriticalAddonsOnly=true:NoExecute taint.
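The Pods stay Pending because of the Kubernetes taint/toleration rules: a Pod can only land on a node if each of the node's NoSchedule and NoExecute taints is matched by one of the Pod's tolerations, and the master plan's Pods have to run on the tainted control-plane nodes they upgrade. The Python sketch below re-implements those matching rules for illustration only; Taint, Toleration, tolerates and schedulable are hypothetical names, not the Kubernetes client API.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # Kubernetes treats an empty operator as Equal
    value: str = ""
    effect: str = ""  # an empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if a single toleration matches a single taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True  # any value is accepted
    if tol.operator in ("", "Equal"):
        return tol.value == taint.value
    return False


def schedulable(tolerations: Iterable[Toleration], taints: Iterable[Taint]) -> bool:
    """True if every NoSchedule/NoExecute taint is tolerated.

    PreferNoSchedule taints only lower a node's score, so they are skipped.
    """
    tols = list(tolerations)
    return all(
        any(tolerates(t, taint) for t in tols)
        for taint in taints
        if taint.effect in ("NoSchedule", "NoExecute")
    )


master_taints = [Taint("CriticalAddonsOnly", "true", "NoExecute")]

# Upgrade Pod without a matching toleration: it stays Pending.
print(schedulable([], master_taints))  # False
# The same Pod with a CriticalAddonsOnly toleration added.
fix = Toleration(key="CriticalAddonsOnly", operator="Exists")
print(schedulable([fix], master_taints))  # True
```

The Exists operator is used here so the toleration matches the taint regardless of its value; an Equal toleration with value "true" and effect NoExecute would work as well.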
Other details that may be helpful:
- system-upgrade-controller Helm chart: https://github.com/rancher/system-charts/tree/dev-v2.5/charts/rancher-k3s-upgrader
- Plan templates: https://github.com/rancher/rancher/blob/master/pkg/controllers/management/k3sbasedupgrade/template.go
This also affects K3s clusters where such a taint has been added, as documented at https://rancher.com/docs/k3s/latest/en/installation/ha/#2-launch-server-nodes
Environment information
- Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): 2.5.7
- Installation option (single install/HA): HA
Cluster information
- Cluster type (Hosted/Infrastructure Provider/Custom/Imported): Imported RKE2
- Kubernetes version (use kubectl version): v1.20.4+rke2r1
SURE-3506
About this issue
- State: closed
- Created 3 years ago
- Comments: 21 (14 by maintainers)
Adding the CriticalAddonsOnly toleration to the system-upgrade-controller allowed it to deploy.
Updating the plan for the master nodes so that its Pods are created with the appropriate toleration allowed the upgrade to continue.
The agent nodes were upgraded successfully after the control-plane upgrade completed.
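The workaround in this comment amounts to adding a CriticalAddonsOnly toleration to two Pod templates: the system-upgrade-controller Deployment and the master upgrade Plan. A minimal Python sketch, operating on manifests already loaded into dicts, might look like the following. It assumes the Deployment keeps tolerations at spec.template.spec (standard Kubernetes) and that the upgrade.cattle.io/v1 Plan exposes them at spec.tolerations; check that field against the Plan CRD installed in your cluster. The helper with_toleration and the manifest contents are hypothetical, and this is not the change that was made in Rancher itself.

```python
import copy
from typing import Any, Dict, List

# Tolerate the CriticalAddonsOnly taint for any value and any effect.
CRITICAL_ADDONS_TOLERATION: Dict[str, str] = {
    "key": "CriticalAddonsOnly",
    "operator": "Exists",
}


def with_toleration(manifest: Dict[str, Any], path: List[str],
                    toleration: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of manifest with toleration added under path, if missing."""
    result = copy.deepcopy(manifest)  # never mutate the caller's manifest
    node = result
    for field in path:
        node = node.setdefault(field, {})
    tolerations = node.setdefault("tolerations", [])
    if toleration not in tolerations:  # idempotent: skip exact duplicates
        tolerations.append(dict(toleration))
    return result


# Hypothetical, trimmed-down manifests for illustration.
plan = {"apiVersion": "upgrade.cattle.io/v1", "kind": "Plan",
        "spec": {"concurrency": 1}}
deployment = {"apiVersion": "apps/v1", "kind": "Deployment",
              "spec": {"template": {"spec": {"containers": []}}}}

patched_plan = with_toleration(plan, ["spec"], CRITICAL_ADDONS_TOLERATION)
patched_deploy = with_toleration(
    deployment, ["spec", "template", "spec"], CRITICAL_ADDONS_TOLERATION)
print(patched_plan["spec"]["tolerations"])
```

Making the helper idempotent means it can be re-run against an already patched manifest without piling up duplicate tolerations.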
Possibly related: https://github.com/k3s-io/k3s/issues/3007
Works for me now. The upgrade runs through successfully.