rancher: [BUG] Cannot upgrade RKE1 1.22 cluster via Rancher 2.7.2+
Rancher Server Setup
- Rancher version: using Rancher 2.7.2 or 2.7.3
- Installation option (Docker install/Helm Chart): docker
Information about the Cluster
- Kubernetes version: RKE1 1.22.9
- Cluster Type (Local/Downstream): downstream
Describe the bug
Trying to upgrade an RKE1 1.22.9 cluster to 1.24.10 failed with the message
Failed to validate cluster: cluster version must be at least v1.23 to use PodSecurity in RKE
But after downgrading to Rancher 2.7.1 the cluster upgrade was successful.
To Reproduce
Result
Expected Result
Screenshots
Additional context
SURE-6328
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 4
- Comments: 23 (9 by maintainers)
Adding this to the milestone for the next release for now
Hi @thaneunsoo, when you validate the fix for this issue, could you confirm that Rancher logs do not flood with the following error message?
[ERROR] error syncing 'c-XXXX/m-YYYY': handler machinesLabelSyncer: Failed to validate cluster: cluster version must be at least v1.23 to use PodSecurity in RKE, requeuing
The QA test plan has been updated to call out this specific check.
Hi @jiaqiluo, Added a node to the downstream cluster with all three roles and the issue got resolved.
Before:
After (issue resolved):
I can confirm this is a bug in Rancher v2.7.2+ and RKE 1.4.4+.
Cause & Consequence
Starting in RKE 1.4.4, RKE supports using PSA (Pod Security Admission) in clusters running Kubernetes 1.23 and above.
When RKE initializes a cluster, it sets a default value for the field
spec.Services.KubeAPI.PodSecurityConfiguration
when no value is provided in the cluster.yml file (see code), regardless of the cluster's Kubernetes version. But later, when RKE validates the cluster's configuration before starting the upgrade, it expects that field to be set only for clusters whose k8s version is at least 1.23 (see code). Even though we cannot directly create a cluster with k8s version 1.22 or less with RKE 1.4.4, we can use RKE 1.4.4 to manage existing clusters. When the existing cluster's k8s version is 1.22 or less, we hit this bug when running the
rke up
command. The command fails with the validation error quoted above. Similarly, on the Rancher side, if we upgrade Rancher from v2.7.1 or less to v2.7.2 or above and have an existing downstream RKE1 cluster with k8s version 1.22 or less, then upgrading or editing the cluster in the Rancher UI leaves the cluster stuck in the upgrading state with the same error message.
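The mismatch between the unconditional defaulting and the version-gated validation can be sketched in Go. This is a minimal illustration only, not RKE's actual code: the function names, the `"privileged"` default, and the version parsing are hypothetical simplifications of the behavior described above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minorVersion extracts the minor version from a Kubernetes version
// string such as "v1.22.9". (Hypothetical helper; RKE uses its own
// version handling internally.)
func minorVersion(v string) int {
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) < 2 {
		return 0
	}
	m, _ := strconv.Atoi(parts[1])
	return m
}

// defaultPodSecurityConfig illustrates the fix: the buggy defaulting
// in RKE 1.4.4 filled the field unconditionally, while validation only
// accepts it on v1.23+. A correct defaulting step must gate on the
// cluster's Kubernetes version as well.
func defaultPodSecurityConfig(k8sVersion, current string) string {
	if current != "" {
		return current // an explicitly configured value is kept as-is
	}
	if minorVersion(k8sVersion) >= 23 { // the gate missing in the buggy defaulting
		return "privileged"
	}
	return "" // leave unset on v1.22 and below, so validation passes
}

func main() {
	fmt.Printf("%q\n", defaultPodSecurityConfig("v1.22.9", ""))
	fmt.Printf("%q\n", defaultPodSecurityConfig("v1.24.10", ""))
}
```

With this gate in place, a v1.22.9 cluster keeps the field empty and the pre-flight validation no longer rejects it.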
Workaround
For a standalone RKE cluster, we can use RKE CLI 1.3.20 or less to upgrade the cluster to 1.23; afterward, we can use RKE CLI 1.4.4 to manage it. If we cannot upgrade the cluster version, we must keep using RKE CLI 1.3.20 or less to manage the cluster.
On the Rancher side, first of all, we should upgrade any existing downstream cluster to 1.23 or above before upgrading Rancher to v2.7.2 or above, which avoids the bug entirely.
If we have already upgraded Rancher and have affected downstream clusters: because the upgrade process fails at the pre-flight checks, the cluster should still be functional even though its state is stuck on
updating
. But it also means that no other edit to the cluster, except for the Kubernetes version, can take effect.