rancher: [BUG] Cannot upgrade RKE1 1.22 cluster via Rancher 2.7.2+

Rancher Server Setup

  • Rancher version: 2.7.2 or 2.7.3
  • Installation option (Docker install/Helm Chart): docker

Information about the Cluster

  • Kubernetes version: RKE1 1.22.9
  • Cluster Type (Local/Downstream): downstream

Describe the bug

Trying to upgrade an RKE1 1.22.9 cluster to 1.24.10 failed with the message

Failed to validate cluster: cluster version must be at least v1.23 to use PodSecurity in RKE

After downgrading to Rancher 2.7.1, however, the cluster upgrade was successful.

SURE-6328

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 4
  • Comments: 23 (9 by maintainers)

Most upvoted comments

Adding this to the milestone for the next release for now

Hi @thaneunsoo, when you validate the fix for this issue, could you confirm that the Rancher logs are not flooded with the following error message?

[ERROR] error syncing 'c-XXXX/m-YYYY': handler machinesLabelSyncer: Failed to validate cluster: cluster version must be at least v1.23 to use PodSecurity in RKE, requeuing

The QA test plan has been updated to call out this specific check.

Hi @jiaqiluo, after giving a node in the downstream cluster all three roles, the issue was resolved.

Before:

  • node 1: etcd, controlplane
  • node 2: etcd, controlplane
  • node 3: etcd, controlplane (node temporarily down)
  • node 4: worker
  • node 5: worker

After (issue resolved):

  • node 1: etcd, controlplane
  • node 2: etcd, controlplane, worker
  • node 3: etcd, controlplane (node temporarily down)
  • node 4: worker
  • node 5: worker

I can confirm this is a bug in Rancher v2.7.2+ and RKE 1.4.4+.

Cause & Consequence

Starting in RKE 1.4.4, RKE supports using PSA (Pod Security Admission) in clusters running Kubernetes 1.23 and above.

When RKE initializes the cluster, it sets a default value for the field spec.Services.KubeAPI.PodSecurityConfiguration when no value is provided in the cluster.yml file (see code), regardless of the cluster’s Kubernetes version. But later, when RKE validates the cluster’s configuration before starting the upgrade, it expects that field to be set only on clusters whose Kubernetes version is at least 1.23 (see code).
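The mismatch boils down to the following minimal Go sketch; the Cluster type and the function names are simplified stand-ins for illustration, not RKE’s actual code:

package main

import (
	"fmt"
	"strings"
)

// Cluster is a stripped-down stand-in for RKE's cluster configuration.
type Cluster struct {
	Version                  string // e.g. "v1.22.9"
	PodSecurityConfiguration string // stand-in for spec.Services.KubeAPI.PodSecurityConfiguration
}

// setDefaults mirrors the buggy behavior: the default is applied
// regardless of the cluster's Kubernetes version.
func setDefaults(c *Cluster) {
	if c.PodSecurityConfiguration == "" {
		c.PodSecurityConfiguration = "privileged"
	}
}

// atLeast123 is a crude version check, good enough for this sketch.
func atLeast123(version string) bool {
	var major, minor int
	if n, _ := fmt.Sscanf(strings.TrimPrefix(version, "v"), "%d.%d", &major, &minor); n != 2 {
		return false
	}
	return major > 1 || (major == 1 && minor >= 23)
}

// validate mirrors the pre-flight check: the field may only be set
// on clusters running Kubernetes 1.23 or above.
func validate(c *Cluster) error {
	if c.PodSecurityConfiguration != "" && !atLeast123(c.Version) {
		return fmt.Errorf("cluster version must be at least v1.23 to use PodSecurity in RKE")
	}
	return nil
}

func main() {
	c := &Cluster{Version: "v1.22.9"} // no value provided in cluster.yml
	setDefaults(c)                    // ...but the default fills the field in anyway
	if err := validate(c); err != nil {
		fmt.Println("Failed to validate cluster:", err) // reproduces the reported error
	}
}

Because the defaulting always runs before the validation, the pre-flight check fails on every reconcile of a 1.22 cluster, which is also why the machinesLabelSyncer error quoted above keeps being requeued in the Rancher logs.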

Even though we cannot directly create a cluster with k8s version 1.22 or less with RKE 1.4.4, we can use RKE 1.4.4 to manage existing clusters. When the existing cluster’s k8s version is 1.22 or less, we hit this bug when running the rke up command, which fails with the following error:

FATA[0000] Failed to validate cluster: cluster version must be at least v1.23 to use PodSecurity in RKE

Similarly, on the Rancher side, if we upgrade Rancher from v2.7.1 or earlier to v2.7.2 or later while an existing downstream RKE1 cluster is on k8s version 1.22 or less, then upgrading or editing that cluster in the Rancher UI leaves it stuck in the updating state with the following error message:

Failed to validate cluster: cluster version must be at least v1.23 to use PodSecurity in RKE

Workaround

For a standalone RKE cluster, we can use RKE CLI 1.3.20 or earlier to upgrade the cluster to 1.23; afterward, we can use RKE CLI 1.4.4 to manage the cluster. If the cluster’s version cannot be upgraded, we must keep using RKE CLI 1.3.20 or earlier to manage it.
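For example (a sketch only: the version-suffixed binary names are hypothetical, and the exact kubernetes_version values depend on the releases each CLI ships with):

# Upgrade past 1.22 with the older CLI, after setting a v1.23.x
# kubernetes_version in cluster.yml
rke_1.3.20 up --config cluster.yml

# From then on, manage the cluster with the newer CLI
rke_1.4.4 up --config cluster.yml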

On the Rancher side, first of all, we should upgrade any existing downstream cluster to 1.23 or above before upgrading Rancher to v2.7.2 or above, which avoids the bug entirely.

If, unfortunately, we have already upgraded Rancher and have affected downstream clusters: because the upgrade process fails at the pre-flight checks, the cluster should still be functional even though its state is stuck on updating. It also means that no edit to the cluster other than the Kubernetes version can take effect.