rancher: [BUG] Multiple server nodes pre-drain in an RKE2 upgrade
Rancher Server Setup
- Rancher version: v2.6.9-rc2
- Installation option (Docker install/Helm Chart):
- If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc): v1.22.12+rke2r1 (Upgrade to v1.23.12+rke2r1)
- Proxy/Cert Details:
Information about the Cluster
- Kubernetes version: v1.22.12+rke2r1 (Upgrade to v1.23.12+rke2r1)
- Cluster Type (Local/Downstream): local
- If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider):
User Information
- What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom)
- If custom, define the set of permissions:
Describe the bug
To Reproduce
- We trigger an RKE2 upgrade in Harvester (with pre-drain/post-drain hooks) in a 4-node cluster (3 servers, 1 worker):
$ kubectl edit clusters.provisioning.cattle.io local -n fleet-local
And edit the local cluster with:
spec:
  kubernetesVersion: v1.23.12+rke2r1
  localClusterAuthEndpoint: {}
  rkeConfig:
    chartValues: null
    machineGlobalConfig: null
    provisionGeneration: 1
    upgradeStrategy:
      controlPlaneConcurrency: "1"
      controlPlaneDrainOptions:
        deleteEmptyDirData: true
        enabled: true
        force: true
        ignoreDaemonSets: true
        postDrainHooks:
        - annotation: harvesterhci.io/post-hook
        preDrainHooks:
        - annotation: harvesterhci.io/pre-hook
        timeout: 0
      workerConcurrency: "1"
      workerDrainOptions:
        deleteEmptyDirData: true
        enabled: true
        force: true
        ignoreDaemonSets: true
        postDrainHooks:
        - annotation: harvesterhci.io/post-hook
        preDrainHooks:
        - annotation: harvesterhci.io/pre-hook
        timeout: 0
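For reference, the version bump can also be applied non-interactively instead of through kubectl edit. This is only a minimal sketch; it assumes the drain options above are already present in the spec and only kubernetesVersion needs to change.
$ kubectl patch clusters.provisioning.cattle.io local -n fleet-local \
    --type merge -p '{"spec":{"kubernetesVersion":"v1.23.12+rke2r1"}}'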
Result
We observe that after the first node is upgraded, there is a high chance that scheduling is disabled on both of the remaining server nodes at the same time. We also see that Rancher added the pre-drain hook annotation to their plan secrets, which indicates a pre-drain signal.
$ kubectl get nodes
NAME    STATUS                     ROLES                       AGE   VERSION
node1   Ready                      control-plane,etcd,master   21d   v1.23.12+rke2r1   <-- upgraded
node2   Ready,SchedulingDisabled   control-plane,etcd,master   21d   v1.23.12+rke2r1   <--
node3   Ready                      <none>                      21d   v1.22.12+rke2r1
node4   Ready,SchedulingDisabled   control-plane,etcd,master   21d   v1.22.12+rke2r1   <--
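A quick way to spot the condition while the upgrade is running is to list only the cordoned nodes (a sketch; nodes support the spec.unschedulable field selector):
$ kubectl get nodes --field-selector spec.unschedulable=true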
Expected Result
Only a single server node should have scheduling disabled at any given time.
Screenshots
Additional context
Some observations:
- Node2's and node4's machine plan secrets have the rke.cattle.io/pre-drain annotation set (a sketch for checking all plan secrets at once follows after this list).
$ kubectl get machine -A
NAMESPACE     NAME                  CLUSTER   NODENAME   PROVIDERID     PHASE     AGE   VERSION
fleet-local   custom-24d57cc6f506   local     node1      rke2://node1   Running   21d
fleet-local   custom-3865d0441591   local     node2      rke2://node2   Running   21d
fleet-local   custom-3994bff0f3f3   local     node3      rke2://node3   Running   21d
fleet-local   custom-fda201f64657   local     node4      rke2://node4   Running   21d
$ kubectl get secret custom-3865d0441591-machine-plan -n fleet-local -o json | jq '.metadata.annotations."rke.cattle.io/pre-drain"'
"{\"IgnoreErrors\":false,\"deleteEmptyDirData\":true,\"disableEviction\":false,\"enabled\":true,\"force\":true,\"gracePeriod\":0,\"ignoreDaemonSets\":true,\"postDrainHooks\":[{\"annotation\":\"harvesterhci.io/post-hook\"}],\"preDrainHooks\":[{\"annotation\":\"harvesterhci.io/pre-hook\"}],\"skipWaitForDeleteTimeoutSeconds\":0,\"timeout\":0}"
$ kubectl get secret custom-fda201f64657-machine-plan -n fleet-local -o json | jq '.metadata.annotations."rke.cattle.io/pre-drain"'
"{\"IgnoreErrors\":false,\"deleteEmptyDirData\":true,\"disableEviction\":false,\"enabled\":true,\"force\":true,\"gracePeriod\":0,\"ignoreDaemonSets\":true,\"postDrainHooks\":[{\"annotation\":\"harvesterhci.io/post-hook\"}],\"preDrainHooks\":[{\"annotation\":\"harvesterhci.io/pre-hook\"}],\"skipWaitForDeleteTimeoutSeconds\":0,\"timeout\":0}"
- A similar issue was spotted and fixed a while ago: https://github.com/rancher/rancher/issues/35999, but in that issue the nodes in question were one server and one worker, not all of the servers.
- rancher_pod_logs.zip
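As referenced above, a small loop can show which nodes Rancher is currently signalling to pre-drain. This is only a sketch; it assumes the fleet-local namespace and the *-machine-plan secret naming shown earlier.
$ for s in $(kubectl get secrets -n fleet-local -o name | grep -- '-machine-plan'); do
    echo "== $s"
    kubectl get "$s" -n fleet-local -o json \
      | jq '.metadata.annotations."rke.cattle.io/pre-drain" // "no pre-drain annotation"'
  done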
SURE-6031
I believe I have identified why this is occurring. Huge shout-out to @starbops for helping me debug this and gathering the corresponding logs.
https://github.com/rancher/rancher/pull/39101/commits/c6b6afd1d9147f8851505354dc0d1c0179faf2aa is a commit that introduces logic that attempts to continue determining drain status/updating a plan when the plan has been applied but its probes are failing. This seems to introduce an edge case where a valid but "old" plan may start having its probes fail (which can easily happen when, for example, the init node is restarted), causing the planner to attempt to drain that node.
I'll need to think about how to prevent this edge case while still accommodating the original business logic introduced in that PR/commit.
According to the source code at
https://github.com/rancher/rancher/blob/release/v2.7/pkg/provisioningv2/rke2/planner/planner.go#L352
one node is fetched from each of etcdTier and controlPlaneTier, but the two tiers may share the same nodes (e.g. three management nodes carrying both roles), which breaks the ControlPlaneConcurrency = "1" policy.
What likely happens: after the init node is upgraded, another two nodes are upgraded in parallel; sometimes this succeeds, sometimes it does not.
@starbops Your last test log shows that.
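To see the tier overlap concretely, the role labels on the Machine objects can be listed side by side. This is only a sketch; it assumes the rke.cattle.io/*-role labels that Rancher's provisioning framework sets on machines, so adjust the label keys if they differ in your version.
$ kubectl get machines -n fleet-local \
    -L rke.cattle.io/etcd-role \
    -L rke.cattle.io/control-plane-role \
    -L rke.cattle.io/worker-role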
We can confirm the issue no longer occurs after bumping to the Rancher 2.7.5-rc releases; thanks!
https://github.com/rancher/rancher/pull/41459 reverts the addition of the planAppliedButWaitingForProbes short-circuiting.