rancher: [BUG] Multiple server nodes pre-drain in an RKE2 upgrade
Rancher Server Setup
- Rancher version: v2.6.9-rc2
- Installation option (Docker install/Helm Chart):
- If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc): v1.22.12+rke2r1 (Upgrade to v1.23.12+rke2r1)
- Proxy/Cert Details:
Information about the Cluster
- Kubernetes version: v1.22.12+rke2r1 (Upgrade to v1.23.12+rke2r1)
- Cluster Type (Local/Downstream): local
- If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider):
User Information
- What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom)
- If custom, define the set of permissions:
Describe the bug
To Reproduce
- We trigger an RKE2 upgrade in Harvester (with pre-drain/post-drain hooks) in a 4-node cluster (3 servers, 1 worker):
$ kubectl edit clusters.provisioning.cattle.io local -n fleet-local
And edit the local cluster with:
spec:
  kubernetesVersion: v1.23.12+rke2r1
  localClusterAuthEndpoint: {}
  rkeConfig:
    chartValues: null
    machineGlobalConfig: null
    provisionGeneration: 1
    upgradeStrategy:
      controlPlaneConcurrency: "1"
      controlPlaneDrainOptions:
        deleteEmptyDirData: true
        enabled: true
        force: true
        ignoreDaemonSets: true
        postDrainHooks:
        - annotation: harvesterhci.io/post-hook
        preDrainHooks:
        - annotation: harvesterhci.io/pre-hook
        timeout: 0
      workerConcurrency: "1"
      workerDrainOptions:
        deleteEmptyDirData: true
        enabled: true
        force: true
        ignoreDaemonSets: true
        postDrainHooks:
        - annotation: harvesterhci.io/post-hook
        preDrainHooks:
        - annotation: harvesterhci.io/pre-hook
        timeout: 0
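For reference, the version bump can also be applied non-interactively instead of through kubectl edit. This is only a minimal sketch; it assumes the drain options above are already present in the spec and only kubernetesVersion needs to change.
$ kubectl patch clusters.provisioning.cattle.io local -n fleet-local \
    --type merge -p '{"spec":{"kubernetesVersion":"v1.23.12+rke2r1"}}'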
Result
We observe that after the first node is upgraded, there is a high chance that scheduling is disabled on both of the remaining server nodes at the same time. We also see that Rancher added the pre-drain hook annotation to their plan secrets, which indicates a pre-drain signal.
$ kubectl get nodes
NAME    STATUS                     ROLES                       AGE   VERSION
node1   Ready                      control-plane,etcd,master   21d   v1.23.12+rke2r1   <-- upgraded
node2   Ready,SchedulingDisabled   control-plane,etcd,master   21d   v1.23.12+rke2r1   <--
node3   Ready                      <none>                      21d   v1.22.12+rke2r1
node4   Ready,SchedulingDisabled   control-plane,etcd,master   21d   v1.22.12+rke2r1   <--
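A quick way to spot the condition while the upgrade is running is to list only the cordoned nodes (a sketch; nodes support the spec.unschedulable field selector):
$ kubectl get nodes --field-selector spec.unschedulable=true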
Expected Result
Only a single server node should have scheduling disabled at any given time.
Screenshots
Additional context
Some observations:
- Node2's and node4's machine plan secrets have the rke.cattle.io/pre-drain annotation set (a sketch for checking all plan secrets at once follows after this list).
$ kubectl get machine -A
NAMESPACE     NAME                  CLUSTER   NODENAME   PROVIDERID     PHASE     AGE   VERSION
fleet-local   custom-24d57cc6f506   local     node1      rke2://node1   Running   21d
fleet-local   custom-3865d0441591   local     node2      rke2://node2   Running   21d
fleet-local   custom-3994bff0f3f3   local     node3      rke2://node3   Running   21d
fleet-local   custom-fda201f64657   local     node4      rke2://node4   Running   21d
$ kubectl get secret custom-3865d0441591-machine-plan -n fleet-local -o json | jq '.metadata.annotations."rke.cattle.io/pre-drain"'
"{\"IgnoreErrors\":false,\"deleteEmptyDirData\":true,\"disableEviction\":false,\"enabled\":true,\"force\":true,\"gracePeriod\":0,\"ignoreDaemonSets\":true,\"postDrainHooks\":[{\"annotation\":\"harvesterhci.io/post-hook\"}],\"preDrainHooks\":[{\"annotation\":\"harvesterhci.io/pre-hook\"}],\"skipWaitForDeleteTimeoutSeconds\":0,\"timeout\":0}"
$ kubectl get secret custom-fda201f64657-machine-plan -n fleet-local -o json | jq '.metadata.annotations."rke.cattle.io/pre-drain"'
"{\"IgnoreErrors\":false,\"deleteEmptyDirData\":true,\"disableEviction\":false,\"enabled\":true,\"force\":true,\"gracePeriod\":0,\"ignoreDaemonSets\":true,\"postDrainHooks\":[{\"annotation\":\"harvesterhci.io/post-hook\"}],\"preDrainHooks\":[{\"annotation\":\"harvesterhci.io/pre-hook\"}],\"skipWaitForDeleteTimeoutSeconds\":0,\"timeout\":0}"
- A similar issue was spotted and fixed a while ago: https://github.com/rancher/rancher/issues/35999, but in that issue the nodes in question were one server and one worker, not all of the servers.
- rancher_pod_logs.zip
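As referenced above, a small loop can show which nodes Rancher is currently signalling to pre-drain. This is only a sketch; it assumes the fleet-local namespace and the *-machine-plan secret naming shown earlier.
$ for s in $(kubectl get secrets -n fleet-local -o name | grep -- '-machine-plan'); do
    echo "== $s"
    kubectl get "$s" -n fleet-local -o json \
      | jq '.metadata.annotations."rke.cattle.io/pre-drain" // "no pre-drain annotation"'
  done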
SURE-6031
I believe I have identified why this is occurring. Huge shout-out to @starbops for helping me debug this and gathering the corresponding logs.
https://github.com/rancher/rancher/pull/39101/commits/c6b6afd1d9147f8851505354dc0d1c0179faf2aa is a commit that introduces logic that attempts to continue determining drain status/updating a plan when the plan has been applied but its probes are failing. This seems to introduce an edge case where a valid but "old" plan may start having its probes fail (which can easily happen when, for example, the init node is restarted), causing the planner to attempt to drain that node.
I'll need to think about how to prevent this edge case while still accommodating the original business logic introduced in that PR/commit.
According to the source code at
https://github.com/rancher/rancher/blob/release/v2.7/pkg/provisioningv2/rke2/planner/planner.go#L352
one node is fetched from each of etcdTier and controlPlaneTier, but the two tiers may share the same nodes (e.g. three management nodes carrying both roles), which breaks the ControlPlaneConcurrency = "1" policy.
What likely happens: after the init node is upgraded, another two nodes are upgraded in parallel; sometimes this succeeds, sometimes it does not.
@starbops Your last test log shows that.
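To see the tier overlap concretely, the role labels on the Machine objects can be listed side by side. This is only a sketch; it assumes the rke.cattle.io/*-role labels that Rancher's provisioning framework sets on machines, so adjust the label keys if they differ in your version.
$ kubectl get machines -n fleet-local \
    -L rke.cattle.io/etcd-role \
    -L rke.cattle.io/control-plane-role \
    -L rke.cattle.io/worker-role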
We can confirm the issue no longer occurs after bumping to the Rancher 2.7.5-rc releases; thanks!
https://github.com/rancher/rancher/pull/41459 reverts the addition of the planAppliedButWaitingForProbes short-circuiting.