harvester: [BUG] Multiple nodes are upgrading simultaneously in an RKE2 upgrade
Describe the bug
Upgrade to:
- RKE2: v1.24.6+rke2r1
- Rancher: v2.6.9-rc3
In a multi-node setup, the upgrade can get stuck midway. Behind the scenes, multiple nodes (>= 2) are upgrading RKE2 simultaneously, which means more than one node is unschedulable at the same time. While a node is in the pre-drain state, this can result in the following:
- The Longhorn volumes cannot go back from the "degraded" to the "healthy" state due to replica scheduling failure
- The pre-drain Job cannot live-migrate VMs off the node they run on
Most of the time, the pre-drain Job gets stuck in the first situation, because we only start to live-migrate VMs after all the volumes are healthy.
Below is the crime scene:
node-0:~ # k -n harvester-system get upgrades
NAME AGE
hvst-upgrade-t9mbq 4h29m
node-0:~ # k -n harvester-system get upgrades hvst-upgrade-t9mbq -o jsonpath='{.status.conditions}' | jq .
[
{
"status": "Unknown",
"type": "Completed"
},
{
"lastUpdateTime": "2022-10-11T09:46:52Z",
"status": "True",
"type": "ImageReady"
},
{
"lastUpdateTime": "2022-10-11T09:49:11Z",
"status": "True",
"type": "RepoReady"
},
{
"lastUpdateTime": "2022-10-11T10:36:19Z",
"status": "True",
"type": "NodesPrepared"
},
{
"lastUpdateTime": "2022-10-11T10:59:49Z",
"status": "True",
"type": "SystemServicesUpgraded"
},
{
"status": "Unknown",
"type": "NodesUpgraded"
}
]
node-0:~ # k -n harvester-system get upgrades hvst-upgrade-t9mbq -o jsonpath='{.status.nodeStatuses}' | jq .
{
"node-0": {
"state": "Succeeded"
},
"node-1": {
"state": "Pre-drained"
},
"node-2": {
"state": "Pre-draining"
},
"node-3": {
"state": "Images preloaded"
}
}
The first node, node-0, was upgraded successfully. But node-1 and node-2 are being upgraded at the same time (they did not start at exactly the same moment, but this still violates the expectation of upgrading one node at a time). Both node-1 and node-2 are marked SchedulingDisabled.
node-0:~ # k get no
NAME STATUS ROLES AGE VERSION
node-0 Ready control-plane,etcd,master 5d11h v1.24.6+rke2r1
node-1 Ready,SchedulingDisabled control-plane,etcd,master 5d10h v1.24.6+rke2r1
node-2 Ready,SchedulingDisabled control-plane,etcd,master 5d10h v1.22.12+rke2r1
node-3 Ready <none> 5d10h v1.22.12+rke2r1
Now check the upgrade-related Pods. The post-drain Job on the first upgraded node, node-0, has completed, which means node-0 was fully upgraded without issues. A pre-drain Job is still running on node-2, which implies node-2 is in the pre-draining state and its VMs should be evacuated to other nodes. What about node-1? From the output of k get no above, we know RKE2 on node-1 has already been upgraded to v1.24.6+rke2r1. The uptime of node-1 is the same as node-2 and node-3 (given that all 4 nodes were booted at the same time), so we can infer that node-1 has never entered the post-drain state. Yet somehow node-2 was triggered to upgrade before the upgrade completed on node-1.
node-0:~ # k -n harvester-system get po -l harvesterhci.io/upgradeComponent=node
NAME READY STATUS RESTARTS AGE
hvst-upgrade-t9mbq-post-drain-node-0-fbrwk 0/1 Completed 0 3h10m
hvst-upgrade-t9mbq-pre-drain-node-2-rbj2w 1/1 Running 0 160m
From the log of the pre-drain Job on node-2, we can see that it keeps waiting for the Longhorn volumes to become healthy.
node-0:~ # k -n harvester-system logs hvst-upgrade-t9mbq-pre-drain-node-2-rbj2w --since=1m
+ '[' true ']'
+ '[' 4 -gt 2 ']'
++ kubectl get volumes.longhorn.io/pvc-306f865e-5bfa-4d12-8779-fe3371425305 -n longhorn-system -o 'jsonpath={.status.robustness}'
+ robustness=degraded
+ '[' degraded = healthy ']'
+ '[' -f /tmp/skip-pvc-306f865e-5bfa-4d12-8779-fe3371425305 ']'
+ echo 'Waiting for volume pvc-306f865e-5bfa-4d12-8779-fe3371425305 to be healthy...'
+ sleep 10
Waiting for volume pvc-306f865e-5bfa-4d12-8779-fe3371425305 to be healthy...
+ '[' true ']'
+ '[' 4 -gt 2 ']'
++ kubectl get volumes.longhorn.io/pvc-306f865e-5bfa-4d12-8779-fe3371425305 -n longhorn-system -o 'jsonpath={.status.robustness}'
Waiting for volume pvc-306f865e-5bfa-4d12-8779-fe3371425305 to be healthy...
+ robustness=degraded
+ '[' degraded = healthy ']'
+ '[' -f /tmp/skip-pvc-306f865e-5bfa-4d12-8779-fe3371425305 ']'
+ echo 'Waiting for volume pvc-306f865e-5bfa-4d12-8779-fe3371425305 to be healthy...'
+ sleep 10
+ '[' true ']'
+ '[' 4 -gt 2 ']'
++ kubectl get volumes.longhorn.io/pvc-306f865e-5bfa-4d12-8779-fe3371425305 -n longhorn-system -o 'jsonpath={.status.robustness}'
Waiting for volume pvc-306f865e-5bfa-4d12-8779-fe3371425305 to be healthy...
+ robustness=degraded
+ '[' degraded = healthy ']'
+ '[' -f /tmp/skip-pvc-306f865e-5bfa-4d12-8779-fe3371425305 ']'
+ echo 'Waiting for volume pvc-306f865e-5bfa-4d12-8779-fe3371425305 to be healthy...'
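For readability, below is a hypothetical reconstruction of the wait loop that produces the trace above (a sketch, not the exact upstream script; the `4 -gt 2` test in the trace is assumed to be a node-count check):

```bash
# Sketch of the pre-drain wait loop, reconstructed from the -x trace above.
vol="pvc-306f865e-5bfa-4d12-8779-fe3371425305"
while true; do
  # Assumption: only wait for healthy volumes on clusters with more than 2 nodes.
  [ "$(kubectl get nodes --no-headers | wc -l)" -gt 2 ] || break
  robustness=$(kubectl get "volumes.longhorn.io/$vol" -n longhorn-system \
    -o 'jsonpath={.status.robustness}')
  # Exit the loop once the volume reports healthy.
  [ "$robustness" = healthy ] && break
  # An operator can create this file inside the pod to skip the wait.
  [ -f "/tmp/skip-$vol" ] && break
  echo "Waiting for volume $vol to be healthy..."
  sleep 10
done
```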
The volume can never become healthy in the current situation, because there are only 2 schedulable nodes (node-0 and node-3) for all 3 replicas to run on. We can confirm this by examining the conditions of the volume, which show a ReplicaSchedulingFailure:
node-0:~ # k -n longhorn-system get volumes pvc-306f865e-5bfa-4d12-8779-fe3371425305 -o jsonpath='{.status.conditions}' | jq .
[
{
"lastProbeTime": "",
"lastTransitionTime": "2022-10-11T09:46:57Z",
"message": "",
"reason": "",
"status": "False",
"type": "restore"
},
{
"lastProbeTime": "",
"lastTransitionTime": "2022-10-11T11:30:29Z",
"message": "",
"reason": "ReplicaSchedulingFailure",
"status": "False",
"type": "scheduled"
},
{
"lastProbeTime": "",
"lastTransitionTime": "2022-10-11T09:46:57Z",
"message": "",
"reason": "",
"status": "False",
"type": "toomanysnapshots"
}
]
And the running replicas of the volume exist only on node-0 and node-3:
node-0:~ # k -n longhorn-system get lhr -l longhornvolume=pvc-306f865e-5bfa-4d12-8779-fe3371425305
NAME STATE NODE DISK INSTANCEMANAGER IMAGE AGE
pvc-306f865e-5bfa-4d12-8779-fe3371425305-r-4477ad68 running node-0 24e133ac-90bf-47ea-bf3f-0a519adda3ec instance-manager-r-7cffeda1 longhornio/longhorn-engine:v1.3.1 5h58m
pvc-306f865e-5bfa-4d12-8779-fe3371425305-r-8f915cdf stopped 4h14m
pvc-306f865e-5bfa-4d12-8779-fe3371425305-r-f902b787 running node-3 1e8daca0-9a5b-47fc-b17a-e87da5c49edf instance-manager-r-3bf011a4 longhornio/longhorn-engine:v1.3.1 5h58m
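To confirm how many nodes are still schedulable at this point, a quick check like the following can help (a convenience command, not part of the original report):

```bash
# List each node and whether it is cordoned. Longhorn needs one schedulable
# node per replica by default, so 3 replicas cannot all be placed when only
# 2 nodes are schedulable.
kubectl get nodes -o custom-columns=NAME:.metadata.name,UNSCHEDULABLE:.spec.unschedulable
```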
The whole upgrade is stuck in the pre-draining state on node-2 indefinitely, because the volume it waits for can never become healthy.
Now let's look at the other side. The upgrade controller instructs Rancher to upgrade RKE2 on each node by updating the clusters.provisioning.cattle.io/local CR. Here the upgrade concurrency of both the control plane and the workers is set to 1 to make sure the Harvester upgrade proceeds one node at a time.
node-0:~ # k -n fleet-local get clusters local -o jsonpath='{.spec.rkeConfig}' | jq .
{
"chartValues": null,
"machineGlobalConfig": null,
"provisionGeneration": 1,
"upgradeStrategy": {
"controlPlaneConcurrency": "1",
"controlPlaneDrainOptions": {
"deleteEmptyDirData": true,
"enabled": true,
"force": true,
"ignoreDaemonSets": true,
"postDrainHooks": [
{
"annotation": "harvesterhci.io/post-hook"
}
],
"preDrainHooks": [
{
"annotation": "harvesterhci.io/pre-hook"
}
],
"timeout": 0
},
"workerConcurrency": "1",
"workerDrainOptions": {
"deleteEmptyDirData": true,
"enabled": true,
"force": true,
"ignoreDaemonSets": true,
"postDrainHooks": [
{
"annotation": "harvesterhci.io/post-hook"
}
],
"preDrainHooks": [
{
"annotation": "harvesterhci.io/pre-hook"
}
],
"timeout": 0
}
}
}
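For illustration only (an assumption about how one might apply this by hand; the Harvester upgrade controller patches the CR programmatically), the same concurrency settings could be applied with a merge patch:

```bash
# Set both control-plane and worker upgrade concurrency to 1 on the local cluster CR.
kubectl -n fleet-local patch clusters.provisioning.cattle.io local --type merge \
  -p '{"spec":{"rkeConfig":{"upgradeStrategy":{"controlPlaneConcurrency":"1","workerConcurrency":"1"}}}}'
```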
The pre-drain and post-drain annotations are set on the corresponding machine-plan secrets while RKE2 on the node/machine is being upgraded.
node-0:~ # k -n fleet-local get machines
NAME CLUSTER NODENAME PROVIDERID PHASE AGE VERSION
custom-2d94d5d682dc local node-3 rke2://node-3 Running 5d13h
custom-7c1afab6e79d local node-0 rke2://node-0 Running 5d14h
custom-929d403d1670 local node-1 rke2://node-1 Running 5d14h
custom-c05d0d11190c local node-2 rke2://node-2 Running 5d13h
Check the annotations of the secret custom-929d403d1670-machine-plan (node-1):
node-0:~ # k -n fleet-local get secrets custom-929d403d1670-machine-plan -o jsonpath='{.metadata.annotations}' | jq .
{
"harvesterhci.io/pre-hook": "{\"IgnoreErrors\":false,\"deleteEmptyDirData\":true,\"disableEviction\":false,\"enabled\":true,\"force\":true,\"gracePeriod\":0,\"ignoreDaemonSets\":true,\"postDrainHooks\":[{\"annotation\":\"harvesterhci.io/post-hook\"}],\"
preDrainHooks\":[{\"annotation\":\"harvesterhci.io/pre-hook\"}],\"skipWaitForDeleteTimeoutSeconds\":0,\"timeout\":0}",
"objectset.rio.cattle.io/applied": "H4sIAAAAAAAA/4xTwW7bOhD8lYc9i36iLMuUgF6K9pRDgaDope5htVzGqihSIOmkhaF/L6TEaO3Ebo8iZ4azo9kjDJxQY0JojoDO+YSp8y7On779zpQip1Xo/IowJcurzv/faWgg9CwGpH3nGLKrUP/kOIiHx/6Z8cfNo8z+u+ucfnd/9/G99ymmgONflRwODA3QISY/iLqodZmvtay2+T9R4
4g0841lTsJ6QgvZhTGLLdsIDRx3sMfwyDFx2FM33w3o8IH1DpodpHDgHUwwZUCBl9Q+dwPHhMMIjTtYm8FJ6whkD7PO6ofoVZylXg5OE528XHkRmuW9G0PuMe6hgZJVyVpLs1FbluWm1kySVIFG57piaTA3bZHjq7Gv+LkAeZeCt2K06FgEb/m3sXMkJ9I3AS/dEUuZyKDKDVY5ttsNb3K5xpwKhRuqeFOqqjV1JZlbY1BxhbxeV0pSqZTmbV7L
q+I363JOefKh53BmecrgusCp/EsW8Ix8u15L/e7ZcGBHHKH5egQcuy8cYufdG4sBGbTWU/9pJn5gy2nBzZ4yePkFlsPppO/cnOHFHt0c/bCkLg1TgYUSZNZSlKRroYoqF8xUV4hI27KG6duUQfo58iujZwFMvwIAAP//B0mT1koEAAA",
"objectset.rio.cattle.io/id": "rke-machine",
"objectset.rio.cattle.io/owner-gvk": "rke.cattle.io/v1, Kind=RKEBootstrap",
"objectset.rio.cattle.io/owner-name": "custom-929d403d1670",
"objectset.rio.cattle.io/owner-namespace": "fleet-local",
"rke.cattle.io/drain-done": "{\"IgnoreErrors\":false,\"deleteEmptyDirData\":true,\"disableEviction\":false,\"enabled\":true,\"force\":true,\"gracePeriod\":0,\"ignoreDaemonSets\":true,\"postDrainHooks\":[{\"annotation\":\"harvesterhci.io/post-hook\"}],\"
preDrainHooks\":[{\"annotation\":\"harvesterhci.io/pre-hook\"}],\"skipWaitForDeleteTimeoutSeconds\":0,\"timeout\":0}",
"rke.cattle.io/drain-options": "{\"IgnoreErrors\":false,\"deleteEmptyDirData\":true,\"disableEviction\":false,\"enabled\":true,\"force\":true,\"gracePeriod\":0,\"ignoreDaemonSets\":true,\"postDrainHooks\":[{\"annotation\":\"harvesterhci.io/post-hook\"}]
,\"preDrainHooks\":[{\"annotation\":\"harvesterhci.io/pre-hook\"}],\"skipWaitForDeleteTimeoutSeconds\":0,\"timeout\":0}",
"rke.cattle.io/join-url": "https://192.168.122.121:9345",
"rke.cattle.io/labels": "{\"harvesterhci.io/managed\":\"true\"}",
"rke.cattle.io/post-drain": "{\"IgnoreErrors\":false,\"deleteEmptyDirData\":true,\"disableEviction\":false,\"enabled\":true,\"force\":true,\"gracePeriod\":0,\"ignoreDaemonSets\":true,\"postDrainHooks\":[{\"annotation\":\"harvesterhci.io/post-hook\"}],\"
preDrainHooks\":[{\"annotation\":\"harvesterhci.io/pre-hook\"}],\"skipWaitForDeleteTimeoutSeconds\":0,\"timeout\":0}",
"rke.cattle.io/pre-drain": "{\"IgnoreErrors\":false,\"deleteEmptyDirData\":true,\"disableEviction\":false,\"enabled\":true,\"force\":true,\"gracePeriod\":0,\"ignoreDaemonSets\":true,\"postDrainHooks\":[{\"annotation\":\"harvesterhci.io/post-hook\"}],\"p
reDrainHooks\":[{\"annotation\":\"harvesterhci.io/pre-hook\"}],\"skipWaitForDeleteTimeoutSeconds\":0,\"timeout\":0}",
"rke.cattle.io/uncordon": "{\"IgnoreErrors\":false,\"deleteEmptyDirData\":true,\"disableEviction\":false,\"enabled\":true,\"force\":true,\"gracePeriod\":0,\"ignoreDaemonSets\":true,\"postDrainHooks\":[{\"annotation\":\"harvesterhci.io/post-hook\"}],\"pr
eDrainHooks\":[{\"annotation\":\"harvesterhci.io/pre-hook\"}],\"skipWaitForDeleteTimeoutSeconds\":0,\"timeout\":0}"
}
Check the annotations of the secret custom-c05d0d11190c-machine-plan (node-2):
node-0:~ # k -n fleet-local get secrets custom-c05d0d11190c-machine-plan -o jsonpath='{.metadata.annotations}' | jq .
{
"objectset.rio.cattle.io/applied": "H4sIAAAAAAAA/4xTzW7bPBB8lQ97Fv3xRz+WgF6K9pRDgaDope5hSa1iVRQpkHTSwtC7F1JitHZit0eRM8PZ0ewRRkrYYkJojoDO+YSp9y4un15/J5MipU3o/cZgSpY2vf+/b6GBMBAb0ex7R5BdhfonR4E9PA7PjD9uHkX2313v2nf3dx/fe59iCjj9VcnhSNCAOcTkR2Z40fJWCFFz80/UO
KFZ+J0lSsx6gxayC2MWNdkIDRx3sMfwSDFR2Jt+uRvR4QO1O2h2kMKBdjDDnIEJtKb2uR8pJhwnaNzB2gxOWkcw9rDobH6wYRsXqZeD00QnL1dehGZ978aQe4x7aEDUXFVVVZpC867OqdNFZbiSnSwqIkmC2iLXong19hU/FyDvUvCWTRYdseAt/TZ2jqRk2puAl+6wtUwSS8GraivzUnAjSCkjtFZc5LKotULTKSOR1FbVeVFKIlV32miqJKdc
mKviN+tyTnnyYaBwZnnO4LrAqfxrFvCMfLtea/3uqaNAzlCE5usRcOq/UIi9d28sBmSgrTfDp4X4gSylFbd4yuDlF1gKp5Ohd0uGF3t0c/TDmnpVtCi0RFbKHFle5i3TUhKrW56rnKtyW3cwf5szSD8nemX0LID5VwAAAP//98YwqkoEAAA",
"objectset.rio.cattle.io/id": "rke-machine",
"objectset.rio.cattle.io/owner-gvk": "rke.cattle.io/v1, Kind=RKEBootstrap",
"objectset.rio.cattle.io/owner-name": "custom-c05d0d11190c",
"objectset.rio.cattle.io/owner-namespace": "fleet-local",
"rke.cattle.io/drain-options": "{\"IgnoreErrors\":false,\"deleteEmptyDirData\":true,\"disableEviction\":false,\"enabled\":true,\"force\":true,\"gracePeriod\":0,\"ignoreDaemonSets\":true,\"postDrainHooks\":[{\"annotation\":\"harvesterhci.io/post-hook\"}]
,\"preDrainHooks\":[{\"annotation\":\"harvesterhci.io/pre-hook\"}],\"skipWaitForDeleteTimeoutSeconds\":0,\"timeout\":0}",
"rke.cattle.io/join-url": "https://192.168.122.122:9345",
"rke.cattle.io/labels": "{\"harvesterhci.io/managed\":\"true\"}",
"rke.cattle.io/pre-drain": "{\"IgnoreErrors\":false,\"deleteEmptyDirData\":true,\"disableEviction\":false,\"enabled\":true,\"force\":true,\"gracePeriod\":0,\"ignoreDaemonSets\":true,\"postDrainHooks\":[{\"annotation\":\"harvesterhci.io/post-hook\"}],\"p
reDrainHooks\":[{\"annotation\":\"harvesterhci.io/pre-hook\"}],\"skipWaitForDeleteTimeoutSeconds\":0,\"timeout\":0}"
}
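A quick way (an assumption for convenience, not taken from the original report) to see the drain-related annotations on all machine-plan secrets at once:

```bash
# Dump only the rke.cattle.io/* and harvesterhci.io/* annotations of each
# machine-plan secret, to see which machines are currently mid-drain.
for s in $(kubectl -n fleet-local get secrets -o name | grep machine-plan); do
  echo "== $s"
  kubectl -n fleet-local get "$s" -o jsonpath='{.metadata.annotations}' \
    | jq 'with_entries(select(.key | startswith("rke.cattle.io/") or startswith("harvesterhci.io/")))'
done
```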
The Rancher logs show that the planner is stuck waiting for node-1 (machine custom-929d403d1670) to be uncordoned:
node-0:~ # k -n cattle-system logs -l app=rancher
W1012 02:39:20.056363 33 warnings.go:80] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W1012 02:45:04.057107 33 warnings.go:80] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W1012 02:47:50.962077 33 warnings.go:80] network.harvesterhci.io/v1beta1 NodeNetwork is deprecated
W1012 02:52:42.058995 33 warnings.go:80] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W1012 02:53:17.964514 33 warnings.go:80] network.harvesterhci.io/v1beta1 NodeNetwork is deprecated
W1012 03:00:00.096962 33 transport.go:288] Unable to cancel request for *client.addQuery
W1012 03:00:25.077424 33 warnings.go:80] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W1012 03:00:36.965184 33 warnings.go:80] network.harvesterhci.io/v1beta1 NodeNetwork is deprecated
W1012 03:05:52.966654 33 warnings.go:80] network.harvesterhci.io/v1beta1 NodeNetwork is deprecated
W1012 03:08:43.085276 33 warnings.go:80] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
2022/10/12 03:00:05 [INFO] [planner] rkecluster fleet-local/local: waiting: uncordoning etcd node(s) custom-929d403d1670: waiting for uncordon to finish
2022/10/12 03:03:13 [INFO] Downloading repo index from http://harvester-cluster-repo.cattle-system/charts/index.yaml
2022/10/12 03:03:14 [INFO] Downloading repo index from https://releases.rancher.com/server-charts/stable/index.yaml
2022/10/12 03:08:13 [INFO] Downloading repo index from http://harvester-cluster-repo.cattle-system/charts/index.yaml
2022/10/12 03:08:14 [INFO] Downloading repo index from https://releases.rancher.com/server-charts/stable/index.yaml
2022/10/12 03:08:23 [INFO] [planner] rkecluster fleet-local/local: waiting: uncordoning etcd node(s) custom-929d403d1670: waiting for uncordon to finish
2022/10/12 03:08:23 [INFO] [planner] rkecluster fleet-local/local: waiting: uncordoning etcd node(s) custom-929d403d1670: waiting for uncordon to finish
2022/10/12 03:09:05 [INFO] [planner] rkecluster fleet-local/local: waiting: uncordoning etcd node(s) custom-929d403d1670: waiting for uncordon to finish
2022/10/12 03:09:08 [INFO] [planner] rkecluster fleet-local/local: waiting: uncordoning etcd node(s) custom-929d403d1670: waiting for uncordon to finish
2022/10/12 03:10:06 [INFO] [planner] rkecluster fleet-local/local: waiting: uncordoning etcd node(s) custom-929d403d1670: waiting for uncordon to finish
W1012 02:34:06.943613 33 warnings.go:80] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W1012 02:39:19.762602 33 warnings.go:80] network.harvesterhci.io/v1beta1 NodeNetwork is deprecated
W1012 02:40:24.941360 33 warnings.go:80] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W1012 02:47:05.943248 33 warnings.go:80] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W1012 02:47:15.766388 33 warnings.go:80] network.harvesterhci.io/v1beta1 NodeNetwork is deprecated
W1012 02:52:50.945579 33 warnings.go:80] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W1012 02:54:48.767367 33 warnings.go:80] network.harvesterhci.io/v1beta1 NodeNetwork is deprecated
W1012 02:59:23.961176 33 warnings.go:80] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W1012 03:04:26.770103 33 warnings.go:80] network.harvesterhci.io/v1beta1 NodeNetwork is deprecated
W1012 03:05:31.964012 33 warnings.go:80] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
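To cut through the noise in the Rancher logs, something like the following can be used to show only the planner messages (a convenience filter, assuming the same app=rancher label selector used above):

```bash
# Show recent planner messages from the Rancher pods only.
kubectl -n cattle-system logs -l app=rancher --tail=200 | grep '\[planner\]'
```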
To Reproduce
Steps to reproduce the behavior:
- Prepare a multi-node (>= 3) Harvester cluster running v1.0.3
- Trigger an upgrade to v1.1.0-rc2
- Wait for the upgrade to get stuck in the pre-draining state (not always reproducible), and check the nodes' states to confirm the issue has occurred
Expected behavior
The RKE2 upgrade should be kicked off one node at a time, and the Harvester upgrade should complete successfully.
Support bundle
supportbundle_ebe4b217-6355-4356-823a-f9e5f09e28b4_2022-10-11T16-46-36Z.zip
Environment
- Harvester ISO version: v1.1.0-rc2
- Underlying Infrastructure (e.g. Baremetal with Dell PowerEdge R630):
Additional context
Related issue: rancher/rancher#39167 Related issues in previous versions: rancher/rancher#35999, rancher/rancher#37502
The issue can be temporarily worked around by uncordoning the node manually, e.g. kubectl uncordon node-1 in the case above. The upgrade will then proceed. However, there is a high chance of hitting other RKE2 issues such as #2893, in which case restarting rke2-server on the corresponding node might be necessary.
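A sketch of the manual workaround described above (adjust the node name to your case):

```bash
# Uncordon the node that Rancher is waiting on, so the upgrade can proceed.
kubectl uncordon node-1

# If the node then hits an RKE2 issue like #2893, restarting the rke2-server
# service on that node may be necessary (run this on the node itself).
systemctl restart rke2-server.service
```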
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 22 (13 by maintainers)
Commits related to this issue
- fix(upgrade): make sure only one node pre/post-draining This is a workaround for #2907, to prevent multi-node draining simultaneously. Signed-off-by: Zespre Chang <zespre.chang@suse.com> — committed to starbops/harvester by starbops 2 years ago
- fix(upgrade): make sure only one node pre/post-draining This is a workaround for #2907, to prevent multi-node draining simultaneously. Signed-off-by: Zespre Chang <zespre.chang@suse.com> — committed to harvester/harvester by starbops 2 years ago
- fix(upgrade): make sure only one node pre/post-draining This is a workaround for #2907, to prevent multi-node draining simultaneously. Signed-off-by: Zespre Chang <zespre.chang@suse.com> — committed to bk201/harvester by starbops 2 years ago
@irishgordo It's normal to see some error logs during an upgrade since there are version bumps and nodes going offline. I left some comments here:
1. We need to check whether the cluster is ready at the end; if so, it should be OK. Run kubectl get clusters.provisioning.cattle.io local -n fleet-local -o yaml to check if it's ready at the end.
2. autoscaling/v2 is deprecated in v1.23. (We upgrade to v1.24.x.)
3, 4: This should be proper behavior during an upgrade, since we turn nodes on and off.
5, 6: This is due to the Kubernetes versions being mismatched during an upgrade: some nodes are v1.24.x and some are still v1.22. Eventually, those helm-install pods should succeed.
7. Known issue. The job will run again and succeed.
8. @FrankYang0529 Please take a look. It might be related to the hostname or something.

@irishgordo Yes, we can track the cattle-system/sync-containerd-registry issue in another ticket. Thanks!

@starbops you're welcome, I'm glad I could help! 😄 – I do think perhaps there is something funny with the I/O limit on the disk, but the strange thing is that it is SATA3, a Kingston or Samsung SSD (I don't recall the brand 😅) that's installed on my 1U server. ( https://www.supermicro.com/products/archive/motherboard/x9drd-it_ )

@bk201 I could close this out and open up another ticket for the things found with number 8 - with the RFC-1123 for cattle-system/sync-containerd-registry, if that works?

Hi @irishgordo, thank you for spending so much time on this! Just curious, what's the type of your disks? Is it HDD? A slow disk could cause that many interesting errors that eventually self-healed.
I think we can only wait for rancher/rancher#39167 to reproduce and fix it from the root. This issue is just for a record of what we can do to get around this at this critical moment. Since we have a certain degree of protection and workarounds on our side to overcome it, I think we can close this out as you suggested.
cc @bk201
Updating some new observations here.
In this kind of situation, the fix in #2923 can actually guard the upgrade procedure. It will hold up the pre-drain Job that’s going to be placed on node-2 and let the draining and post-drain Job finish on node-1 without any interference. After node-1 is fully upgraded, the pre-drain Job will be placed on node-2 and proceed with the upgrade.
But if node-2 is put into the SchedulingDisabled mode by Rancher before node-1 is pre-drained, there is a high chance that the pre-drain Job on node-1 will get stuck waiting for all Longhorn volumes to become healthy. The workaround is to uncordon node-2, or to set the degraded volumes' numberOfReplicas to 2 or 1 to make them healthy (the former is preferred).
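One way to apply the second (less preferred) workaround, assuming the volume name from the case above and that the Longhorn UI is not being used:

```bash
# Temporarily lower the replica count of the degraded volume so it can
# report healthy again with only 2 schedulable nodes.
kubectl -n longhorn-system patch volumes.longhorn.io pvc-306f865e-5bfa-4d12-8779-fe3371425305 \
  --type merge -p '{"spec":{"numberOfReplicas":2}}'
# Remember to restore numberOfReplicas to 3 after the upgrade completes.
```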
@starbops thanks for looking into this! 😄 I spent about 3.5 hours working through another run of this - setting up the environment and then running the upgrade on three QEMU/KVM VMs (8 cores, 16 GiB RAM, 200 GiB disk each) on a Supermicro 1U, testing the upgrade from v1.0.3 to Harvester v1.1.0-rc3.
I wasn't able to see it hang like it did in the past.
In order to prepare the 3-node cluster I needed to run through a few preparatory steps, as the first time around I did encounter the "Job was active longer than specified deadline" issue - https://github.com/harvester/harvester/issues/2894
Interesting Things I Saw:
All the interesting things I saw seemed to resolve themselves and did not impact the upgrade. The upgrade itself took about 2 hours on 3 nodes.
The Process I Noticed
Test Artifacts
Additional Test Artifact
I have the full length video 1.4Gi available if desired. Here is it sped up a lot.
https://user-images.githubusercontent.com/5370752/196349175-8d7fece7-b70f-4f6c-bb97-6a2969b713f3.mp4
@starbops - since it succeeded, and the things I noticed that arose ended up seeming to resolve themselves, I feel like we could close this out, given that we have workarounds for some of the old instance-manager pods and a few other things like rebooting the node if it does get stuck. The nodes seem to be behaving correctly in terms of pre-draining, draining, post-draining, rebooting, and succeeding - and only one is doing that at any given time 😄 - What are your thoughts?