longhorn: [IMPROVEMENT] Use PDB to protect Longhorn components from unexpected drains
When using kubectl drain on a single-node cluster, there can be a case where it is impossible to correctly clean up/detach a volume, because the CSI sidecars might end up evicted before the pod/VolumeAttachment is cleaned up correctly.
It would be useful to have a PDB on the CSI sidecars to prevent all sidecars from becoming unavailable. We could consider removing the PDB once we have verified that all Longhorn volumes are no longer in use. But even as a first step, just adding a PDB with a minAvailable of 1 for the CSI sidecars would be beneficial.
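As a sketch, such a PDB might look like the following. The namespace and the `app` label value are assumptions based on a default Longhorn install; one PDB per sidecar deployment would be needed:

```yaml
# Sketch: a PDB keeping at least one csi-attacher pod available during a
# drain. The namespace and label are assumptions for a default Longhorn
# install; repeat the pattern for csi-provisioner and the other sidecars.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: csi-attacher-pdb
  namespace: longhorn-system
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: csi-attacher
```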
A similar protection would be useful for the share-manager, since it is possible that the share-manager pod gets evicted before the NFS unmount happens in the csi-plugin, which can stall the kubelet/csi-plugin. In single-node clusters the NFS mount would get stuck, since the share-manager would no longer come up while the node is cordoned during the drain process.
A separate idea is to look into providing a script that ignores the Longhorn components/namespace as part of the drain: if you drain all workloads that use Longhorn volumes on that node, there won't be any active volumes left on it. This does not apply to the replica and share-manager, since they could still be used by a different node in a multi-node cluster.
Similar to what KubeVirt does in its maintenance guide, provide a selector that drains the appropriate components.
REF: https://kubevirt.io/user-guide/operations/node_maintenance/#evict-all-vms-from-a-node
REF: https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#drain
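A minimal sketch of such a selector-based drain wrapper. The script only prints the `kubectl drain` command it would run, so an operator can review it before executing; the label keys (`app`, `longhorn.io/component`) are taken from the workaround commands later in this thread and may need adjusting for your install:

```shell
#!/bin/sh
# Sketch: build a drain command that excludes Longhorn CSI sidecars,
# webhooks, and instance managers via a negative pod selector.
# Prints the command instead of executing it, for operator review.
NODE="${1:-node1}"   # node to drain; defaults to a placeholder name

SELECTOR='app!=csi-attacher,app!=csi-provisioner'
SELECTOR="$SELECTOR,app!=longhorn-admission-webhook"
SELECTOR="$SELECTOR,app!=longhorn-conversion-webhook"
SELECTOR="$SELECTOR,longhorn.io/component!=instance-manager"

echo kubectl drain "$NODE" \
  --pod-selector="$SELECTOR" \
  --ignore-daemonsets --delete-emptydir-data
```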
About this issue
- State: closed
- Created 3 years ago
- Reactions: 1
- Comments: 15 (13 by maintainers)
Test Plan
1. Basic unit tests
1.1 Single node cluster
1.1.1 RWO volumes
- Deploy Longhorn and create an attached RWO volume. Verify that PDBs are created for csi-attacher, csi-provisioner, longhorn-admission-webhook, and longhorn-conversion-webhook.
- Detach the volume. Verify that the PDBs for csi-attacher, csi-provisioner, longhorn-admission-webhook, and longhorn-conversion-webhook are removed because there is no attached volume.
- Re-attach the volume. Verify that the PDBs for csi-attacher, csi-provisioner, longhorn-admission-webhook, and longhorn-conversion-webhook are recreated.
- Run `kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data --force`. Verify that the PDBs for csi-attacher, csi-provisioner, longhorn-admission-webhook, and longhorn-conversion-webhook are removed -> the csi-attacher, csi-provisioner, longhorn-admission-webhook, longhorn-conversion-webhook, and instance-manager-e pods are evicted -> all volumes are successfully detached.
1.1.2 RWX volume
- Create an attached RWX volume. Verify that PDBs are created for csi-attacher, csi-provisioner, longhorn-admission-webhook, and longhorn-conversion-webhook.
- Run `kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data --force`. Verify that the PDBs for csi-attacher, csi-provisioner, longhorn-admission-webhook, and longhorn-conversion-webhook are removed -> the csi-attacher, csi-provisioner, longhorn-admission-webhook, longhorn-conversion-webhook, and instance-manager-e pods are evicted -> all volumes are successfully detached.
1.2 multi-node cluster
- Verify that PDBs are created for csi-attacher, csi-provisioner, longhorn-admission-webhook, and longhorn-conversion-webhook while volumes are attached.
- Detach all volumes on the node. Verify that the PDBs for csi-attacher, csi-provisioner, longhorn-admission-webhook, and longhorn-conversion-webhook are removed because there is no attached volume.
- Re-attach a volume. Verify that the PDBs for csi-attacher, csi-provisioner, longhorn-admission-webhook, and longhorn-conversion-webhook are recreated.
- Run `kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data --force` and verify the drain succeeds.
2. Upgrade Kubernetes for k3s cluster with standalone System Upgrade Controller deployment
Create a `plan` CR to upgrade Kubernetes. Note that `concurrency` should be 1 to upgrade nodes one by one, `version` should be a newer K3s version, and the plan should contain the `drain` stage.
3. Upgrade Kubernetes for imported k3s cluster in Rancher
Create an imported k3s cluster with an older version such as v1.21.9+k3s1 so that we can upgrade multiple times. Some instructions to create such a cluster are here: https://docs.k3s.io/datastore/ha-embedded
4. Upgrade Kubernetes for provisioned k3s cluster in Rancher
Provision a k3s cluster with an older version such as v1.22.11+k3s2. The cluster has 3 nodes, each with both worker and master roles. Set the upgrade strategy as below:

After updating the docs, including node maintenance and Kubernetes upgrade (we don't have the latter yet, so we can probably create a specific page for it; overall it is similar to node maintenance, or we may want to combine them), let's move this to ready-for-testing.
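A sketch of the `plan` CR mentioned in step 2 of the test plan, following the `upgrade.cattle.io/v1` Plan CRD used by the System Upgrade Controller. The plan name and the version string are placeholders:

```yaml
# Sketch of a System Upgrade Controller Plan for upgrading k3s servers.
# Name, namespace, and version are placeholders; concurrency: 1 upgrades
# nodes one by one, and the drain stage drains each node first.
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: k3s-server-upgrade
  namespace: system-upgrade
spec:
  concurrency: 1               # upgrade one node at a time
  version: v1.23.8+k3s1        # placeholder: any newer K3s version
  serviceAccountName: system-upgrade
  nodeSelector:
    matchExpressions:
      - {key: node-role.kubernetes.io/master, operator: In, values: ["true"]}
  drain:
    force: true                # drain the node before upgrading it
  upgrade:
    image: rancher/k3s-upgrade
```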
The goal is to prevent the volume-lifecycle managed pods from being removed by the drain operation: basically, have PDBs protect the Longhorn webhooks, CSI sidecars, and CSI deployer while there are volume engines still running on the node.
After all running volumes are evicted from the draining node, we can delete those PDBs to let the drain operation continue. @PhanLe1010 WDYT?
@PhanLe1010 I just noticed the doc has not been updated with the latest excluded pod selector, including the webhooks. Please help with that update. Thanks.
Assuming that the goal is to get rid of old versions of instance managers, I have tested the following steps:

`kubectl drain --pod-selector='app!=csi-attacher,app!=csi-provisioner,longhorn.io/component!=instance-manager,app!=longhorn-admission-webhook,app!=longhorn-conversion-webhook' <node> --ignore-daemonsets`

Just leaving this note here for future reference, in case someone wants to shut down all volumes without having to modify the scale of the end-user/application deployments in the cluster, or perform a zero-rebuild maintenance.
How useful this is depends on the situation 😃
Note from @innobead: I tested the label selector idea and it works well as a current workaround; example below. We just need to fix up/standardize our labels; the current instance-manager already has a nice set of labels.

`kubectl drain --pod-selector='app!=csi-attacher,app!=csi-provisioner,longhorn.io/component!=instance-manager,app!=longhorn-admission-webhook,app!=longhorn-conversion-webhook' <node> --ignore-daemonsets`
Workaround for the CSI sidecars getting drained:

`kubectl drain --pod-selector='app!=csi-attacher,app!=csi-provisioner' <node> --ignore-daemonsets`