longhorn: [BUG] kubectl drain node gets stuck forever
Describe the bug When we want to drain a node (RKE2 1.20.7 rke2r2 / Longhorn 1.1.100), the drain gets stuck forever on:
evicting pod longhorn-system/instance-manager-r-b4be9e85
error when evicting pods/"instance-manager-r-b4be9e85" -n "longhorn-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
To Reproduce Steps to reproduce the behavior:
- Deploy RKE2 (3 master, >=4 worker nodes)
- Deploy Longhorn
- Deploy rancher-monitoring, which creates two PVCs
- Run `kubectl drain` on one worker that holds a replica of the Grafana or Prometheus PV
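For reference, a minimal sketch of the drain command used in the last step; the node name is a placeholder and the flags are the common ones for draining a worker running DaemonSet pods and pods with emptyDir volumes, so adjust them to your environment.

```bash
# Drain one worker node that holds a Longhorn replica (node name is a placeholder)
kubectl drain <worker-node-name> \
  --ignore-daemonsets \
  --delete-emptydir-data
```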
Expected behavior The drain should complete
Log
evicting pod longhorn-system/instance-manager-r-b4be9e85
error when evicting pods/"instance-manager-r-b4be9e85" -n "longhorn-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
You can also attach a Support Bundle here. You can generate a Support Bundle using the link at the footer of the Longhorn UI. --> Will attach this in a few minutes.
Environment:
- Longhorn version: 1.1.100
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl): rancher catalog app
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: RKE2 1.20.7 rke2r2
- Number of management node in the cluster: 3
- Number of worker node in the cluster: 14
- Node config
- OS type and version: SLES 15 SP2
- CPU per node:
- Memory per node:
- Disk type(e.g. SSD/NVMe):
- Network bandwidth between the nodes:
- Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): Baremetal
- Number of Longhorn volumes in the cluster: 4 (2 active)
About this issue
- State: closed
- Created 3 years ago
- Comments: 20 (11 by maintainers)
Commits related to this issue
- Wrong flag scope. 'isUnusedReplicaOnCurrentNode' is always false at outer for loop. longhorn/longhorn#2673 — committed to mantissahz/longhorn-manager by mantissahz 2 years ago
- Wrong flag scope. 'isUnusedReplicaOnCurrentNode' is always false at outer for loop. longhorn/longhorn#2673 Signed-off-by: James Lu <james.lu@suse.com> — committed to mantissahz/longhorn-manager by mantissahz 2 years ago
- Wrong flag scope. 'isUnusedReplicaOnCurrentNode' is always false at outer for loop. longhorn/longhorn#2673 Signed-off-by: James Lu <james.lu@suse.com> — committed to longhorn/longhorn-manager by mantissahz 2 years ago
Thanks @Martin-Weiss for reporting
I think this is a Longhorn bug:
When a volume is created for the first time (via the UI or a PVC YAML manifest) and has never been attached to a node, Longhorn doesn't remove the PDB for the `instance-manager-r-xxx` pod that contains the volume's replicas when the user runs `kubectl drain`. This blocks the `kubectl drain` command.
The reason Longhorn doesn't remove the PDB is that Longhorn tries to find a healthy replica on a different node by checking `r.Spec.HealthyAt != ""`. This check always fails for a volume that has never been attached to a node, since `r.Spec.HealthyAt` has never been set.
From your provided support bundle I can see that you have 2 volumes that have never been attached.
Workaround:
Find the volumes that have never been attached to a node, attach them, and then detach them. This will set `r.Spec.HealthyAt` for the volumes' replicas, so Longhorn will remove the PDB for the `instance-manager-r-xxx` pod that contains the volume's replicas when the user runs `kubectl drain`. To find those volumes, you can run `kubectl get replicas -n longhorn-system -o yaml` and look for replicas that have `failedAt == ""` and `healthyAt == ""`, then get the volume name from `replica.metadata.ownerReferences[0].name`.

@mantissahz please don't remove the previous flag. The issue is a regression in 1.3.0-rc only. Good catch! Also, you should create another issue to track it instead of reopening an already closed and released issue.
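As a concrete sketch of that lookup, assuming `jq` is available and that the fields live at `spec.healthyAt` / `spec.failedAt` (matching the `r.Spec.HealthyAt` check above; verify against your Longhorn version), you could list the never-attached replicas together with their owning volume like this:

```bash
# List replicas that have never been healthy and never failed (i.e. whose
# volume has never been attached), printing "<volume>  <replica>" pairs.
kubectl get replicas.longhorn.io -n longhorn-system -o json \
  | jq -r '.items[]
      | select((.spec.healthyAt // "") == "" and (.spec.failedAt // "") == "")
      | "\(.metadata.ownerReferences[0].name)\t\(.metadata.name)"' \
  | sort -u
```

Attaching and then detaching each listed volume once (for example from the Longhorn UI) should populate `healthyAt` and unblock the drain.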
@Martin-Weiss Currently, I was able to identify several scenarios that would cause this PDB issue. I saw from the log that you have created `longhorn-test-pvc-rwx`. Was it running at the time of draining? For the monitoring storage, which `accessModes` are you using?
Known Scenarios
Scenario 1: Storage class has `numberOfReplicas` of `1`
If `numberOfReplicas` is `1`, the number needs to be increased to 2 (see the command sketch after this list); otherwise, you will encounter PDB errors during draining and the upgrade will time out.
Scenario 2: PVC/PV/LHV is created through the Longhorn UI, but has not yet been attached and replicated
Scenario 3: PVC/PV/LHV is created through Longhorn UI and attached to a host node
Scenario 4: RWX volume attached to a node
Scenario 5: RWO volume with last healthy replica
Set `allow-node-drain-with-last-healthy-replica` to `true` and you will be able to drain (command sketch below).
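A hedged command sketch for Scenarios 1 and 5. The storage class name `longhorn` and the `settings.longhorn.io` CRD with a top-level `value` field are assumptions based on a default Longhorn install; the setting can equally be changed from the Longhorn UI under Settings.

```bash
# Scenario 1: check how many replicas the storage class requests
# (it should be >= 2 to survive draining a node that holds a replica).
kubectl get storageclass longhorn -o jsonpath='{.parameters.numberOfReplicas}{"\n"}'

# Scenario 5: allow draining the node that holds the last healthy replica.
kubectl -n longhorn-system patch settings.longhorn.io \
  allow-node-drain-with-last-healthy-replica \
  --type merge -p '{"value":"true"}'
```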