longhorn: [BUG] kubectl drain node gets stuck forever

Describe the bug When we want to drain a node (RKE2 1.20.7 rke2r2 / Longhorn 1.1.100), the drain gets stuck forever in

evicting pod longhorn-system/instance-manager-r-b4be9e85
error when evicting pods/"instance-manager-r-b4be9e85" -n "longhorn-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.

To Reproduce Steps to reproduce the behavior:

  • Deploy RKE2 (3 master, >=4 worker nodes)
  • Deploy Longhorn
  • Deploy rancher-monitoring, which creates two PVCs
  • Run kubectl drain on one worker that holds a replica of the Grafana or Prometheus PV (example invocation below)
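
For reference, a typical drain invocation that triggers the hang (the node name is a placeholder; the flags are standard kubectl drain options):

kubectl drain <worker-node> --ignore-daemonsets --delete-emptydir-data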

Expected behavior The drain should complete

Log

evicting pod longhorn-system/instance-manager-r-b4be9e85
error when evicting pods/"instance-manager-r-b4be9e85" -n "longhorn-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.

You can also attach a Support Bundle here. You can generate a Support Bundle using the link at the footer of the Longhorn UI. -> Will attach this in a few minutes.

Environment:

  • Longhorn version: 1.1.100
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): rancher catalog app
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: RKE2 1.20.7 rke2r2
    • Number of management nodes in the cluster: 3
    • Number of worker nodes in the cluster: 14
  • Node config
    • OS type and version: SLES 15 SP2
    • CPU per node:
    • Memory per node:
    • Disk type (e.g. SSD/NVMe):
    • Network bandwidth between the nodes:
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): Baremetal
  • Number of Longhorn volumes in the cluster: 4 (2 active)

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 20 (11 by maintainers)

Most upvoted comments

Thanks @Martin-Weiss for reporting

I think this is a Longhorn bug:

When a volume is created for the first time (via the UI or via a PVC YAML manifest) and has never been attached to a node, Longhorn doesn't remove the PDB for the instance-manager-r-xxx pod that contains the volume's replicas when the user runs kubectl drain. This blocks the kubectl drain command.

The reason Longhorn doesn't remove the PDB is that it tries to find a healthy replica on a different node by checking r.Spec.HealthyAt != "". This check always fails for a volume that has never been attached to a node, since r.Spec.HealthyAt has never been set.
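
To see what is blocking the eviction, you can list the disruption budgets Longhorn creates for its instance-manager pods (Longhorn typically creates one PDB per instance-manager pod with minAvailable: 1):

kubectl -n longhorn-system get poddisruptionbudgets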

From the support bundle you provided, I can see that you have 2 volumes that have never been attached.

Workaround:

Find the volumes that have never been attached to a node, attach them, then detach them. This sets r.Spec.HealthyAt for the volumes' replicas, so Longhorn will remove the PDB for the instance-manager-r-xxx pod that contains the volumes' replicas when the user runs kubectl drain. To find those volumes, run kubectl get replicas -n longhorn-system -o yaml, look for replicas that have failedAt == "" and healthyAt == "", and get the volume name from replica.metadata.ownerReferences[0].name.
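
A rough shell sketch of that lookup (requires jq; the spec.healthyAt / spec.failedAt field paths are assumed from the r.Spec.HealthyAt / r.Spec.FailedAt references above and may differ between Longhorn versions):

# Print the names of volumes whose replicas have never become healthy and never failed
kubectl -n longhorn-system get replicas.longhorn.io -o json \
  | jq -r '.items[]
      | select((.spec.healthyAt // "") == "" and (.spec.failedAt // "") == "")
      | .metadata.ownerReferences[0].name' \
  | sort -u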

@mantissahz please don't remove the previous flag; this issue is a regression only in 1.3.0-rc. Good catch! Also, please create another issue to track it instead of reopening an already closed and released issue.

@Martin-Weiss So far I have been able to identify several scenarios that cause PDB errors. I saw from the logs that you have created longhorn-test-pvc-rwx. Was it running at the time of draining? For the monitoring storage, which accessModes are you using?

Known Scenarios

Scenario 1: Storage class has numberOfReplicas of 1

  • If the volume's StorageClass numberOfReplicas is 1, increase it to 2. Otherwise, you will encounter PDB errors during draining and the upgrade will time out. See the sketch below.
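
A rough sketch of checking and raising the replica count, assuming the default longhorn StorageClass name and the spec.numberOfReplicas field on the Longhorn Volume CR (the volume name is a placeholder):

# Check the replica count configured in the StorageClass
kubectl get storageclass longhorn -o jsonpath='{.parameters.numberOfReplicas}{"\n"}'

# Raise the replica count on an already-created volume
kubectl -n longhorn-system patch volumes.longhorn.io <volume-name> --type merge -p '{"spec":{"numberOfReplicas":2}}'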

Scenario 2: PVC/PV/LHV is created through the Longhorn UI, but has not yet been attached and replicated

  • After the volume has been attached, replicated, and detached, the nodes holding its replicas can be drained successfully.
  • This does not seem to be a problem if the volume is created through a PVC using a manifest.

Scenario 3: PVC/PV/LHV is created through the Longhorn UI and attached to a host node

  • The volume needs to be detached; then the node can be drained successfully.

Scenario 4: RWX volume attached to a node

  • Scale down the workload that uses the volume, then drain (example below).
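
For example, assuming the RWX consumer is a Deployment (namespace, workload, and node names are placeholders):

# Stop the workload that keeps the RWX volume attached
kubectl -n <namespace> scale deployment <workload> --replicas=0

# Then drain the node
kubectl drain <node> --ignore-daemonsets --delete-emptydir-data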

Scenario 5: RWO volume with last healthy replica

  • Set allow-node-drain-with-last-healthy-replica to true to be able to drain (see the sketch below).
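
The setting can be changed in the Longhorn UI; as a sketch of an alternative, patching the corresponding Setting CR should have the same effect (the top-level value field is assumed from the Longhorn Setting CRD):

kubectl -n longhorn-system patch settings.longhorn.io allow-node-drain-with-last-healthy-replica --type merge -p '{"value":"true"}'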