topolvm: LogicalVolumes "leak" and slow down provisioning
Describe the bug
LogicalVolume objects tied to deleted nodes are not cleaned up and cannot be cleaned up, because the finalizer is normally cleared by the topolvm-node pod on the node itself, which is no longer running.
Over time the number of stale LogicalVolume objects in Kubernetes builds up. The controller constantly tries to process them, which slows it down and therefore slows the provisioning of new volumes, as it spends a lot of time/threads trying to delete LVs that can never be deleted.
According to the docs:
`LogicalVolume` is created with a finalizer. When a `LogicalVolume` is being deleted, `topolvm-node` on the target node deletes the corresponding LVM logical volume and clears the finalizer.
I've seen this happening on two of my own clusters today: one had 400+ stale LVs and another 1200+, built up over a year or so (across multiple topolvm upgrades, but some stale LVs were created today, and I upgraded to 0.18.1 weeks ago). On the larger cluster a full rolling redeploy of 170 pods across 22 deployments was taking many hours.
Note that LVs actually seem to get into two states:
- They are marked for deletion (have a `deletionTimestamp`), but the finalizer blocks the deletion
- They belong to an invalid node (`.spec.nodeName` no longer refers to an existing node) but are not marked for deletion; they still carry the finalizer, so once they are deleted they fall into state 1
Environments
- Version: 0.18.1
- OS: EKS v1.23 / 1.24
To Reproduce
Steps to reproduce the behavior:
I haven't figured out a surefire way to reproduce this - I can delete a node and the topolvm-node process usually seems to pick that up. I suspect that under certain conditions that process is slow, or the pod isn't running, or it gets terminated too quickly, so it never gets to removing the finalizer.
Expected behavior
If the node that a LogicalVolume belongs to no longer exists, the controller should remove the finalizer itself before asking for its deletion.
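For illustration only, here is a minimal sketch of what that controller-side cleanup could look like. This is not topolvm's actual code: it assumes a recent controller-runtime (where AddFinalizer/RemoveFinalizer report whether they changed the object), and the finalizer name and spec field path are guesses at the v0.18.x CRD.

```go
// Hedged sketch, not topolvm's implementation: if the Node referenced by a
// LogicalVolume no longer exists, drop the finalizer ourselves and then delete
// the object, instead of waiting for a topolvm-node pod that will never run again.
package cleanup

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	ctrlclient "sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

const lvFinalizer = "topolvm.cybozu.com/logicalvolume" // assumed finalizer name

// cleanupOrphanedLV removes the finalizer from, and then deletes, a LogicalVolume
// whose spec.nodeName no longer refers to an existing Node.
func cleanupOrphanedLV(ctx context.Context, c ctrlclient.Client, lv *unstructured.Unstructured) error {
	nodeName, found, err := unstructured.NestedString(lv.Object, "spec", "nodeName")
	if err != nil || !found || nodeName == "" {
		return err
	}

	// If the node still exists, leave the LV alone: topolvm-node on that node owns it.
	var node corev1.Node
	if err := c.Get(ctx, ctrlclient.ObjectKey{Name: nodeName}, &node); err == nil {
		return nil
	} else if !apierrors.IsNotFound(err) {
		return err
	}

	// Node is gone: drop the finalizer first so the delete below can actually complete.
	if controllerutil.RemoveFinalizer(lv, lvFinalizer) {
		if err := c.Update(ctx, lv); err != nil {
			return err
		}
	}
	return ctrlclient.IgnoreNotFound(c.Delete(ctx, lv))
}
```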
Additional context
These two scripts tidy up the LVs for me; in the short term I'll keep running them on an ad-hoc basis:
- Delete LVs tied to non-existent nodes:
kubectl get logicalvolumes -ojson | jq -c .items[] | grep -vf <(kubectl get no -ojson | jq .items[].metadata.name -r) | jq -r .metadata.name | xargs kubectl delete logicalvolume
- Remove finalizer on deleted LVs to allow deletion to complete:
kubectl get LogicalVolume -ojson | jq .items[] -c | grep deletionTimestamp | jq -r .metadata.name | xargs -I{} kubectl patch LogicalVolume {} -p '{"metadata":{"finalizers":null}}' --type merge
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 16 (16 by maintainers)
Commits related to this issue
- Don't re-add finalizer to LogicalVolumes that are about to be deleted: This fixes a race-condition between node_controller and logicalvolume_controller where node_controller removes the finalizer from... — committed to spmason/topolvm by deleted user a year ago
Thanks @peng225. I've opened https://github.com/topolvm/topolvm/pull/723, which I believe implements your spec.
I have noticed this in the `topolvm-node` logs on the node I'm deleting:
So it's trying to add the finalizer to the LV? I guess if this were to succeed then that could cause the problem I'm seeing where the finalizer isn't cleaned up? The sequence of events being:
The fix then might be to delete the LV before removing the finalizer in the controller? The LV controller guards around `DeletionTimestamp` being `nil`, so that should work? Or am I completely misunderstanding what's happening here?
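To make the race concrete, here is roughly the kind of guard being discussed - a hedged sketch rather than the actual change in the PR, assuming a recent controller-runtime where AddFinalizer reports whether it changed the object (the finalizer name is an assumption):

```go
// Hedged sketch: only (re-)add the finalizer while the LogicalVolume is NOT being
// deleted, so a racing reconcile cannot undo node_controller's finalizer removal.
package cleanup

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	ctrlclient "sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

const lvFinalizerName = "topolvm.cybozu.com/logicalvolume" // assumed finalizer name

func ensureFinalizer(ctx context.Context, c ctrlclient.Client, lv *unstructured.Unstructured) error {
	if !lv.GetDeletionTimestamp().IsZero() {
		return nil // already being deleted: do not put the finalizer back
	}
	if controllerutil.AddFinalizer(lv, lvFinalizerName) {
		return c.Update(ctx, lv)
	}
	return nil
}
```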
Right, I see. And the LVs are deleted too. It also seems to be removing the finalizer on the LV itself? Got to admit that my Go knowledge isn't particularly strong, but I think that's what `cleanupLogicalVolumes` is trying to do here: https://github.com/topolvm/topolvm/blob/v0.18.1/controllers/node_controller.go#L167-L182
So even if topolvm-node isn't running, this code should be removing the finalizer on the LV and allowing the deletion - maybe something is wrong there?
Looking at the node_controller logs from around the time of a delete that orphaned an LV, the only log line I see is `deleted LogicalVolume`. The LV didn't get deleted, but it looks like it at least tried - yet the finalizer was left on it, which implies there's something wrong with the remove-finalizer code in `cleanupLogicalVolumes`? Or maybe it was recreated by something?
One last thing I noticed - it looks like that controller should also log when it deletes the PVC, and it didn't, so maybe that was cleaned up elsewhere? I don't know if that's a problem at all.
I've added some more logging to that method anyway and deployed it to my test cluster; hopefully that will give us some clues.
I'm not using `--skip-node-finalize` - I'm using a lot of ephemeral volumes in my deployments and that would exacerbate this issue massively.
The majority of the LogicalVolumes are marked as deleted, but the finalizer is preventing them from being cleaned up.
Do you mean the `topolvm-node`? That's what the docs say, and my logs from the time of the delete imply that too.
Logs from `topolvm-controller` around a failed deletion aren't very interesting - just retries waiting for the volume to delete:
The `topolvm-node` logs are more interesting:
It retries this until the node is taken out (leaving the LV stranded, I assume).
The logs for `topolvm-lvmd` are also relevant, as that seems to be what's returning `exit status 5`:
Is this expected behaviour? As it retries, I assume the `lvremove` command would eventually succeed once the pod has been removed from the node; however, the node gets deleted soon after that, so perhaps that's why lvmd never gets to that point?
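As a hedged aside on the exit code (this is not lvmd's actual code): exit status 5 appears to be LVM's generic ECMD_FAILED status, which is what you would typically see while the LV is still open because the pod's mount hasn't been torn down yet, so retries after the mount goes away can succeed. A rough illustration of how that surfaces from Go; the volume group and LV names are made up:

```go
// Hedged illustration of how "exit status 5" from lvremove surfaces in Go.
// 5 is LVM's generic ECMD_FAILED code (e.g. the logical volume is still in use
// while mounted), so retrying later can succeed - matching the retry behaviour
// described in the logs above.
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

func removeLV(vg, lv string) error {
	out, err := exec.Command("lvremove", "-y", vg+"/"+lv).CombinedOutput()
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) && exitErr.ExitCode() == 5 {
		// Likely still in use: worth retrying later rather than treating as permanent.
		return fmt.Errorf("lvremove failed (possibly retryable): %s", out)
	}
	return err
}

func main() {
	if err := removeLV("myvg", "pvc-0123"); err != nil {
		fmt.Println(err)
	}
}
```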