topolvm: LogicalVolumes "leak" and slow down provisioning

Describe the bug

LogicalVolume objects tied to deleted nodes are not cleaned up and cannot be cleaned up, because the finalizer is supposed to be cleared by the topolvm-node pod on the node itself, which is no longer running.

Over time this results in LV objects building up in Kubernetes. The controller constantly tries to process these, which slows it down and therefore slows the provisioning of new volumes, as it spends a lot of time/threads trying to delete LVs which can’t be deleted.

According to the docs:

LogicalVolume is created with a finalizer. When a LogicalVolume is being deleted, topolvm-node on the target node deletes the corresponding LVM logical volume and clears the finalizer.

I’ve seen this happening on two of my own clusters today: one had 400+ stale LVs and another 1200+, built up over a year or so (through multiple topolvm upgrades, although some of the stale LVs were created today; I upgraded to 0.18.1 weeks ago). On the larger cluster it was taking many hours to do a full rolling redeploy of 170 pods over 22 deployments.

Note that LVs actually seem to get into two states (see the sketch after this list):

  1. They are marked for deletion (have a deletionTimestamp), but the finalizer blocks it
  2. They belong to an invalid node (.spec.nodeName is no longer a valid node) but are not marked for deletion (and still have the finalizer, so once they’re deleted, state 1 becomes an issue)
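
For what it’s worth, here is a minimal Go sketch of a check that counts LVs in each of those two states. This is not topolvm code; the topolvmv1 import path is an assumption on my part, and the same classification can be done with the kubectl/jq scripts further down.

package sketch // diagnostic sketch only, not part of topolvm

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"

	topolvmv1 "github.com/topolvm/topolvm/api/v1" // assumed import path for the LogicalVolume types
)

// countStaleLVs buckets LogicalVolumes into the two states described above.
func countStaleLVs(ctx context.Context, c client.Client) error {
	var lvs topolvmv1.LogicalVolumeList
	if err := c.List(ctx, &lvs); err != nil {
		return err
	}
	var blocked, orphaned int
	for i := range lvs.Items {
		lv := &lvs.Items[i]
		if lv.DeletionTimestamp != nil {
			blocked++ // state 1: marked for deletion, but the finalizer is still present
			continue
		}
		var node corev1.Node
		// Other Get errors are ignored here for brevity.
		if err := c.Get(ctx, client.ObjectKey{Name: lv.Spec.NodeName}, &node); apierrors.IsNotFound(err) {
			orphaned++ // state 2: the node in .spec.nodeName no longer exists
		}
	}
	fmt.Printf("blocked by finalizer: %d, on missing nodes: %d\n", blocked, orphaned)
	return nil
}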

Environments

  • Version: 0.18.1
  • OS: EKS v1.23 / 1.24

To Reproduce

Steps to reproduce the behavior:

I haven’t figured out a surefire way to reproduce this - I can delete a node and the topolvm-node process usually seems to pick that up. I suspect that under certain conditions that process is either slow, or the pod isn’t running or gets terminated too quickly, so it never gets to remove the finalizer.

Expected behavior

If the node that a LogicalVolume belongs to no longer exists, the controller should remove the finalizer itself before asking for its deletion.
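
To illustrate what I mean, here is a rough controller-side sketch. This is not actual topolvm code - the lvFinalizer constant, the topolvmv1 import path and the function itself are assumptions on my part.

package sketch // illustrative only

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	topolvmv1 "github.com/topolvm/topolvm/api/v1" // assumed import path
)

// lvFinalizer is assumed here; the real constant lives somewhere in the topolvm package.
const lvFinalizer = "topolvm.cybozu.com/logicalvolume"

// reconcileOrphanedLV: if the LV's node no longer exists, remove the finalizer
// ourselves and then request deletion, instead of waiting for the (no longer
// running) topolvm-node on that node to do it.
func reconcileOrphanedLV(ctx context.Context, c client.Client, lv *topolvmv1.LogicalVolume) error {
	var node corev1.Node
	err := c.Get(ctx, client.ObjectKey{Name: lv.Spec.NodeName}, &node)
	if err == nil || !apierrors.IsNotFound(err) {
		return err // node still exists, or a transient error: leave the LV alone
	}
	// Node is gone: strip the finalizer so deletion can complete without topolvm-node.
	controllerutil.RemoveFinalizer(lv, lvFinalizer)
	if err := c.Update(ctx, lv); err != nil {
		return client.IgnoreNotFound(err)
	}
	// Then ask for deletion (a no-op if it is already being deleted).
	return client.IgnoreNotFound(c.Delete(ctx, lv))
}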

Additional context

These two scripts tidy up the LVs for me; in the short term I’ll keep running them on an ad-hoc basis:

  1. Delete LVs tied to non-existent nodes:
kubectl get logicalvolumes -ojson | jq -c .items[] | grep -vf <(kubectl get no -ojson | jq .items[].metadata.name -r) | jq -r .metadata.name | xargs kubectl delete logicalvolume
  2. Remove finalizer on deleted LVs to allow deletion to complete:
kubectl get LogicalVolume -ojson | jq .items[] -c | grep deletionTimestamp | jq -r .metadata.name | xargs -I{} kubectl patch LogicalVolume {} -p '{"metadata":{"finalizers":null}}' --type merge

About this issue

  • State: closed
  • Created a year ago
  • Comments: 16 (16 by maintainers)

Most upvoted comments

Thanks @peng225. I’ve opened https://github.com/topolvm/topolvm/pull/723, which I believe implements your spec.

I have noticed this in the topolvm-node logs on the node I’m deleting:

{"level":"error","ts":1688722585.7012916,"msg":"failed to add finalizer","controller":"logicalvolume","controllerGroup":"topolvm.cybozu.com","controllerKind":"LogicalVolume","LogicalVolume":{"name":"pvc-42f87fd2-7ef0-481d-a773-92ad2a996246"},"namespace":"","name":"pvc-42f87fd2-7ef0-481d-a773-92ad2a996246","reconcileID":"18323ef2-74c2-4136-b61a-8c0cad8439ab","name":"pvc-42f87fd2-7ef0-481d-a773-92ad2a996246","error":"logicalvolumes.topolvm.cybozu.com \"pvc-42f87fd2-7ef0-481d-a773-92ad2a996246\" not found","stacktrace":"github.com/topolvm/topolvm/controllers.(*LogicalVolumeReconciler).Reconcile\n\t/workdir/controllers/logicalvolume_controller.go:68\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:234"}

So it’s trying to add the finalizer to the LV? I guess if this were to succeed then that could cause this problem I’m seeing where the finalizer isn’t cleaned up? The sequence of events being:

  1. Node is deleted
  2. Node controller runs, removes LV finalizer
  3. LV controller runs & re-adds finalizer
  4. Node controller deletes the LV
  5. Node controller removes node finalizer, node is deleted (before topolvm-node deletes the volume)

The fix then might be to delete the LV before removing the finalizer in the controller? The LV controller guards around DeletionTimestamp being nil, so that should work?
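
Something like this reordering is what I have in mind - a sketch only, not the real cleanupLogicalVolumes code (lvFinalizer and the import path are again assumptions):

package sketch // illustrative only

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	topolvmv1 "github.com/topolvm/topolvm/api/v1" // assumed import path
)

const lvFinalizer = "topolvm.cybozu.com/logicalvolume" // assumed constant

// cleanupLV deletes the LV first, so it gets a deletionTimestamp and the LV
// controller's DeletionTimestamp guard should stop it from re-adding the
// finalizer, and only then strips the finalizer so deletion can complete
// without topolvm-node being involved.
func cleanupLV(ctx context.Context, c client.Client, lv *topolvmv1.LogicalVolume) error {
	// 1. Mark the LV for deletion (sets metadata.deletionTimestamp).
	if err := c.Delete(ctx, lv); err != nil && !apierrors.IsNotFound(err) {
		return err
	}
	// 2. Now remove the finalizer; with the deletionTimestamp already set,
	//    the LV controller should no longer try to add it back.
	patched := lv.DeepCopy()
	controllerutil.RemoveFinalizer(patched, lvFinalizer)
	if err := c.Patch(ctx, patched, client.MergeFrom(lv)); err != nil && !apierrors.IsNotFound(err) {
		return err
	}
	return nil
}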

Or am I completely misunderstanding what’s happening here?

In r.doFinalize, PVCs are deleted.

Right, I see. And the LVs are deleted too. It also seems to be removing the finalizer on the LV itself? Got to admit that my Go knowledge isn’t particularly strong but I think that’s what cleanupLogicalVolumes is trying to do here?

https://github.com/topolvm/topolvm/blob/v0.18.1/controllers/node_controller.go#L167-L182

So even if topolvm-node isn’t running, this code should be removing the finalizer on the LV and allowing the deletion - maybe something is wrong there?

Looking at the node_controller logs from around the time of a delete that orphaned an LV, the only log line I see is “deleted LogicalVolume”. The LV didn’t actually get deleted, but it looks like the controller at least tried; the finalizer was left on it, which implies there’s something wrong with the remove-finalizer code in cleanupLogicalVolumes? Or maybe it was recreated by something?

One last thing I noticed - it looks like that controller should also log when it deletes the PVC, and it didn’t, so maybe that was cleaned up elsewhere? I don’t know if that’s a problem at all.

I’ve added some more logging to that method anyway and deployed it to my test cluster; hopefully that will give us some clues.

I’m not using --skip-node-finalize - I’m using a lot of ephemeral volumes in my deployments and that would exacerbate this issue massively.

The majority of the LogicalVolumes are marked as deleted, but the finalizer is preventing them from being cleaned up.

And then the finalizer on the node is deleted by the topolvm-controller

Do you mean the topolvm-node? That’s what the docs say, and my logs from the time of the delete imply that too

Logs from topolvm-controller around a failed deletion aren’t very interesting - just retries waiting for the volume to delete:

2023-07-03T10:35:04Z    {"log":"{\"level\":\"info\",\"ts\":1688380504.2996945,\"msg\":\"deleted LogicalVolume\",\"controller\":\"node\",\"controllerGroup\":\"\",\"controllerKind\":\"Node\",\"Node\":{\"name\":\"ip-10-237-50-86.eu-west-1.compute.internal\"},\"namespace\":\"\",\"name\":\"ip-10-237-50-86.eu-west-1.compute.internal\",\"reconcileID\":\"c7b62af3-31f2-48a0-8a08-30ba6179c1c3\",\"name\":\"pvc-0ace98d1-7a32-4a08-81cb-6350294bbfa5\"}\n","stream":"stderr","time":"2023-07-03T10:35:04.299846531Z"}
2023-07-03T10:36:38Z    {"log":"I0703 10:36:38.770405       1 controller.go:1471] delete \"pvc-0ace98d1-7a32-4a08-81cb-6350294bbfa5\": started\n","stream":"stderr","time":"2023-07-03T10:36:38.771089892Z"}
2023-07-03T10:36:48Z    {"log":"E0703 10:36:48.771216       1 controller.go:1481] delete \"pvc-0ace98d1-7a32-4a08-81cb-6350294bbfa5\": volume deletion failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded\n","stream":"stderr","time":"2023-07-03T10:36:48.771511115Z"}
2023-07-03T10:36:48Z    {"log":"W0703 10:36:48.771262       1 controller.go:989] Retrying syncing volume \"pvc-0ace98d1-7a32-4a08-81cb-6350294bbfa5\", failure 0\n","stream":"stderr","time":"2023-07-03T10:36:48.771537007Z"}
2023-07-03T10:36:48Z    {"log":"E0703 10:36:48.771289       1 controller.go:1007] error syncing volume \"pvc-0ace98d1-7a32-4a08-81cb-6350294bbfa5\": rpc error: code = DeadlineExceeded desc = context deadline exceeded\n","stream":"stderr","time":"2023-07-03T10:36:48.771543101Z"}
2023-07-03T10:36:48Z    {"log":"I0703 10:36:48.771351       1 event.go:285] Event(v1.ObjectReference{Kind:\"PersistentVolume\", Namespace:\"\", Name:\"pvc-0ace98d1-7a32-4a08-81cb-6350294bbfa5\", UID:\"05143ef3-c9b4-4de0-840f-d3458e585e76\", APIVersion:\"v1\", ResourceVersion:\"482846244\", FieldPath:\"\"}): type: 'Warning' reason: 'VolumeFailedDelete' rpc error: code = DeadlineExceeded desc = context deadline exceeded\n","stream":"stderr","time":"2023-07-03T10:36:48.771547288Z"}
2023-07-03T10:36:49Z    {"log":"I0703 10:36:49.772253       1 controller.go:1471] delete \"pvc-0ace98d1-7a32-4a08-81cb-6350294bbfa5\": started\n","stream":"stderr","time":"2023-07-03T10:36:49.772355598Z"}
2023-07-03T10:36:59Z    {"log":"E0703 10:36:59.772920       1 controller.go:1481] delete \"pvc-0ace98d1-7a32-4a08-81cb-6350294bbfa5\": volume deletion failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded\n","stream":"stderr","time":"2023-07-03T10:36:59.773028391Z"}

The topolvm-node logs are more interesting:

2023-07-03T10:35:04Z    {"level":"info","ts":1688380504.2968848,"msg":"start finalizing LogicalVolume","controller":"logicalvolume","controllerGroup":"topolvm.cybozu.com","controllerKind":"LogicalVolume","LogicalVolume":{"name":"pvc-0ace98d1-7a32-4a08-81cb-6350294bbfa5"},"namespace":"","name":"pvc-0ace98d1-7a32-4a08-81cb-6350294bbfa5","reconcileID":"48034552-095d-4460-b9b4-e64bb8336445","name":"pvc-0ace98d1-7a32-4a08-81cb-6350294bbfa5"}
2023-07-03T10:35:04Z    {"level":"error","ts":1688380504.703414,"msg":"failed to remove LV","controller":"logicalvolume","controllerGroup":"topolvm.cybozu.com","controllerKind":"LogicalVolume","LogicalVolume":{"name":"pvc-0ace98d1-7a32-4a08-81cb-6350294bbfa5"},"namespace":"","name":"pvc-0ace98d1-7a32-4a08-81cb-6350294bbfa5","reconcileID":"48034552-095d-4460-b9b4-e64bb8336445","name":"pvc-0ace98d1-7a32-4a08-81cb-6350294bbfa5","uid":"e4c11caa-c4df-47a9-a65f-9412da214dfb","error":"rpc error: code = Internal desc = exit status 5","stacktrace":"github.com/topolvm/topolvm/controllers.(*LogicalVolumeReconciler).removeLVIfExists\n\t/workdir/controllers/logicalvolume_controller.go:151\ngithub.com/topolvm/topolvm/controllers.(*LogicalVolumeReconciler).Reconcile\n\t/workdir/controllers/logicalvolume_controller.go:110\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:234"}
2023-07-03T10:35:04Z    {"level":"error","ts":1688380504.7035918,"msg":"Reconciler error","controller":"logicalvolume","controllerGroup":"topolvm.cybozu.com","controllerKind":"LogicalVolume","LogicalVolume":{"name":"pvc-0ace98d1-7a32-4a08-81cb-6350294bbfa5"},"namespace":"","name":"pvc-0ace98d1-7a32-4a08-81cb-6350294bbfa5","reconcileID":"48034552-095d-4460-b9b4-e64bb8336445","error":"rpc error: code = Internal desc = exit status 5","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:326\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:234"}

It retries this until the node is taken out (leaving the LV stranded, I assume).

The logs for topolvm-lvmd are also relevant, as that seems to be what’s returning exit status 5:

2023-07-03T10:35:49Z    2023-07-03T10:35:49.451664Z topolvm-lvmd-0-x75dg lvmd info: "invoking LVM command" args="[fullreport --reportformat json --units b --nosuffix --configreport vg -o vg_name,vg_uuid,vg_size,vg_free --configreport lv -o lv_uuid,lv_name,lv_full_name,lv_path,lv_size,lv_kernel_major,lv_kernel_minor,origin,origin_size,pool_lv,lv_tags,lv_attr,vg_name,data_percent,metadata_percent,pool_lv --configreport pv -o, --configreport pvseg -o, --configreport seg -o,]"
2023-07-03T10:35:49Z    2023-07-03T10:35:49.559212Z topolvm-lvmd-0-x75dg lvmd info: "invoking LVM command" args="[lvremove -f /dev/localnvme/e4c11caa-c4df-47a9-a65f-9412da214dfb]"
2023-07-03T10:35:49Z      Logical volume localnvme/e4c11caa-c4df-47a9-a65f-9412da214dfb contains a filesystem in use.
2023-07-03T10:35:49Z    2023-07-03T10:35:49.655171Z topolvm-lvmd-0-x75dg lvmd error: "failed to remove volume" error="exit status 5" name="e4c11caa-c4df-47a9-a65f-9412da214dfb"
2023-07-03T10:36:30Z    2023-07-03T10:36:30.623214Z topolvm-lvmd-0-x75dg lvmd info: "invoking LVM command" args="[fullreport --reportformat json --units b --nosuffix --configreport vg -o vg_name,vg_uuid,vg_size,vg_free --configreport lv -o lv_uuid,lv_name,lv_full_name,lv_path,lv_size,lv_kernel_major,lv_kernel_minor,origin,origin_size,pool_lv,lv_tags,lv_attr,vg_name,data_percent,metadata_percent,pool_lv --configreport pv -o, --configreport pvseg -o, --configreport seg -o,]"
2023-07-03T10:36:30Z    2023-07-03T10:36:30.703500Z topolvm-lvmd-0-x75dg lvmd info: "invoking LVM command" args="[fullreport --reportformat json --units b --nosuffix --configreport vg -o vg_name,vg_uuid,vg_size,vg_free --configreport lv -o lv_uuid,lv_name,lv_full_name,lv_path,lv_size,lv_kernel_major,lv_kernel_minor,origin,origin_size,pool_lv,lv_tags,lv_attr,vg_name,data_percent,metadata_percent,pool_lv --configreport pv -o, --configreport pvseg -o, --configreport seg -o,]"
2023-07-03T10:36:30Z    2023-07-03T10:36:30.795183Z topolvm-lvmd-0-x75dg lvmd info: "invoking LVM command" args="[lvremove -f /dev/localnvme/e4c11caa-c4df-47a9-a65f-9412da214dfb]"
2023-07-03T10:36:30Z      Logical volume localnvme/e4c11caa-c4df-47a9-a65f-9412da214dfb contains a filesystem in use.
2023-07-03T10:36:30Z    2023-07-03T10:36:30.871072Z topolvm-lvmd-0-x75dg lvmd error: "failed to remove volume" error="exit status 5" name="e4c11caa-c4df-47a9-a65f-9412da214dfb"

Is this expected behaviour? As it retries, I assume the lvremove command would eventually succeed once the pod has been removed from the node; however, the node gets deleted soon after that, so perhaps that’s why lvmd never gets to that point?