longhorn: [BUG] orphaned pod pod_id found, but error not a directory occurred when trying to remove the volumes dir
Describe the bug After an unclean node shutdown (and probably in some other cases), kubelet fails to remove orphaned pods with Longhorn PVCs.
To Reproduce
- Deploy k3s
- Deploy longhorn
- Deploy pod with longhorn volume
- Crash the node while the pod is running / kill k3s
- Observe the kubelet logs (see the example after this list)
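For example, on k3s the messages can be watched with something like the following (assuming k3s runs as a systemd unit named k3s; adjust the unit name for other distros):
```sh
# Follow the kubelet messages emitted by the k3s service and filter for the error
journalctl -u k3s -f | grep 'orphaned pod'
```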
Expected behavior No log spam should appear
Log
k3s[471]: E1102 15:11:01.933125 471 kubelet_volumes.go:245] "There were many similar errors. Turn up verbosity to see them." err="orphaned pod \"5a5fd1bf-bc3c-4600-88d3-321701d3d95a\" found, but error not a directory occurred when trying to remove the volumes dir" numErrs=2
Environment:
- Longhorn version: 1.2.2
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl): helm
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: k3s
- Number of management nodes in the cluster: 3
- Number of worker nodes in the cluster: 4
Additional context I'm not quite sure whether this is a kubelet or a Longhorn issue. The issue itself is caused by how kubelet handles orphaned pod cleanup: it appears to call rmdir, and if anything is still inside that directory, it fails to remove it. In the case of Longhorn-provisioned volumes, the inability to clean up the orphaned pod directories is caused by a leftover vol_data.json file, in my case containing:
root@master-1:~# cat /var/lib/kubelet/pods/47a9ab25-68e9-4c8a-ab29-a3b3a0500799/volumes/kubernetes.io~csi/pvc-5232ca8d-b13f-42eb-8088-ec08cee51a7a/vol_data.json
{"attachmentID":"csi-7b27560c7ac7e905a2f097d3caef3b16ad78e7e138b1b4c6d5c427a4066e9729","driverName":"driver.longhorn.io","nodeName":"master-1","specVolID":"pvc-5232ca8d-b13f-42eb-8088-ec08cee51a7a","volumeHandle":"pvc-5232ca8d-b13f-42eb-8088-ec08cee51a7a","volumeLifecycleMode":"Persistent"}
Removing that dangling file allows kubelet to proceed with orphaned pod cleanup.
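As an illustration (assuming the default /var/lib/kubelet data directory), the leftover files can be listed with something like:
```sh
# List vol_data.json files under per-pod CSI volume directories; cross-check
# each pod UID against kubelet's "orphaned pod" messages before deleting anything.
find /var/lib/kubelet/pods/*/volumes/kubernetes.io~csi/ -name vol_data.json 2>/dev/null
```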
There is also https://github.com/longhorn/longhorn/issues/3080, but it seems those errors are caused by different circumstances.
About this issue
- State: closed
- Created 3 years ago
- Reactions: 2
- Comments: 30 (8 by maintainers)
I ended up using this script (note: I'm not a Longhorn dev, so I cannot guarantee this is safe).
Two options:
“You don't need to delete the dangling vol_data.json in the old mountpoint directory manually. After the longhorn-csi plugin restarts automatically and you wait several minutes, the pod will be Running again with a new volume mountpoint.” - this applies when the crashed replica is rescheduled on the same node.
But if the replica has been rescheduled to a different node and you have a dangling vol_data.json, you need to go to
/var/lib/kubelet/pods/$pod_id/volumes/kubernetes.io~csi/pvc_$pvc_id/ and delete vol_data.json after making sure it does not belong to a “live” volume. As Derek said, this is not a Longhorn bug; it is how kubelet handles folder cleanup (calling rmdir always fails if the directory contains anything).
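A minimal sketch of that manual cleanup, assuming k3s logs to the systemd unit k3s and the default /var/lib/kubelet layout (this is not an official Longhorn script; review and adapt it before use):
```sh
#!/bin/sh
# Delete only the dangling vol_data.json files of pods that kubelet reports as
# orphaned, so kubelet's own rmdir can finish the cleanup afterwards.
for uid in $(journalctl -u k3s -b | grep 'orphaned pod' | grep -oE '[0-9a-f-]{36}' | sort -u); do
  for f in /var/lib/kubelet/pods/"$uid"/volumes/kubernetes.io~csi/*/vol_data.json; do
    [ -e "$f" ] || continue                       # glob matched nothing
    pvc=$(basename "$(dirname "$f")")
    # Skip "live" volumes that are still mounted on this node.
    if findmnt -rn -o TARGET | grep -q "$pvc"; then
      echo "skipping $pvc: still mounted"
      continue
    fi
    echo "removing dangling $f"
    rm -f "$f"
  done
done
```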
I can confirm this bug running v1.2.3. I spotted it in my environment after my k3s master was powered down by mistake while the nodes were still running.
k3s[797]: E0123 21:15:24.559660 797 kubelet_volumes.go:245] "There were many similar errors. Turn up verbosity to see them." err="orphaned pod \"7316f31a-e2cb-4d92-8667-46ba6b610228\" found, but error not a directory occurred when trying to remove the volumes dir" numErrs=1
After removing the vol_data.json file manually, the system recovered by itself.
k3s[797]: I0123 21:15:26.597356 797 kubelet_volumes.go:160] "Cleaned up orphaned pod volumes dir" podUID=7316f31a-e2cb-4d92-8667-46ba6b610228 path="/var/lib/kubelet/pods/7316f31a-e2cb-4d92-8667-46ba6b610228/volumes"
Also seeing this error on MicroK8s 1.26.4 and Longhorn 1.4.1; it was flooding my kubelite syslogs.
No idea how it started; I believe it happened after I rebooted the nodes. Deleted one of these “orphaned pods” and another showed up instead. I ended up having to delete about 15, on all nodes.
It would be interesting to have a more robust solution than running a script that deletes files manually. Why does this happen, and how can we prevent it from happening?
Had to remove the /* after volumes/ in @alexnederlof's script above, otherwise it would not pass the “if” check.
Managed to adapt the script; here's my take, feel free to improve: I tried to target the problematic file itself, so kubelet takes care of the rest.
@shuo-wu Yes. But the CSI plugin now restarts automatically after the node and kubelet restart, so we don't need to make any change. The only impact of the leftover vol_data.json for the restarted node's old volume is the repeated error log messages. It should be fixed in kubelet.
@rlex
Sorry for the late reply. I can reproduce the issue by increasing the power-outage period.
I noticed the error messages in the rebooted node's kubelet log.
The root cause is that the connection to the CSI driver was broken, so the removal of vol_data.json in UnmountVolume.TearDown could not be executed, which leads to the Rmdir failure on the old volume mountpoint.
As a result, there were lots of error messages in the kubelet log.
You don't need to delete the dangling vol_data.json in the old mountpoint directory manually. After the longhorn-csi plugin restarts automatically and you wait several minutes, the pod will be Running again with a new volume mountpoint.
The issue is related to the kubelet logic and does not impact the usage of Longhorn.
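To confirm that the CSI plugin has come back on the affected node, something like the following can be used (the namespace and label selector are assumptions based on a default Longhorn installation):
```sh
# Check that the Longhorn CSI plugin pods are Running again on every node
kubectl -n longhorn-system get pods -l app=longhorn-csi-plugin -o wide
```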
Related to https://github.com/kubernetes/kubernetes/issues/105536
I've been trying for a while to get this script working for my setup via crontab (my coding isn't the best), so that it wouldn't loop infinitely but would still loop through all of the orphaned pods before exiting. I think I finally got it. Sharing for anyone who wants to schedule this rather than have it constantly running.
@migs35323 I like your “let kubelet handle it itself” approach, but your if statement was throwing “binary operator expected” errors for me when a pod had more than one PVC. I had to change it to look at the directory and then append the file to delete (see the sketch below).
If anyone has suggestions to make this more efficient, I'm all ears.
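A hedged sketch of that fix (this is not the original script; $uid stands for an orphaned pod UID collected earlier): a test like [ -f .../*/vol_data.json ] fails with “binary operator expected” once the glob expands to several files, so iterating over the per-PVC directories avoids it:
```sh
# Loop over each per-PVC directory instead of testing a multi-match glob
for d in /var/lib/kubelet/pods/"$uid"/volumes/kubernetes.io~csi/*/; do
  [ -f "${d}vol_data.json" ] && rm -f "${d}vol_data.json"
done
```
Run from cron, this makes one pass over the current set of orphaned pods and then exits instead of looping forever.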
@weizhe0422 Could you help with a knowledge base article for this? Please check with @derekbit if you have any questions.