longhorn: [BUG] share-manager pod failed to restart after kubelet restart
Describe the bug (🐛 if you encounter this issue)
Related to case 4 of https://longhorn.github.io/longhorn-tests/manual/pre-release/node/kubelet-restart-on-a-node/.
After executing `sudo docker restart kubelet` on the node where the share-manager pod is running, the share-manager pod could fail to restart with the following error:
kubectl describe pod share-manager-pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1 -n longhorn-system
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 29s default-scheduler Successfully assigned longhorn-system/share-manager-pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1 to yang-worker-0
Normal Pulled 29s kubelet Container image "longhornio/longhorn-share-manager:v1.4.1-rc2" already present on machine
Normal Created 29s kubelet Created container share-manager
Normal Started 29s kubelet Started container share-manager
Warning Unhealthy 5s (x5 over 25s) kubelet Readiness probe failed: cat: /var/run/ganesha.pid: No such file or directory
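The failing readiness probe is simply a `cat` of `/var/run/ganesha.pid` inside the share-manager container, so the same check can be re-run by hand. A minimal sketch, using the pod name from the report above:

```bash
# Re-run the readiness check manually inside the share-manager container.
kubectl -n longhorn-system exec share-manager-pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1 \
  -- cat /var/run/ganesha.pid

# Watch whether the pod ever becomes Ready after the kubelet restart.
kubectl -n longhorn-system get pod share-manager-pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1 -w
```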
kubectl logs share-manager-pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1 -n longhorn-system -f
time="2023-03-08T03:42:35Z" level=info msg="starting RLIMIT_NOFILE rlimit.Cur 1048576, rlimit.Max 1048576"
time="2023-03-08T03:42:35Z" level=info msg="Ending RLIMIT_NOFILE rlimit.Cur 1048576, rlimit.Max 1048576"
time="2023-03-08T03:42:35Z" level=debug msg="Volume pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1 device /dev/longhorn/pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1 contains filesystem of format " encrypted=false volume=pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1
I0308 03:42:35.410565 1 mount_linux.go:449] Disk "/dev/longhorn/pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1" appears to be unformatted, attempting to format as type: "ext4" with options: [-F -m0 /dev/longhorn/pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1]
time="2023-03-08T03:45:21Z" level=warning msg="Failed to mount volume" encrypted=false error="format of disk \"/dev/longhorn/pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1\" failed: type:(\"ext4\") target:(\"/export/pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1\") options:(\"defaults\") errcode:(exit status 1) output:(mke2fs 1.46.4 (18-Aug-2021)\nWarning: could not erase sector 2: Input/output error\nCreating filesystem with 262144 4k blocks and 65536 inodes\nFilesystem UUID: e8c869eb-c9cd-4a17-aede-47d0ea021c21\nSuperblock backups stored on blocks: \n\t32768, 98304, 163840, 229376\n\nAllocating group tables: 0/8\b\b\b \b\b\bdone \nWarning: could not read block 0: Input/output error\nWarning: could not erase sector 0: Input/output error\nWriting inode tables: 0/8\b\b\b1/8\b\b\b \b\b\bdone \nCreating journal (8192 blocks): done\nWriting superblocks and filesystem accounting information: 0/8\b\b\bmkfs.ext4: Input/output error while writing out and closing file system\n) " volume=pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1
E0308 03:45:21.525843 1 mount_linux.go:455] format of disk "/dev/longhorn/pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1" failed: type:("ext4") target:("/export/pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1") options:("defaults") errcode:(exit status 1) output:(mke2fs 1.46.4 (18-Aug-2021)
Warning: could not erase sector 2: Input/output error
Creating filesystem with 262144 4k blocks and 65536 inodes
Filesystem UUID: e8c869eb-c9cd-4a17-aede-47d0ea021c21
Superblock backups stored on blocks:
32768, 98304, 163840, 229376
Allocating group tables: done
Warning: could not read block 0: Input/output error
Warning: could not erase sector 0: Input/output error
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: mkfs.ext4: Input/output error while writing out and closing file system
)
W0308 03:45:21.606039 1 mount_helper_common.go:133] Warning: "/export/pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1" is not a mountpoint, deleting
time="2023-03-08T03:45:21Z" level=debug msg="Device /dev/mapper/pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1 is not an active LUKS device" error="failed to run cryptsetup args: [status pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1] output: error: exit status 4"
time="2023-03-08T03:45:21Z" level=fatal msg="Error running start command: format of disk \"/dev/longhorn/pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1\" failed: type:(\"ext4\") target:(\"/export/pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1\") options:(\"defaults\") errcode:(exit status 1) output:(mke2fs 1.46.4 (18-Aug-2021)\nWarning: could not erase sector 2: Input/output error\nCreating filesystem with 262144 4k blocks and 65536 inodes\nFilesystem UUID: e8c869eb-c9cd-4a17-aede-47d0ea021c21\nSuperblock backups stored on blocks: \n\t32768, 98304, 163840, 229376\n\nAllocating group tables: 0/8\b\b\b \b\b\bdone \nWarning: could not read block 0: Input/output error\nWarning: could not erase sector 0: Input/output error\nWriting inode tables: 0/8\b\b\b1/8\b\b\b \b\b\bdone \nCreating journal (8192 blocks): done\nWriting superblocks and filesystem accounting information: 0/8\b\b\bmkfs.ext4: Input/output error while writing out and closing file system\n) ."
To Reproduce
Steps to reproduce the behavior:
- Set up an RKE v1.24.10-rancher4-1 cluster with 1 control plane node and 3 worker nodes
- Install Longhorn v1.4.1-rc2/rc1
- Create the following StatefulSet with an RWX volume:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: test-statefulset-rwx
  namespace: default
spec:
  selector:
    matchLabels:
      app: test-statefulset-rwx
  serviceName: test-statefulset-rwx
  replicas: 1
  template:
    metadata:
      labels:
        app: test-statefulset-rwx
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - image: busybox
          imagePullPolicy: IfNotPresent
          name: sleep
          args: ['/bin/sh', '-c', 'while true; do date; sleep 5; done']
          volumeMounts:
            - name: pod-data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: pod-data
      spec:
        accessModes: ['ReadWriteMany']
        storageClassName: 'longhorn'
        resources:
          requests:
            storage: 1Gi
```
- Restart the kubelet on the node where the share-manager pod is running using the command `sudo docker restart kubelet`. The share-manager pod could migrate to another node; if it does, keep restarting the kubelet on the new node (a minimal sketch of this loop is shown after these steps).
- Eventually the share-manager pod will fail to restart with the errors above. It should be reproducible in fewer than 5 attempts.
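A minimal sketch of the restart loop described above, assuming an RKE node where the kubelet runs as a Docker container, SSH access to the nodes, and the `longhorn.io/component=share-manager` pod label; the timing is illustrative:

```bash
# Minimal reproduction loop (label, SSH access, and sleep interval are assumptions).
for i in $(seq 1 5); do
  # Find the node currently running the share-manager pod.
  NODE=$(kubectl -n longhorn-system get pod \
           -l longhorn.io/component=share-manager \
           -o jsonpath='{.items[0].spec.nodeName}')
  echo "iteration $i: restarting kubelet on $NODE"
  # On RKE, the kubelet runs as a Docker container on the node.
  ssh "$NODE" sudo docker restart kubelet
  # Give the pod time to be rescheduled, possibly on another node.
  sleep 120
  kubectl -n longhorn-system get pod -l longhorn.io/component=share-manager -o wide
done
```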
Expected behavior
The share-manager pod should restart successfully after the kubelet is restarted, without the RWX volume hitting I/O errors.
Log or Support bundle
If applicable, add the Longhorn managers' log or support bundle when the issue happens. You can generate a Support Bundle using the link at the footer of the Longhorn UI.
Environment
- Longhorn version: v1.4.1-rc2/rc1
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl):
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: RKE v1.24.10-rancher4-1
- Number of management nodes in the cluster: 1
- Number of worker nodes in the cluster: 3
- Node config
- OS type and version:
- CPU per node:
- Memory per node:
- Disk type (e.g. SSD/NVMe):
- Network bandwidth between the nodes:
- Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal):
- Number of Longhorn volumes in the cluster:
Additional context
Add any other context about the problem here.
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 1
- Comments: 31 (30 by maintainers)
Use `derekbit/longhorn-manager:rwx-delete-engine` and restart `kubelet` more than 10 times; currently doesn't hit this issue.

Yes, I use `docker stop kubelet` instead of `docker restart kubelet` to force the share-manager pod to be recreated on another node, and still hit this issue.

@yangchiu Can you try the customized longhorn-manager image `derekbit/longhorn-manager:rwx-delete-engine`, which contains the patch https://github.com/longhorn/longhorn-manager/pull/1757? Thank you.

Just clarified with David: the issue is unrelated to the lifecycle control of the ShareManager and the ShareManager pod. We should focus on the part where pod recreation gets stuck.
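For trying the customized image mentioned above, a minimal sketch, assuming a default install where longhorn-manager runs as the `longhorn-manager` DaemonSet with a container of the same name:

```bash
# Point the manager DaemonSet at the customized image
# (DaemonSet and container names are assumptions based on a default Longhorn install).
kubectl -n longhorn-system set image daemonset/longhorn-manager \
  longhorn-manager=derekbit/longhorn-manager:rwx-delete-engine

# Wait for the managers to roll out with the new image before retrying the test.
kubectl -n longhorn-system rollout status daemonset/longhorn-manager
```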
It is the controller, rather than the share-manager pod, that is responsible for the volume attachment/detachment.
When you check the pod spec, you can see that the pod does not use a PVC or anything else; it just directly mounts the host `/dev` and then uses the block device. https://github.com/longhorn/longhorn-manager/blob/d7263a84153468d5d7b68adf51fdf921c5d397a9/controller/share_manager_controller.go#L867

I just tried more times… Actually, my reproduce rate is 1/20 - 1/30…
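To verify the host `/dev` mount described above, the share-manager pod spec can be inspected directly. A small sketch using the pod name from the report, expecting a hostPath volume rather than a PVC:

```bash
POD=share-manager-pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1

# List the pod's volumes: a hostPath entry for /dev is expected instead of a PVC.
kubectl -n longhorn-system get pod "$POD" -o jsonpath='{.spec.volumes}'

# List the container's volumeMounts to see where the host /dev ends up inside the container.
kubectl -n longhorn-system get pod "$POD" -o jsonpath='{.spec.containers[0].volumeMounts}'
```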
According to @yangchiu's comment, it looks like this only happens in v1.4.0+.
Thanks @yangchiu for checking the issue together. I can reproduce it now but still don't know the root cause. The volume is back to normal in the end without any data loss after several minutes…
It can be reproduced on master-head, but on v1.3.x-head it currently cannot be reproduced.