longhorn: [BUG] share-manager pod failed to restart after kubelet restart

Describe the bug (🐛 if you encounter this issue)

Related to case 4 of https://longhorn.github.io/longhorn-tests/manual/pre-release/node/kubelet-restart-on-a-node/.

After executing sudo docker restart kubelet on the node where the share-manager pod is running, the share-manager pod can fail to restart with the following error:

kubectl describe pod share-manager-pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1 -n longhorn-system
Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  29s               default-scheduler  Successfully assigned longhorn-system/share-manager-pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1 to yang-worker-0
  Normal   Pulled     29s               kubelet            Container image "longhornio/longhorn-share-manager:v1.4.1-rc2" already present on machine
  Normal   Created    29s               kubelet            Created container share-manager
  Normal   Started    29s               kubelet            Started container share-manager
  Warning  Unhealthy  5s (x5 over 25s)  kubelet            Readiness probe failed: cat: /var/run/ganesha.pid: No such file or directory
kubectl logs share-manager-pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1 -n longhorn-system -f
time="2023-03-08T03:42:35Z" level=info msg="starting RLIMIT_NOFILE rlimit.Cur 1048576, rlimit.Max 1048576"
time="2023-03-08T03:42:35Z" level=info msg="Ending RLIMIT_NOFILE rlimit.Cur 1048576, rlimit.Max 1048576"
time="2023-03-08T03:42:35Z" level=debug msg="Volume pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1 device /dev/longhorn/pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1 contains filesystem of format " encrypted=false volume=pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1
I0308 03:42:35.410565       1 mount_linux.go:449] Disk "/dev/longhorn/pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1" appears to be unformatted, attempting to format as type: "ext4" with options: [-F -m0 /dev/longhorn/pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1]
time="2023-03-08T03:45:21Z" level=warning msg="Failed to mount volume" encrypted=false error="format of disk \"/dev/longhorn/pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1\" failed: type:(\"ext4\") target:(\"/export/pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1\") options:(\"defaults\") errcode:(exit status 1) output:(mke2fs 1.46.4 (18-Aug-2021)\nWarning: could not erase sector 2: Input/output error\nCreating filesystem with 262144 4k blocks and 65536 inodes\nFilesystem UUID: e8c869eb-c9cd-4a17-aede-47d0ea021c21\nSuperblock backups stored on blocks: \n\t32768, 98304, 163840, 229376\n\nAllocating group tables: 0/8\b\b\b   \b\b\bdone                            \nWarning: could not read block 0: Input/output error\nWarning: could not erase sector 0: Input/output error\nWriting inode tables: 0/8\b\b\b1/8\b\b\b   \b\b\bdone                            \nCreating journal (8192 blocks): done\nWriting superblocks and filesystem accounting information: 0/8\b\b\bmkfs.ext4: Input/output error while writing out and closing file system\n) " volume=pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1
E0308 03:45:21.525843       1 mount_linux.go:455] format of disk "/dev/longhorn/pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1" failed: type:("ext4") target:("/export/pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1") options:("defaults") errcode:(exit status 1) output:(mke2fs 1.46.4 (18-Aug-2021)
Warning: could not erase sector 2: Input/output error
Creating filesystem with 262144 4k blocks and 65536 inodes
Filesystem UUID: e8c869eb-c9cd-4a17-aede-47d0ea021c21
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376

Allocating group tables: done                            
Warning: could not read block 0: Input/output error
Warning: could not erase sector 0: Input/output error
Writing inode tables: done                            
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: mkfs.ext4: Input/output error while writing out and closing file system
) 
W0308 03:45:21.606039       1 mount_helper_common.go:133] Warning: "/export/pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1" is not a mountpoint, deleting
time="2023-03-08T03:45:21Z" level=debug msg="Device /dev/mapper/pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1 is not an active LUKS device" error="failed to run cryptsetup args: [status pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1] output:  error: exit status 4"
time="2023-03-08T03:45:21Z" level=fatal msg="Error running start command: format of disk \"/dev/longhorn/pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1\" failed: type:(\"ext4\") target:(\"/export/pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1\") options:(\"defaults\") errcode:(exit status 1) output:(mke2fs 1.46.4 (18-Aug-2021)\nWarning: could not erase sector 2: Input/output error\nCreating filesystem with 262144 4k blocks and 65536 inodes\nFilesystem UUID: e8c869eb-c9cd-4a17-aede-47d0ea021c21\nSuperblock backups stored on blocks: \n\t32768, 98304, 163840, 229376\n\nAllocating group tables: 0/8\b\b\b   \b\b\bdone                            \nWarning: could not read block 0: Input/output error\nWarning: could not erase sector 0: Input/output error\nWriting inode tables: 0/8\b\b\b1/8\b\b\b   \b\b\bdone                            \nCreating journal (8192 blocks): done\nWriting superblocks and filesystem accounting information: 0/8\b\b\bmkfs.ext4: Input/output error while writing out and closing file system\n) ."

To Reproduce

Steps to reproduce the behavior:

  1. Set up an RKE v1.24.10-rancher4-1 cluster with 1 control-plane node and 3 worker nodes
  2. Install Longhorn v1.4.1-rc2/rc1
  3. Create the following StatefulSet with an RWX volume:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: test-statefulset-rwx
  namespace: default
spec:
  selector:
    matchLabels:
      app: test-statefulset-rwx
  serviceName: test-statefulset-rwx
  replicas: 1
  template:
    metadata:
      labels:
        app: test-statefulset-rwx
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - image: busybox
          imagePullPolicy: IfNotPresent
          name: sleep
          args: ['/bin/sh', '-c', 'while true;do date;sleep 5; done']
          volumeMounts:
            - name: pod-data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: pod-data
      spec:
        accessModes: ['ReadWriteMany']
        storageClassName: 'longhorn'
        resources:
          requests:
            storage: 1Gi
  4. Restart the kubelet on the node where the share-manager pod is running using the command sudo docker restart kubelet
  5. The share-manager pod may migrate to another node; keep restarting the kubelet on whichever node it lands on
  6. Eventually the share-manager pod fails to restart with the errors above. It should be reproducible in fewer than 5 attempts (see the command sketch after this list)
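
A rough command sequence for steps 4-6 (namespace and pod naming taken from this report; adjust for your cluster):

# Find the node currently hosting the share-manager pod
kubectl -n longhorn-system get pod -o wide | grep share-manager
# SSH to that node and restart the kubelet container (RKE runs the kubelet in Docker)
sudo docker restart kubelet
# Watch where the share-manager pod is recreated, then repeat the restart on that node
kubectl -n longhorn-system get pod -o wide -w | grep share-manager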

Expected behavior

The share-manager pod should restart successfully after the kubelet restart, and the volume's existing filesystem should be remounted without being reformatted or hitting I/O errors.

Log or Support bundle

If applicable, add the Longhorn managers’ log or support bundle when the issue happens. You can generate a Support Bundle using the link at the footer of the Longhorn UI.

Environment

  • Longhorn version: v1.4.1-rc2/rc1
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl):
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: RKE v1.24.10-rancher4-1
    • Number of management node in the cluster:
    • Number of worker node in the cluster:
  • Node config
    • OS type and version:
    • CPU per node:
    • Memory per node:
    • Disk type (e.g. SSD/NVMe):
    • Network bandwidth between the nodes:
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal):
  • Number of Longhorn volumes in the cluster:

Additional context

Add any other context about the problem here.

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 31 (30 by maintainers)

Most upvoted comments

@yangchiu Can you try the customized longhorn-manager image derekbit/longhorn-manager:rwx-delete-engine which contains the patch longhorn/longhorn-manager#1757? Thank you.

Using derekbit/longhorn-manager:rwx-delete-engine and restarting the kubelet more than 10 times, I currently don’t hit this issue.

@yangchiu Can you help check whether stopping the kubelet and recreating the share-manager pod on another node still hits the issue? Thank you.

Yes, I used docker stop kubelet instead of docker restart kubelet to force the share-manager pod to be recreated on another node, and still hit this issue.

Just clarified with David: the issue is unrelated to the lifecycle control of the ShareManager and the ShareManager pod. We should focus on the part where the pod recreation gets stuck.

It is the controller, rather than the share-manager pod, that is responsible for the volume attachment/detachment.
When you check the pod spec, you can see that the pod does not use a PVC or anything similar; it directly mounts the host /dev and then uses the block device. https://github.com/longhorn/longhorn-manager/blob/d7263a84153468d5d7b68adf51fdf921c5d397a9/controller/share_manager_controller.go#L867
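
For example, a quick way to see this on a running cluster (pod name taken from this report; a verification sketch, not part of the original comment):

# Show the volumes of the generated share-manager pod spec; expect hostPath entries
# (e.g. the host /dev) rather than any persistentVolumeClaim reference.
kubectl -n longhorn-system get pod share-manager-pvc-3c6339d5-2dd9-4ae3-ac1f-18cfe32e94a1 \
  -o jsonpath='{.spec.volumes}'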

So including 1.4.0.

@derekbit why were you able to reproduce this in the end? Were any specific steps missed initially?

I just tried more times… Actually, my reproduction rate is 1/20 - 1/30…

According to @yangchiu’s comment, it looks like it only happens in v1.4.0+.

Thanks @yangchiu for checking the issue together. I can reproduce it now but still don’t know the root cause. The volume is back to normal in the end without any data loss after several minutes…

Does it happen in v1.3.x and master-head as well? It looks like the filesystem is somehow corrupted.

It can be reproduced on master-head, but on v1.3.x-head it currently can’t be reproduced.