longhorn: [BUG] Reboot node while volume expansion, will cause pod stuck at creating state
Describe the bug (🐛 if you encounter this issue)
Rebooting a node during volume expansion causes the pod to get stuck in the creating state. This can be reproduced without https://github.com/longhorn/longhorn-manager/commit/3e27dc3561395dd9cab8c59c13618c564329fa59 from @derekbit
To Reproduce
Steps to reproduce the behavior:
- Deploy Longhorn v1.4.x
- Dynamically provision a volume via a StatefulSet (1 pod replica)
- Write data into the pod mount point
- Edit the PVC volume size to trigger online volume expansion
- Reboot the node the volume is attached to while the expansion is in progress (see the command sketch after this list)
- After the node comes back up, the volume is healthy but the pod is stuck at creating
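For reference, a minimal command-level sketch of these steps; the StatefulSet name `web`, the PVC name `www-web-0`, and the sizes are assumptions for illustration (the pod name `web-0` and the mount point match the events below):

```
# Write some data into the pod's mount point (assumes a 1-replica StatefulSet "web"
# whose volumeClaimTemplate uses a Longhorn StorageClass).
kubectl exec web-0 -- dd if=/dev/urandom of=/usr/share/nginx/html/data bs=1M count=100

# Trigger online expansion by patching the PVC size (assumed PVC name www-web-0).
kubectl patch pvc www-web-0 --type merge \
  -p '{"spec":{"resources":{"requests":{"storage":"15Gi"}}}}'

# While the expansion is still in progress, reboot the node the volume is attached to.
ssh <attached-node> 'sudo reboot'
```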
Expected behavior
The pod should come up and be able to read the data.
Log or Support bundle
supportbundle_d77756d5-0aed-4bc9-8609-47068143430f_2022-12-29T11-03-39Z.zip
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m1s default-scheduler Successfully assigned default/web-0 to ip-172-31-95-56
Warning FailedMount 102s kubelet MountVolume.Setup failed while expanding volume for volume "pvc-60603d49-b693-4171-b7bb-c24a64ccf0a2" : Expander.NodeExpand failed to expand the volume : rpc error: code = Internal desc = failed to read size of filesystem on /dev/longhorn/pvc-60603d49-b693-4171-b7bb-c24a64ccf0a2: exit status 152: dumpe2fs 1.46.4 (18-Aug-2021)
dumpe2fs: Superblock checksum does not match superblock while trying to open /dev/longhorn/pvc-60603d49-b693-4171-b7bb-c24a64ccf0a2
Filesystem volume name: <none>
Last mounted on: /usr/share/nginx/html
Filesystem UUID: 2c5e1c0d-0ec4-4e34-bce9-d66661444e68
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 720896
Block count: 2883584
Reserved block count: 0
Overhead clusters: 54714
Free blocks: 2828863
Free inodes: 720884
First block: 0
Block size: 4096
Fragment size: 4096
Group descriptor size: 64
Reserved GDT blocks: 126
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
Flex block group size: 16
Filesystem created: Thu Dec 29 09:40:08 2022
Last mount time: Thu Dec 29 09:46:12 2022
Last write time: Thu Dec 29 09:46:12 2022
Mount count: 2
Maximum mount count: -1
Last checked: Thu Dec 29 09:40:08 2022
Check interval: 0 (<none>)
Lifetime writes: 565 kB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 32
Desired extra isize: 32
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: a5544af0-bd43-4953-bbf2-839aed407674
Journal backup: inode blocks
Checksum type: crc32c
Checksum: 0xfddfd606
Journal features: journal_64bit journal_checksum_v3
Total journal size: 32M
Total journal blocks: 8192
Max transaction length: 8192
Fast commit length: 0
Journal sequence: 0x00000008
Journal start: 0
Journal checksum type: crc32c
Journal checksum: 0x2b7e1b86
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 89s default-scheduler Successfully assigned default/web-0 to ip-172-31-81-141
Warning FailedMount 70s kubelet MountVolume.MountDevice failed for volume "pvc-244c68d4-0caf-4443-b683-97b4e0c9284d" : rpc error: code = Internal desc = mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t ext4 -o defaults /dev/longhorn/pvc-244c68d4-0caf-4443-b683-97b4e0c9284d /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/2602be959344fdd1c281ef2e078352c3e9f72c8e1a92f3cac0987f564a2d385a/globalmount
Output: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/2602be959344fdd1c281ef2e078352c3e9f72c8e1a92f3cac0987f564a2d385a/globalmount: cannot mount; probably corrupted filesystem on /dev/longhorn/pvc-244c68d4-0caf-4443-b683-97b4e0c9284d.
Environment
- Longhorn version: v1.4.x head
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl): kubectl
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: k3s 1.24
- Number of management node in the cluster:
- Number of worker node in the cluster:
- Node config
- OS type and version:
- CPU per node:
- Memory per node:
- Disk type(e.g. SSD/NVMe):
- Network bandwidth between the nodes:
- Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): AWS
- Number of Longhorn volumes in the cluster:
Additional context
Disconnecting the node's network connection can reproduce the same error (one possible way to simulate this is sketched below).
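Purely as an assumption on my side, one way to simulate such a disconnect (the interface name and duration are illustrative):

```
# On the node the volume is attached to, take the primary interface down for a while,
# then bring it back up. Run from the node console, not over SSH.
sudo ip link set eth0 down
sleep 120
sudo ip link set eth0 up
```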
About this issue
- State: open
- Created 2 years ago
- Comments: 19 (19 by maintainers)
To rule out an environment-specific cause, I tested in a fresh environment with v1.4.x head images deployed and ran the steps 3 times; the error was reproduced 2 of those times.
Hi @shuo-wu, after using a storage class with the parameter `-O ^metadata_csum` added as below, I can not reproduce the volume corruption state.
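For reference, a hedged sketch of such a storage class, assuming the `mkfsParams` StorageClass parameter available in Longhorn v1.4.x; the class name and replica count are illustrative:

```
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-no-metadata-csum   # hypothetical name
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "30"
  fsType: "ext4"
  mkfsParams: "-O ^metadata_csum"   # format new volumes without ext4 metadata checksums
EOF
```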
@innobead Right now the conclusion is: we cannot directly fix this issue, and the filesystem corruption may be hard to repair (I didn't reproduce this case). But as I mentioned above, disabling `metadata_csum` can bypass the issue. Now I prefer to write a KB article describing this case, all the workarounds (manual `fsck`, or disabling `metadata_csum` at the beginning), and the upstream PR link, then close this ticket.

I tried to restore the superblock following the steps, but the volume still did not get repaired; below was the reply when executing `sudo e2fsck -b block_number /dev/xxx`. In addition, I calculated the md5sum of the volume head image and the volume snapshot images in all replicas, and the results were all the same. Because they were identical, I did not perform a replica repair, thank you.
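For anyone repeating that attempt, a hedged sketch of how backup superblock locations can be found and passed to `e2fsck`; the device path placeholder and block number are illustrative, and the volume must not be mounted:

```
# Print the block locations where backup superblocks would have been placed,
# without actually reformatting the device.
sudo mke2fs -n /dev/longhorn/<pvc-name>

# Retry the check against one of the reported backup superblocks (32768 is the
# usual first backup location for a 4096-byte block size).
sudo e2fsck -f -b 32768 /dev/longhorn/<pvc-name>
```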
At least for EXT4, running `fsck <mount point>` would report error code 16 (usage or syntax error). I don't think we can fix this issue by executing `fsck` before `NodeExpandVolume`.

In the PR, it is mentioned that the reproducing step is that online resizing is performed twice consecutively. But during the node reboot, we don't know (and it's hard to know) whether the node really executes the filesystem resizing before completely shutting down.

Based on my test, this checksum mismatching issue happens during the NodeStageVolume resizing. After this execution, running `dumpe2fs <volume device path>` would encounter the error, for example the superblock checksum error shown in the events above.

Actually, NodeStageVolume will execute `fsck` before mounting. But the resizing requires a mount point, and the resizing itself is the cause of the error. Besides, running `fsck` requires the volume to be unmounted, so we cannot execute `fsck` after the NodeStageVolume resizing or the NodeExpandVolume resizing... In other words, I haven't found a way to fix it.

BTW, the correct workaround is: scaling down and then re-scaling up the workload.
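A sketch of that workaround for the StatefulSet in the reproduce steps (the name `web` is an assumption):

```
# Scale the workload down so the volume is detached, then scale it back up;
# this is the scale-down/re-scale-up workaround mentioned above.
kubectl scale statefulset web --replicas=0
kubectl wait --for=delete pod/web-0 --timeout=120s
kubectl scale statefulset web --replicas=1
```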
Some notes

For a workaround, we can check whether the error can be fixed by `fsck`.

cc @chriscchien @shuo-wu @innobead
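A hedged sketch of that manual `fsck` workaround; it assumes the workload is scaled down first, the volume is then attached to a node (e.g. via the Longhorn UI) without being mounted, and it reuses the device path from the events above for illustration:

```
# With the filesystem unmounted, force a full check and repair on the Longhorn block device.
sudo e2fsck -f -y /dev/longhorn/pvc-60603d49-b693-4171-b7bb-c24a64ccf0a2
```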