longhorn: [BUG] volume does not get reattached cleanly after an interruption
Describe the bug
After an interruption (possibly related to network issues), a volume stays read-only, resulting in a crash loop of the pod. In this case, the pod has a node affinity and is therefore always scheduled to the same node. After deleting the pod (which is part of a StatefulSet), it gets recreated and starts just fine: the kernel log shows the volume attaching, it comes up writable, and the workload is happy.
This issue: https://github.com/longhorn/longhorn/issues/3283 looks like it is in the same ballpark, but it is not quite the same problem.
To Reproduce
We don't know for sure where the initial interruption is coming from, so we are not able to reproduce it. It has now happened several days in a row, always at the same time, so if there is anything we can measure or log, we would be glad to try.
Expected behavior
The volume becomes usable (i.e. writable) again after an interruption, and workloads start cleanly.
Log or Support bundle
RabbitMQ operator:
2021-11-25 02:01:45.612 [error] <0.1612.0> ** Generic server <0.1612.0> terminating
2021-11-25 02:01:45.615 [error] <0.4698.0> CRASH REPORT Process <0.4698.0> with 1 neighbours exited with reason: no match of right hand value {error,eio} in file_handle_cache:soft_close/1 line 885 in gen_server2:terminate/3 line 1172
2021-11-25 02:01:55.678 [error] <0.634.0> CRASH REPORT Process channel_queue_exchange_metrics_metrics_collector with 0 neighbours crashed with reason: bad argument in call to ets:safe_fixtable(channel_queue_exchange_metrics, true) in ets:foldl/3 line 625
2021-11-25 02:01:55.717 [info] <0.11163.0> Mirrored queue 'celery' in vhost '[redacted]': Secondary replica of queue <rabbit@rabbitmqcluster-operator-server-2.rabbitmqcluster-operator-nodes.[redacted].1637752668.11163.0> detected replica <rabbit@rabbitmqcluster-operator-server-1.rabbitmqcluster-operator-nodes.[redacted].1637752774.15340.7> to be down
Then the RabbitMQ operator is stuck in a crash loop:
02:02:14.914 [warning] Failed to write PID file "/var/lib/rabbitmq/mnesia/rabbit@rabbitmqcluster-operator-server-1.rabbitmqcluster-operator-nodes.[redacted].pid": read-only file system
Node kernel:
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#25 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#25 Sense Key : Medium Error [current]
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#25 Add. Sense: Unrecovered read error
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#25 CDB: Write(10) 2a 00 01 44 7f 38 00 00 48 00
[Thu Nov 25 02:00:35 2021] blk_update_request: critical medium error, dev sdn, sector 21266232 op 0x1:(WRITE) flags 0x800 phys_seg 9 prio class 0
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#24 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#24 Sense Key : Medium Error [current]
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#24 Add. Sense: Unrecovered read error
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#24 CDB: Write(10) 2a 00 02 06 54 40 00 00 08 00
[Thu Nov 25 02:00:35 2021] blk_update_request: critical medium error, dev sdn, sector 33969216 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Thu Nov 25 02:00:35 2021] EXT4-fs warning (device sdn): ext4_end_bio:309: I/O error 7 writing to inode 1048649 (offset 0 size 4096 starting block 4246153)
[Thu Nov 25 02:00:35 2021] Buffer I/O error on device sdn, logical block 4246152
[Thu Nov 25 02:00:35 2021] Aborting journal on device sdn-8.
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#21 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#21 Sense Key : Medium Error [current]
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#21 Add. Sense: Unrecovered read error
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#21 CDB: Write(10) 2a 00 02 06 1a 08 00 00 08 00
[Thu Nov 25 02:00:35 2021] blk_update_request: critical medium error, dev sdn, sector 33954312 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Thu Nov 25 02:00:35 2021] EXT4-fs warning (device sdn): ext4_end_bio:309: I/O error 7 writing to inode 1048695 (offset 0 size 0 starting block 4244290)
[Thu Nov 25 02:00:35 2021] Buffer I/O error on device sdn, logical block 4244289
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#23 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#23 Sense Key : Medium Error [current]
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#23 Add. Sense: Unrecovered read error
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#23 CDB: Write(10) 2a 00 00 0a 65 28 00 00 08 00
[Thu Nov 25 02:00:35 2021] blk_update_request: critical medium error, dev sdn, sector 681256 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Thu Nov 25 02:00:35 2021] EXT4-fs warning (device sdn): ext4_end_bio:309: I/O error 7 writing to inode 1048636 (offset 0 size 0 starting block 85158)
[Thu Nov 25 02:00:35 2021] Buffer I/O error on device sdn, logical block 85157
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#22 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#22 Sense Key : Medium Error [current]
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#22 Add. Sense: Unrecovered read error
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#22 CDB: Write(10) 2a 00 00 0a 54 70 00 00 08 00
[Thu Nov 25 02:00:35 2021] blk_update_request: critical medium error, dev sdn, sector 676976 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Thu Nov 25 02:00:35 2021] EXT4-fs warning (device sdn): ext4_end_bio:309: I/O error 7 writing to inode 1048666 (offset 0 size 0 starting block 84623)
[Thu Nov 25 02:00:35 2021] Buffer I/O error on device sdn, logical block 84622
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#20 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#20 Sense Key : Medium Error [current]
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#20 Add. Sense: Unrecovered read error
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#20 CDB: Write(10) 2a 00 00 0a 2c d8 00 00 08 00
[Thu Nov 25 02:00:35 2021] blk_update_request: critical medium error, dev sdn, sector 666840 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Thu Nov 25 02:00:35 2021] EXT4-fs warning (device sdn): ext4_end_bio:309: I/O error 7 writing to inode 1048681 (offset 0 size 0 starting block 83356)
[Thu Nov 25 02:00:35 2021] Buffer I/O error on device sdn, logical block 83355
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#26 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#26 Sense Key : Medium Error [current]
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#26 Add. Sense: Unrecovered read error
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#26 CDB: Write(10) 2a 00 00 0a 5d 30 00 00 08 00
[Thu Nov 25 02:00:35 2021] blk_update_request: critical medium error, dev sdn, sector 679216 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Thu Nov 25 02:00:35 2021] EXT4-fs warning (device sdn): ext4_end_bio:309: I/O error 7 writing to inode 1048632 (offset 0 size 0 starting block 84903)
[Thu Nov 25 02:00:35 2021] Buffer I/O error on device sdn, logical block 84902
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#27 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#27 Sense Key : Medium Error [current]
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#27 Add. Sense: Unrecovered read error
[Thu Nov 25 02:00:35 2021] sd 15:0:0:1: [sdn] tag#27 CDB: Write(10) 2a 00 02 06 37 00 00 00 08 00
[Thu Nov 25 02:00:35 2021] blk_update_request: critical medium error, dev sdn, sector 33961728 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Thu Nov 25 02:00:35 2021] EXT4-fs warning (device sdn): ext4_end_bio:309: I/O error 7 writing to inode 1048652 (offset 0 size 4096 starting block 4245217)
[Thu Nov 25 02:00:35 2021] Buffer I/O error on device sdn, logical block 4245216
[Thu Nov 25 02:00:47 2021] EXT4-fs error (device sdn): ext4_journal_check_start:61: Detected aborted journal
[Thu Nov 25 02:00:47 2021] EXT4-fs (sdn): Remounting filesystem read-only
[Thu Nov 25 02:00:56 2021] EXT4-fs error (device sdn): ext4_journal_check_start:61: Detected aborted journal
[Thu Nov 25 02:00:56 2021] EXT4-fs error (device sdn): ext4_journal_check_start:61: Detected aborted journal
[Thu Nov 25 02:00:56 2021] EXT4-fs error (device sdn): ext4_journal_check_start:61: Detected aborted journal
[Thu Nov 25 02:00:56 2021] EXT4-fs error (device sdn): ext4_journal_check_start:61: Detected aborted journal
Node kernel after deleting the pod:
[Thu Nov 25 09:13:05 2021] sd 15:0:0:1: [sdn] Synchronizing SCSI cache
[Thu Nov 25 09:13:12 2021] scsi host15: iSCSI Initiator over TCP/IP
[Thu Nov 25 09:13:12 2021] scsi 15:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
[Thu Nov 25 09:13:12 2021] scsi 15:0:0:0: Attached scsi generic sg25 type 12
[Thu Nov 25 09:13:12 2021] scsi 15:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
[Thu Nov 25 09:13:12 2021] sd 15:0:0:1: Power-on or device reset occurred
[Thu Nov 25 09:13:12 2021] sd 15:0:0:1: Attached scsi generic sg26 type 0
[Thu Nov 25 09:13:12 2021] sd 15:0:0:1: [sdn] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
[Thu Nov 25 09:13:12 2021] sd 15:0:0:1: [sdn] Write Protect is off
[Thu Nov 25 09:13:12 2021] sd 15:0:0:1: [sdn] Mode Sense: 69 00 10 08
[Thu Nov 25 09:13:12 2021] sd 15:0:0:1: [sdn] Write cache: enabled, read cache: enabled, supports DPO and FUA
[Thu Nov 25 09:13:12 2021] sd 15:0:0:1: [sdn] Attached SCSI disk
[Thu Nov 25 09:13:24 2021] EXT4-fs (sdn): mounted filesystem with ordered data mode. Opts: (null)
longhorn-manager
time="2021-11-25T02:00:59Z" level=info msg="Event(v1.ObjectReference{Kind:\"Engine\", Namespace:\"longhorn-system\", Name:\"pvc-f052a410-f19d-4fca-a0ce-936d8f23b6cb-e-aa17071b\", UID:\"72002d87-82de-4d3a-aa53-fb4041760b73\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"98211882\", FieldPath:\"\"}): type: 'Warning' reason: 'Faulted' Detected replica pvc-f052a410-f19d-4fca-a0ce-936d8f23b6cb-r-cba0f30b (10.42.7.124:10270) in error"
time="2021-11-25T02:00:59Z" level=info msg="Event(v1.ObjectReference{Kind:\"Volume\", Namespace:\"longhorn-system\", Name:\"pvc-f052a410-f19d-4fca-a0ce-936d8f23b6cb\", UID:\"98228c58-1b7a-4969-9442-55b03f9bd440\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"98211881\", FieldPath:\"\"}): type: 'Normal' reason: 'Degraded' volume pvc-f052a410-f19d-4fca-a0ce-936d8f23b6cb became degraded"
time="2021-11-25T02:00:59Z" level=warning msg="pvc-f052a410-f19d-4fca-a0ce-936d8f23b6cb-r-cba0f30b: time=\"2021-11-24T11:17:12Z\" level=warning msg=\"Starting to execute registered shutdown func github.com/longhorn/longhorn-engine/app/cmd.startReplica.func4\""
longhorn-instance-manager
[longhorn-instance-manager] time="2021-11-25T02:00:59Z" level=debug msg="Process Manager: start getting logs for process pvc-f052a410-f19d-4fca-a0ce-936d8f23b6cb-r-cba0f30b"
[longhorn-instance-manager] time="2021-11-25T02:00:59Z" level=debug msg="Process Manager: got logs for process pvc-f052a410-f19d-4fca-a0ce-936d8f23b6cb-r-cba0f30b"
Environment
- Kernel: 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- Kubernetes: 1.20.6
- iSCSI: open-iscsi 2.0.874-7.1ubuntu6.2
- Longhorn images: longhornio/longhorn-instance-manager:v1_20210731, longhornio/longhorn-engine:v1.2.2, k8s.gcr.io/sig-storage/csi-attacher:v3.2.1, k8s.gcr.io/sig-storage/csi-provisioner:v2.1.2
- Installation method: Helm
- Kubernetes distro: RKE
- Number of management nodes in the cluster: 3
- Number of worker nodes in the cluster: 3
- Node config
- OS type and version: Ubuntu 20.04.3 LTS
- CPU per node: 8
- Memory per node: 32GB
- Disk type (e.g. SSD/NVMe): unknown (virtualized). The underlying disk appears as "sdb".
- Network bandwidth between the nodes: 1Gbit
- Underlying Infrastructure: Hetzner Cloud
- Number of Longhorn volumes in the cluster: ~20
Additional context
- Networking between the nodes uses WireGuard tunnels. We are planning to switch to encryption built into the CNI plugin, but as of now the encryption is done outside of Kubernetes. The MTUs are also odd: the internal ethernet interfaces have an MTU of 1400, and on top of that we run the WireGuard tunnels.
- This happened every night at 2:00am for several days, and although we don't believe it is a coincidence, we have not identified any possible culprit. Longhorn backups etc. all run at different times. This is off-hours, and CPU and network load at the time this happens are relatively low.
About this issue
- State: open
- Created 3 years ago
- Comments: 21 (11 by maintainers)
@PhanLe1010 A liveness probe does not help because a pod restart does not trigger a volume unmount + remount on the host. Only pod delete + create does.
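To make that concrete, the only workaround right now is deleting the affected pod so the StatefulSet controller recreates it and the kubelet re-stages the volume. A minimal client-go sketch of that delete (the namespace is a placeholder, the pod name is taken from the logs above, and detecting the read-only condition is assumed to happen elsewhere):

```go
// Minimal sketch: delete the stuck pod so its StatefulSet recreates it and
// the kubelet tears down and re-stages the volume mount. A plain container
// restart keeps the old (read-only) mount, which is why it does not help.
package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// Namespace "rabbitmq" is a placeholder; the pod name matches the
	// StatefulSet member seen in the kernel/operator logs above.
	err = client.CoreV1().Pods("rabbitmq").Delete(
		context.TODO(),
		"rabbitmqcluster-operator-server-1",
		metav1.DeleteOptions{},
	)
	if err != nil {
		log.Fatal(err)
	}
	log.Println("pod deleted; the StatefulSet controller will recreate it")
}
```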
I just found a Bitnami Redis cluster (those have a liveness probe configured by default) in a boot loop where the pods were constantly restarting with this output:
and the volumes on the node were in a read-only state:
So I think Longhorn should be modified to use the mount parameter errors=continue; that way pods would work at least after the liveness probe restarts them. Or perhaps it should even be a configurable option, because in some environments users might want to use errors=panic, as it forces nodes to reboot after a significant network failure. Ref https://linux.die.net/man/8/mount
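Just to illustrate where such an option would take effect: the errors= policy is passed as mount data when the ext4 filesystem is mounted. This is not Longhorn's actual CSI code path, only a sketch of the underlying syscall with placeholder paths:

```go
// Sketch only: shows how an errors= policy is applied at mount time.
// Longhorn's CSI plugin performs the real mount; the device and target
// paths below are placeholders.
package main

import (
	"log"

	"golang.org/x/sys/unix"
)

func main() {
	device := "/dev/longhorn/pvc-example"     // placeholder Longhorn device
	target := "/var/lib/kubelet/pods/example" // placeholder mount point

	// errors=continue keeps the filesystem writable after IO errors;
	// errors=remount-ro flips it read-only (the behaviour seen in the
	// kernel log above); errors=panic crashes the node so it reboots.
	if err := unix.Mount(device, target, "ext4", 0, "errors=continue"); err != nil {
		log.Fatalf("mount failed: %v", err)
	}
	log.Println("mounted with errors=continue")
}
```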
For the remount-ro mode, if there are many IO errors, the filesystem will be remounted read-only, and Longhorn is not aware of this. We can check this option with tune2fs -l /dev/longhorn/<name> if users encounter this issue. cc @innobead @shuo-wu @joshimoo @PhanLe1010
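Since the kernel flips the mount to read-only without telling Longhorn, one way to surface this would be to watch /proc/mounts for Longhorn devices that carry the ro flag. This is not existing Longhorn behaviour, just a rough sketch of the detection idea (the /dev/longhorn/ prefix match is an assumption):

```go
// Rough sketch: report Longhorn block devices whose filesystem has been
// remounted read-only, by scanning /proc/mounts for the "ro" option.
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	data, err := os.ReadFile("/proc/mounts")
	if err != nil {
		panic(err)
	}
	for _, line := range strings.Split(string(data), "\n") {
		// /proc/mounts format: <device> <mountpoint> <fstype> <options> <dump> <pass>
		fields := strings.Fields(line)
		if len(fields) < 4 {
			continue
		}
		device, mountpoint, options := fields[0], fields[1], fields[3]
		if !strings.HasPrefix(device, "/dev/longhorn/") {
			continue
		}
		for _, opt := range strings.Split(options, ",") {
			if opt == "ro" {
				fmt.Printf("read-only mount detected: %s on %s\n", device, mountpoint)
			}
		}
	}
}
```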
I've checked the log. Both pvc-f052a410-f19d-4fca-a0ce-936d8f23b6cb and pvc-cbbdf292-cdca-47a8-a4e4 only had one disconnected replica, and the other two replicas were still running. The volumes did not crash. Not sure why the ext4 filesystem showed the IO errors.
Also, Automatically Delete Workload Pod when The Volume Is Detached Unexpectedly was not triggered, so the automatic recovery was not executed that time. cc @shuo-wu @PhanLe1010 @jenting, do you have any idea?
longhorn-support-bundle_2021-11-30.zip
Let's try this again … so here is the support bundle.
@derekbit Hmm, it's a bit difficult - while it is kind of reproducible, this is a production environment. At the moment, we have increased the replica count to 3, in the hope that we can avoid this issue entirely if the volume is available locally on the host that the pod is running on. I will see if we can make this happen, but I'm not too happy about intentionally breaking a production system (although the impact is not that severe, there are always risks).
As for the "delete workload…" option, I am relatively sure that it did not trigger, or if it did, then something else went wrong. Yesterday morning, I saw this behaviour again, and the pod still had the read-only filesystem. After manually deleting the pod, it came back just fine.