propolis: Instance stuck waiting for NVMe pause future after controller reset
If Crucible takes too long to process some IO, Linux resets the NVMe controller, and eventually disables the controller and kicks out the disk:
[ 265.311275] nvme nvme1: I/O 775 QID 2 timeout, aborting
[ 265.311360] nvme nvme1: Abort status: 0x6
[ 265.311382] nvme nvme1: I/O 779 QID 2 timeout, aborting
[ 265.311430] nvme nvme1: Abort status: 0x6
[ 265.311432] nvme nvme1: I/O 781 QID 2 timeout, aborting
[ 265.311478] nvme nvme1: Abort status: 0x6
[ 265.311479] nvme nvme1: I/O 783 QID 2 timeout, aborting
[ 265.311525] nvme nvme1: Abort status: 0x6
[ 265.311527] nvme nvme1: I/O 785 QID 2 timeout, aborting
[ 265.311572] nvme nvme1: Abort status: 0x6
[ 265.311574] nvme nvme1: I/O 787 QID 2 timeout, aborting
[ 265.311619] nvme nvme1: Abort status: 0x6
[ 265.311621] nvme nvme1: I/O 789 QID 2 timeout, aborting
[ 265.311666] nvme nvme1: Abort status: 0x6
[ 265.311667] nvme nvme1: I/O 791 QID 2 timeout, aborting
[ 265.311713] nvme nvme1: Abort status: 0x6
[ 265.311715] nvme nvme1: I/O 797 QID 2 timeout, aborting
[ 265.311759] nvme nvme1: Abort status: 0x6
[ 265.311760] nvme nvme1: I/O 800 QID 2 timeout, aborting
[ 265.311796] nvme nvme1: Abort status: 0x6
[ 265.311797] nvme nvme1: I/O 810 QID 2 timeout, aborting
[ 265.311837] nvme nvme1: Abort status: 0x6
[ 296.031777] nvme nvme1: I/O 775 QID 2 timeout, reset controller
[ 296.093166] nvme nvme1: 4/0/0 default/read/poll queues
[ 326.752213] nvme nvme1: I/O 775 QID 2 timeout, disable controller
[ 326.784408] blk_update_request: I/O error, dev nvme1n1, sector 43285296 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[ 326.786606] blk_update_request: I/O error, dev nvme1n1, sector 58067792 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[ 326.788863] blk_update_request: I/O error, dev nvme1n1, sector 52257224 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[ 326.791061] blk_update_request: I/O error, dev nvme1n1, sector 50529104 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[ 326.792652] blk_update_request: I/O error, dev nvme1n1, sector 4079104 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[ 326.793756] blk_update_request: I/O error, dev nvme1n1, sector 4948896 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[ 326.794824] blk_update_request: I/O error, dev nvme1n1, sector 62753112 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[ 326.795934] blk_update_request: I/O error, dev nvme1n1, sector 30395728 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
blk_update_request: I/O error, dev nvme1n1, sector 53025896 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[[ 326.798205] blk_update_request: I/O error, dev nvme1n1, sector 58618160 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
32m[ 326.784408] blk_update_request: I/O error, dev nvme1n1, sector 43285296 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[ 326.786606] blk_update_request: I/O error, dev nvme1n1, sector 58067792 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[ 326.788863] blk_update_request: I/O error, dev nvme1n1, sector 52257224 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[ 326.791061] blk_update_request: I/O error, dev nvme1n1, sector 50529104 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[ 326.792652] blk_update_request: I/O error, dev nvme1n1, sector 4079104 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[ 326.793756] blk_update_request: I/O error, dev nvme1n1, sector 4948896 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[ 326.794824] blk_update_request: I/O error, dev nvme1n1, sector 62753112 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[ 326.795934] blk_update_request: I/O error, dev nvme1n1, sector 30395728 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[ 326.797081] blk_update_request: I/O error, dev nvme1n1, sector 53025896 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[ 326.798205] blk_update_request: I/O error, dev nvme1n1, sector 58618160 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[ 326.826228] nvme nvme1: failed to mark controller live state
[ 326.826230] nvme nvme1: Removing after probe failure status: -19
[ 326.852128] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[ 326.853569] Dev nvme1n1: unable to read RDB block 0
[ 326.852128] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[ 326.853565] ldm_validate_partition_table(): Disk read failed.
[ 326.853569] Dev nvme1n1: unable to read RDB block 0
[ 326.854535] nvme1n1: unable to read partition table
[ 326.854542] nvme1n1: partition table beyond EOD, truncated
I then tried to sudo poweroff from within the instance, and it’s stuck waiting for the pci-nvme-0.18.0 pause future to complete:
20:10:25.253Z INFO propolis-server (vm_state_worker): Halting due to chipset-driven halt
20:10:25.253Z INFO propolis-server (vm_state_worker): Stopping instance
20:10:25.253Z INFO propolis-server (vcpu_tasks): vCPU paused
vcpu = 0
20:10:25.253Z INFO propolis-server (vcpu_tasks): vCPU paused
vcpu = 1
20:10:25.253Z INFO propolis-server (vcpu_tasks): vCPU paused
vcpu = 2
20:10:25.253Z INFO propolis-server (vcpu_tasks): vCPU paused
vcpu = 3
20:10:25.253Z INFO propolis-server: request completed
latency_us = 355401112
local_addr = [fd00:1122:3344:104::22]:12400
method = GET
remote_addr = [fd00:1122:3344:104::1]:49586
req_id = 15bb07b4-12de-4d44-9a9a-9dd5f0eae16c
response_code = 200
uri = /instance/state-monitor
20:10:25.253Z INFO propolis-server (vm_controller): Sending pause request to lpc-bhyve-atpic
20:10:25.253Z INFO propolis-server (vm_controller): Sending pause request to lpc-bhyve-atpit
20:10:25.254Z INFO propolis-server (vm_controller): Sending pause request to lpc-bhyve-hpet
20:10:25.254Z INFO propolis-server (vm_controller): Sending pause request to lpc-bhyve-ioapic
20:10:25.254Z INFO propolis-server (vm_controller): Sending pause request to lpc-bhyve-rtc
20:10:25.254Z INFO propolis-server (vm_controller): Sending pause request to bhyve-vcpu-0
20:10:25.254Z INFO propolis-server (vm_controller): Sending pause request to bhyve-vcpu-1
20:10:25.254Z INFO propolis-server (vm_controller): Sending pause request to bhyve-vcpu-2
20:10:25.254Z INFO propolis-server (vm_controller): Sending pause request to bhyve-vcpu-3
20:10:25.254Z INFO propolis-server (vm_controller): Sending pause request to chipset-i440fx
20:10:25.254Z INFO propolis-server (vm_controller): Sending pause request to pci-piix4-hb
20:10:25.254Z INFO propolis-server (vm_controller): Sending pause request to pci-piix3-lpc
20:10:25.254Z INFO propolis-server (vm_controller): Sending pause request to pci-piix3-pm
20:10:25.254Z INFO propolis-server (vm_controller): Sending pause request to lpc-bhyve-pmtimer
20:10:25.254Z INFO propolis-server (vm_controller): Sending pause request to lpc-uart-com4
20:10:25.254Z INFO propolis-server (vm_controller): Sending pause request to lpc-uart-com1
20:10:25.254Z INFO propolis-server (vm_controller): Sending pause request to lpc-uart-com3
20:10:25.254Z INFO propolis-server (vm_controller): Sending pause request to lpc-uart-com2
20:10:25.254Z INFO propolis-server (vm_controller): Sending pause request to lpc-ps2ctrl
20:10:25.254Z INFO propolis-server (vm_controller): Sending pause request to qemu-lpc-debug
20:10:25.254Z INFO propolis-server (vm_controller): Sending pause request to pci-virtio-viona-0.9.0
20:10:25.254Z INFO propolis-server (vm_controller): Sending pause request to pci-nvme-0.18.0
20:10:25.254Z INFO propolis-server (vm_controller): Sending pause request to block-crucible-7f4cbb29-0585-4fae-b727-c8c72a3229eb
20:10:25.254Z INFO propolis-server (vm_controller): Sending pause request to pci-nvme-0.17.0
20:10:25.254Z INFO propolis-server (vm_controller): Sending pause request to block-crucible-6f6dae39-6793-4e2c-abe8-2c72e88bdab0
20:10:25.254Z INFO propolis-server (vm_controller): Sending pause request to pci-virtio-block-0.24.0
20:10:25.254Z INFO propolis-server (vm_controller): Sending pause request to block-in-memory-cloud-init
20:10:25.254Z INFO propolis-server (vm_controller): Sending pause request to qemu-fwcfg
20:10:25.254Z INFO propolis-server (vm_controller): Sending pause request to qemu-ramfb
20:10:25.254Z INFO propolis-server (vm_controller): Waiting for entities to pause
20:10:25.254Z INFO propolis-server (vm_controller): Got paused future from entity lpc-bhyve-atpic
20:10:25.254Z INFO propolis-server (vm_controller): Got paused future from entity lpc-bhyve-atpit
20:10:25.254Z INFO propolis-server (vm_controller): Got paused future from entity lpc-bhyve-hpet
20:10:25.254Z INFO propolis-server (vm_controller): Got paused future from entity lpc-bhyve-ioapic
20:10:25.254Z INFO propolis-server (vm_controller): Got paused future from entity lpc-bhyve-rtc
20:10:25.254Z INFO propolis-server (vm_controller): Got paused future from entity bhyve-vcpu-0
20:10:25.254Z INFO propolis-server (vm_controller): Got paused future from entity bhyve-vcpu-1
20:10:25.255Z INFO propolis-server (vm_controller): Got paused future from entity bhyve-vcpu-2
20:10:25.255Z INFO propolis-server (vm_controller): Got paused future from entity bhyve-vcpu-3
20:10:25.255Z INFO propolis-server (vm_controller): Got paused future from entity pci-piix4-hb
20:10:25.255Z INFO propolis-server (vm_controller): Got paused future from entity pci-piix3-lpc
20:10:25.255Z INFO propolis-server (vm_controller): Got paused future from entity lpc-bhyve-pmtimer
20:10:25.255Z INFO propolis-server (vm_controller): Got paused future from entity pci-piix3-pm
20:10:25.255Z INFO propolis-server (vm_controller): Got paused future from entity chipset-i440fx
20:10:25.255Z INFO propolis-server (vm_controller): Got paused future from entity lpc-uart-com4
20:10:25.255Z INFO propolis-server (vm_controller): Got paused future from entity lpc-uart-com1
20:10:25.255Z INFO propolis-server (vm_controller): Got paused future from entity lpc-uart-com3
20:10:25.255Z INFO propolis-server (vm_controller): Got paused future from entity lpc-uart-com2
20:10:25.255Z INFO propolis-server (vm_controller): Got paused future from entity lpc-ps2ctrl
20:10:25.255Z INFO propolis-server (vm_controller): Got paused future from entity qemu-lpc-debug
20:10:25.255Z INFO propolis-server (vm_controller): Got paused future from entity pci-virtio-viona-0.9.0
20:10:25.255Z INFO propolis-server (vm_controller): Got paused future from entity block-crucible-7f4cbb29-0585-4fae-b727-c8c72a3229eb
20:10:25.255Z INFO propolis-server (vm_controller): Got paused future from entity pci-nvme-0.18.0
20:10:25.255Z INFO propolis-server (vm_controller): Got paused future from entity block-crucible-6f6dae39-6793-4e2c-abe8-2c72e88bdab0
20:10:25.255Z INFO propolis-server (vm_controller): Got paused future from entity pci-nvme-0.17.0
20:10:25.255Z INFO propolis-server (vm_controller): Got paused future from entity block-in-memory-cloud-init
20:10:25.255Z INFO propolis-server (vm_controller): Got paused future from entity pci-virtio-block-0.24.0
20:10:25.255Z INFO propolis-server (vm_controller): Got paused future from entity qemu-fwcfg
20:10:25.255Z INFO propolis-server (vm_controller): Got paused future from entity qemu-ramfb
20:10:25.255Z INFO propolis-server (vm_controller): entity lpc-bhyve-atpic completed pause
20:10:25.255Z INFO propolis-server (vm_controller): entity lpc-bhyve-atpit completed pause
20:10:25.255Z INFO propolis-server (vm_controller): entity lpc-bhyve-hpet completed pause
20:10:25.255Z INFO propolis-server (vm_controller): entity lpc-bhyve-ioapic completed pause
20:10:25.255Z INFO propolis-server (vm_controller): entity lpc-bhyve-rtc completed pause
20:10:25.255Z INFO propolis-server (vm_controller): entity bhyve-vcpu-0 completed pause
20:10:25.255Z INFO propolis-server (vm_controller): entity bhyve-vcpu-1 completed pause
20:10:25.255Z INFO propolis-server (vm_controller): entity bhyve-vcpu-2 completed pause
20:10:25.255Z INFO propolis-server (vm_controller): entity bhyve-vcpu-3 completed pause
20:10:25.255Z INFO propolis-server (vm_controller): entity pci-piix4-hb completed pause
20:10:25.255Z INFO propolis-server (vm_controller): entity pci-piix3-lpc completed pause
20:10:25.255Z INFO propolis-server (vm_controller): entity lpc-bhyve-pmtimer completed pause
20:10:25.255Z INFO propolis-server (vm_controller): entity pci-piix3-pm completed pause
20:10:25.256Z INFO propolis-server (vm_controller): entity chipset-i440fx completed pause
20:10:25.256Z INFO propolis-server (vm_controller): entity lpc-uart-com4 completed pause
20:10:25.256Z INFO propolis-server (vm_controller): entity lpc-uart-com1 completed pause
20:10:25.256Z INFO propolis-server (vm_controller): entity lpc-uart-com3 completed pause
20:10:25.256Z INFO propolis-server (vm_controller): entity lpc-uart-com2 completed pause
20:10:25.256Z INFO propolis-server (vm_controller): entity lpc-ps2ctrl completed pause
20:10:25.256Z INFO propolis-server (vm_controller): entity qemu-lpc-debug completed pause
20:10:25.256Z INFO propolis-server (vm_controller): entity pci-virtio-viona-0.9.0 completed pause
20:10:25.256Z INFO propolis-server (vm_controller): entity block-crucible-7f4cbb29-0585-4fae-b727-c8c72a3229eb completed pause
20:10:25.256Z INFO propolis-server (vm_controller): entity block-crucible-6f6dae39-6793-4e2c-abe8-2c72e88bdab0 completed pause
20:10:25.256Z INFO propolis-server (vm_controller): entity pci-nvme-0.17.0 completed pause
20:10:25.256Z INFO propolis-server (vm_controller): entity block-in-memory-cloud-init completed pause
20:10:25.256Z INFO propolis-server (vm_controller): entity pci-virtio-block-0.24.0 completed pause
20:10:25.256Z INFO propolis-server (vm_controller): entity qemu-fwcfg completed pause
20:10:25.256Z INFO propolis-server (vm_controller): entity qemu-ramfb completed pause
This instance is stuck in the Stopping state
About this issue
- Original URL
- State: closed
- Created 7 months ago
- Comments: 15 (14 by maintainers)
Unfortunately it hasn’t reproduced yet.