zrepl: [FreeBSD] zrepl and zfs send sometimes get into a deadlock-like state

I periodically observe a problem where a zrepl instance configured with a single push job stops doing its work. In that state it logs messages like

zrepl[12895]: job=backup subsystem=snapshot span=Uv38$DEgt$F9Bq$F9Bq.z4Xn msg="callback channel is full, discarding snapshot update event"

I see that the zrepl process is waiting on a zfs send process.

   0 12895 12894  3  25  0  751192  19648 uwait    I     -      1:41.38 /usr/local/bin/zrepl --config /usr/local/etc/zrepl/zrepl.yml daemon
   0 13287 12895  1  20  0   17452   5736 tq_drain D     -      0:00.50 zfs send -t 1-f2979c6bf-d8-789c636064000310a500c4ec50360710e72765a5269730307c81aac1904f4b2b4e2d01c958b1c1e4d990e4932a4b528b81b484dd7e366cfa4bf2d34b33531818f4f3ea56ab54f87ff34092e704cbe725e6a632301415e4e7e7e8971617e967e4e7a63a5415a516e4e81a1918191a98191ac61b589a5a1a99c41b1818302001007cb71f86

The zfs send process is stuck in taskqueue_drain, waiting for a ZFS taskqueue thread that is itself blocked writing the send stream to a pipe.

PID    TID COMM                TDNAME              KSTACK                       
13287 100583 zfs                 -                   mi_switch+0x178 sleepq_switch+0xfc _sleep+0x27c taskqueue_drain+0x100 taskq_wait_id+0x2c spa_taskq_dispatch_sync+0xa8 dump_bytes+0x30 dump_record+0x14c dmu_dump_write+0x5d8 dmu_send_impl+0xd70 dmu_send+0x5b8 zfs_ioc_send_new+0x164 zfsdev_ioctl_common+0x534 zfsdev_ioctl+0x150 devfs_ioctl+0xfc vn_ioctl+0x108 devfs_ioctl_f+0x20 kern_ioctl+0x2ec 
13287 100807 zfs                 send_traverse_threa mi_switch+0x178 sleepq_switch+0xfc sleepq_catch_signals+0x2d4 sleepq_wait_sig+0x8 _cv_wait_sig+0x164 bqueue_enqueue_impl+0x78 send_cb+0x530 traverse_visitbp+0x1c8 traverse_visitbp+0x6ec traverse_dnode+0xb4 traverse_visitbp+0x8c0 traverse_visitbp+0x6ec traverse_visitbp+0x6ec traverse_visitbp+0x6ec traverse_visitbp+0x6ec traverse_visitbp+0x6ec traverse_dnode+0xb4 traverse_visitbp+0x520 
13287 100809 zfs                 send_merge_thread   mi_switch+0x178 sleepq_switch+0xfc sleepq_catch_signals+0x2d4 sleepq_wait_sig+0x8 _cv_wait_sig+0x164 bqueue_enqueue_impl+0x78 send_merge_thread+0x12c fork_exit+0x74 fork_trampoline+0x14 
13287 100810 zfs                 send_reader_thread  mi_switch+0x178 sleepq_switch+0xfc sleepq_catch_signals+0x2d4 sleepq_wait_sig+0x8 _cv_wait_sig+0x164 bqueue_enqueue_impl+0x78 send_reader_thread+0x14c fork_exit+0x74 fork_trampoline+0x14
    0 100394 kernel              z_fr_iss_1          mi_switch+0x178 sleepq_switch+0xfc sleepq_catch_signals+0x2d4 sleepq_wait_sig+0x8 _sleep+0x270 pipe_write+0x8cc zfs_file_write_impl+0xac zfs_file_write+0x24 dump_bytes_cb+0x20 taskq_run+0x1c taskqueue_run_locked+0x188 taskqueue_thread_loop+0x9c fork_exit+0x74 fork_trampoline+0x14

The other end of the pipe is apparently in the zrepl process. But it seems that the zrepl process is not reading from the pipe. At least I cannot identify a thread that would be doing that.

12895 100608 zrepl               -                   mi_switch+0x178 sleepq_switch+0xfc sleepq_catch_signals+0x2d4 sleepq_wait_sig+0x8 _sleep+0x270 umtxq_sleep+0x1ac do_wait+0x228 __umtx_op_wait_uint_private+0x60 sys__umtx_op+0x7c do_el0_sync+0x4a4 handle_el0_sync+0x90 
12895 100756 zrepl               -                   mi_switch+0x178 sleepq_switch+0xfc sleepq_catch_signals+0x2d4 sleepq_timedwait_sig+0x14 _sleep+0x248 umtxq_sleep+0x1ac do_wait+0x228 __umtx_op_wait_uint_private+0x60 sys__umtx_op+0x7c do_el0_sync+0x4a4 handle_el0_sync+0x90 
12895 100757 zrepl               -                   mi_switch+0x178 sleepq_switch+0xfc sleepq_catch_signals+0x2d4 sleepq_wait_sig+0x8 _sleep+0x270 umtxq_sleep+0x1ac do_wait+0x228 __umtx_op_wait_uint_private+0x60 sys__umtx_op+0x7c do_el0_sync+0x4a4 handle_el0_sync+0x90 
12895 100758 zrepl               -                   mi_switch+0x178 sleepq_switch+0xfc sleepq_catch_signals+0x2d4 sleepq_wait_sig+0x8 _sleep+0x270 umtxq_sleep+0x1ac do_wait+0x228 __umtx_op_wait_uint_private+0x60 sys__umtx_op+0x7c do_el0_sync+0x4a4 handle_el0_sync+0x90 
12895 100759 zrepl               -                   mi_switch+0x178 sleepq_switch+0xfc sleepq_catch_signals+0x2d4 sleepq_timedwait_sig+0x14 _sleep+0x248 kqueue_kevent+0x9c8 kern_kevent_fp+0xb0 kern_kevent+0x88 kern_kevent_generic+0x78 freebsd11_kevent+0x68 do_el0_sync+0x4a4 handle_el0_sync+0x90 
12895 100760 zrepl               -                   mi_switch+0x178 sleepq_switch+0xfc sleepq_catch_signals+0x2d4 sleepq_wait_sig+0x8 _sleep+0x270 umtxq_sleep+0x1ac do_wait+0x228 __umtx_op_wait_uint_private+0x60 sys__umtx_op+0x7c do_el0_sync+0x4a4 handle_el0_sync+0x90 
12895 100761 zrepl               -                   mi_switch+0x178 sleepq_switch+0xfc sleepq_catch_signals+0x2d4 sleepq_wait_sig+0x8 _sleep+0x270 umtxq_sleep+0x1ac do_wait+0x228 __umtx_op_wait_uint_private+0x60 sys__umtx_op+0x7c do_el0_sync+0x4a4 handle_el0_sync+0x90 
12895 100762 zrepl               -                   mi_switch+0x178 sleepq_switch+0xfc sleepq_catch_signals+0x2d4 sleepq_wait_sig+0x8 _sleep+0x270 umtxq_sleep+0x1ac do_wait+0x228 __umtx_op_wait_uint_private+0x60 sys__umtx_op+0x7c do_el0_sync+0x4a4 handle_el0_sync+0x90 
12895 100763 zrepl               -                   mi_switch+0x178 sleepq_switch+0xfc sleepq_catch_signals+0x2d4 sleepq_wait_sig+0x8 _sleep+0x270 umtxq_sleep+0x1ac do_wait+0x228 __umtx_op_wait_uint_private+0x60 sys__umtx_op+0x7c do_el0_sync+0x4a4 handle_el0_sync+0x90 
12895 100764 zrepl               -                   mi_switch+0x178 sleepq_switch+0xfc sleepq_catch_signals+0x2d4 sleepq_wait_sig+0x8 _sleep+0x270 umtxq_sleep+0x1ac do_wait+0x228 __umtx_op_wait_uint_private+0x60 sys__umtx_op+0x7c do_el0_sync+0x4a4 handle_el0_sync+0x90 
12895 100768 zrepl               -                   mi_switch+0x178 sleepq_switch+0xfc sleepq_catch_signals+0x2d4 sleepq_wait_sig+0x8 _sleep+0x270 umtxq_sleep+0x1ac do_wait+0x228 __umtx_op_wait_uint_private+0x60 sys__umtx_op+0x7c do_el0_sync+0x4a4 handle_el0_sync+0x90 
12895 100769 zrepl               -                   mi_switch+0x178 sleepq_switch+0xfc sleepq_catch_signals+0x2d4 sleepq_wait_sig+0x8 _sleep+0x270 kern_wait6+0x3d8 sys_wait6+0x68 do_el0_sync+0x4a4 handle_el0_sync+0x90 
12895 100777 zrepl               -                   mi_switch+0x178 sleepq_switch+0xfc sleepq_catch_signals+0x2d4 sleepq_wait_sig+0x8 _sleep+0x270 umtxq_sleep+0x1ac do_wait+0x228 __umtx_op_wait_uint_private+0x60 sys__umtx_op+0x7c do_el0_sync+0x4a4 handle_el0_sync+0x90

At the moment I am not sure where the problem might be: zrepl, the Go runtime, or FreeBSD. My current understanding, based on an ongoing kgdb session, is that the pipe was full at some point and zrepl then read some, but not all, of the data from it. After that, zrepl never returned to read more data, and the writer was never notified that it could write more.
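
To illustrate the state I believe we are in, here is a minimal stand-alone Go sketch. It is not zrepl code, only an analogy: a writer fills a pipe and stalls once the buffer is full, and if the reader reads a little and then never comes back (without closing its end), the writer never makes progress again.

package main

import (
	"fmt"
	"os"
	"time"
)

func main() {
	r, w, err := os.Pipe()
	if err != nil {
		panic(err)
	}

	go func() {
		chunk := make([]byte, 64*1024)
		for {
			// Fills the kernel pipe buffer and then stalls, analogous to
			// the z_fr_iss_1 taskqueue thread sitting in pipe_write above.
			if _, err := w.Write(chunk); err != nil {
				fmt.Println("writer woken:", err)
				return
			}
		}
	}()

	// Read a little and then stop, without closing the read end: the writer
	// stays stalled because nothing drains the pipe or closes it.
	buf := make([]byte, 4096)
	n, _ := r.Read(buf)
	fmt.Printf("read %d bytes, then stopped reading\n", n)

	time.Sleep(3 * time.Second)
	fmt.Println("closing the read end finally wakes the writer")
	r.Close() // the stalled Write now returns a broken-pipe error
	time.Sleep(time.Second)
}

In the real setup the writer is the kernel taskqueue thread and the reader is a goroutine inside zrepl, but the shape of the stall looks the same; the open question is why the reader never comes back for the rest of the data.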

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 42 (23 by maintainers)

Most upvoted comments

I will test the patch after the weekend. Thank you!

P.S. I would not call the FreeBSD behavior a bug, a peculiarity maybe. Personally, closing the pipe before killing the process makes more sense to me, as it seems the more graceful way to communicate with the process. Without having looked at the patch yet, I’d say that closing the pipe and sending a signal (in either order), and only then waiting for the process to exit, should be the most robust approach.
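
For what it is worth, here is a rough Go sketch of that ordering. It is not the actual patch; the helper name killZfsSend and the placeholder zfs arguments are mine.

package main

import (
	"io"
	"log"
	"os/exec"
	"syscall"
)

// killZfsSend tears down a zfs send child in the order discussed above:
// close our end of the pipe, signal the process, and only then wait for it.
func killZfsSend(cmd *exec.Cmd, stdout io.ReadCloser) error {
	// Closing the read end wakes a writer blocked on the full pipe
	// (the kernel taskqueue thread stuck in pipe_write in the stacks above).
	_ = stdout.Close()

	// Signal the process as well, in case it is not blocked on the pipe.
	_ = cmd.Process.Signal(syscall.SIGTERM)

	// Only now reap the child; waiting first could hang behind the
	// still-blocked writer.
	return cmd.Wait()
}

func main() {
	cmd := exec.Command("zfs", "send", "-t", "TOKEN") // placeholder invocation
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		log.Fatal(err)
	}
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}
	// ... the replication code would read the send stream from stdout here ...
	if err := killZfsSend(cmd, stdout); err != nil {
		log.Printf("zfs send exited: %v", err)
	}
}

With this ordering the final Wait cannot get stuck behind a writer that is still blocked on the pipe, regardless of whether the close or the signal lands first.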

Could you try pinning the zrepl daemon process to a single logical CPU using cpuset? See https://github.com/zrepl/zrepl/issues/411#issuecomment-762160424

I just locked it down to a single core. It usually takes at least a few days to trigger, so I’ll report back later in the week unless it fails sooner.