zrepl: [Linux >= 5.8 Bug]: `zfs recv` blocked on `pipe_read`, zrepl daemon blocked on `pipe_write`

Config client:

jobs:
- name: vie1_to_grz1
  type: push
  connect:
    type: tls
    address: "grz1:777"
    ca: /etc/zrepl/ca.crt
    cert: /etc/zrepl/vie1.crt
    key:  /etc/zrepl/vie1.key
    server_cn: "grz1"
  filesystems: {
    "zroot<": true,
    "zroot/data/docker": true,
    "zroot/data/docker<": false
  }
  snapshotting:
    type: periodic
    prefix: zrepl_
    interval: 1h
  pruning:
    keep_sender:
    - type: not_replicated
    - type: last_n
      count: 3
    keep_receiver:
    - type: grid
      grid: 24x1h | 30x1d
      regex: "^zrepl_"

Config server:

jobs:
- name: vie1_sink
  type: sink
  serve:
      type: tls
      listen: ":777"
      ca: "/etc/zrepl/ca.crt"
      cert: "/etc/zrepl/grz1.crt"
      key: "/etc/zrepl/grz1.key"
      client_cns:
        - "vie1"
  root_fs: "zmedia/enc/backup/zrepl"

Snapshots client:

$ zfs list -t snapshot zroot
NAME                              USED  AVAIL     REFER  MOUNTPOINT
zroot@zrepl_20210220_015709_000     0B      -      192K  -
zroot@zrepl_20210220_025709_000     0B      -      192K  -
zroot@zrepl_20210220_035709_000     0B      -      192K  -
zroot@zrepl_20210220_045709_000     0B      -      192K  -
zroot@zrepl_20210220_055709_000     0B      -      192K  -
zroot@zrepl_20210220_065709_000     0B      -      192K  -
zroot@zrepl_20210220_075709_000     0B      -      192K  -
zroot@zrepl_20210220_085709_000     0B      -      192K  -
zroot@zrepl_20210220_095709_000     0B      -      192K  -
zroot@zrepl_20210220_105709_000     0B      -      192K  -
zroot@zrepl_20210220_115709_000     0B      -      192K  -
zroot@zrepl_20210220_125709_000     0B      -      192K  -
zroot@zrepl_20210220_135710_000     0B      -      192K  -
zroot@zrepl_20210220_145709_000     0B      -      192K  -
zroot@zrepl_20210220_155709_000     0B      -      192K  -
zroot@zrepl_20210220_165709_000     0B      -      192K  -
zroot@zrepl_20210220_175709_000     0B      -      192K  -
zroot@zrepl_20210220_185709_000     0B      -      192K  -
zroot@zrepl_20210220_195709_000     0B      -      192K  -
zroot@zrepl_20210220_205709_000     0B      -      192K  -
zroot@zrepl_20210220_215709_000     0B      -      192K  -
zroot@zrepl_20210220_225709_000     0B      -      192K  -

Snapshots server:

$ zfs list zmedia/enc/backup/zrepl/vie1/zroot -t snapshot
NAME                                                           USED  AVAIL     REFER  MOUNTPOINT
zmedia/enc/backup/zrepl/vie1/zroot@zrepl_20210219_235709_000     0B      -      128K  -
zmedia/enc/backup/zrepl/vie1/zroot@zrepl_20210220_005709_000     0B      -      128K  -
zmedia/enc/backup/zrepl/vie1/zroot@zrepl_20210220_015709_000     0B      -      128K  -
zmedia/enc/backup/zrepl/vie1/zroot@zrepl_20210220_025709_000     0B      -      128K  -
zmedia/enc/backup/zrepl/vie1/zroot@zrepl_20210220_035709_000     0B      -      128K  -
zmedia/enc/backup/zrepl/vie1/zroot@zrepl_20210220_045709_000     0B      -      128K  -

Log client:

Feb 20 23:57:09 vie1 zrepl[99401]: 2021-02-20T23:57:09+01:00 [WARN][vie1_to_grz1][snapshot][HHbD$liGP$wanS$wanS.OUMO]: callback channel is full, discarding snapshot update event
Feb 20 22:57:09 vie1 zrepl[99401]: 2021-02-20T22:57:09+01:00 [WARN][vie1_to_grz1][snapshot][HHbD$liGP$wanS$wanS.OUMO]: callback channel is full, discarding snapshot update event
Feb 20 21:57:09 vie1 zrepl[99401]: 2021-02-20T21:57:09+01:00 [WARN][vie1_to_grz1][snapshot][HHbD$liGP$wanS$wanS.OUMO]: callback channel is full, discarding snapshot update event
Feb 20 20:57:09 vie1 zrepl[99401]: 2021-02-20T20:57:09+01:00 [WARN][vie1_to_grz1][snapshot][HHbD$liGP$wanS$wanS.OUMO]: callback channel is full, discarding snapshot update event
Feb 20 19:57:09 vie1 zrepl[99401]: 2021-02-20T19:57:09+01:00 [WARN][vie1_to_grz1][snapshot][HHbD$liGP$wanS$wanS.OUMO]: callback channel is full, discarding snapshot update event
Feb 20 18:57:09 vie1 zrepl[99401]: 2021-02-20T18:57:09+01:00 [WARN][vie1_to_grz1][snapshot][HHbD$liGP$wanS$wanS.OUMO]: callback channel is full, discarding snapshot update event
Feb 20 17:57:09 vie1 zrepl[99401]: 2021-02-20T17:57:09+01:00 [WARN][vie1_to_grz1][snapshot][HHbD$liGP$wanS$wanS.OUMO]: callback channel is full, discarding snapshot update event
Feb 20 16:57:09 vie1 zrepl[99401]: 2021-02-20T16:57:09+01:00 [WARN][vie1_to_grz1][snapshot][HHbD$liGP$wanS$wanS.OUMO]: callback channel is full, discarding snapshot update event
Feb 20 15:57:09 vie1 zrepl[99401]: 2021-02-20T15:57:09+01:00 [WARN][vie1_to_grz1][snapshot][HHbD$liGP$wanS$wanS.OUMO]: callback channel is full, discarding snapshot update event
Feb 20 14:57:10 vie1 zrepl[99401]: 2021-02-20T14:57:10+01:00 [WARN][vie1_to_grz1][snapshot][HHbD$liGP$wanS$wanS.OUMO]: callback channel is full, discarding snapshot update event
Feb 20 13:57:09 vie1 zrepl[99401]: 2021-02-20T13:57:09+01:00 [WARN][vie1_to_grz1][snapshot][HHbD$liGP$wanS$wanS.OUMO]: callback channel is full, discarding snapshot update event
Feb 20 12:57:09 vie1 zrepl[99401]: 2021-02-20T12:57:09+01:00 [WARN][vie1_to_grz1][snapshot][HHbD$liGP$wanS$wanS.OUMO]: callback channel is full, discarding snapshot update event
Feb 20 11:57:09 vie1 zrepl[99401]: 2021-02-20T11:57:09+01:00 [WARN][vie1_to_grz1][snapshot][HHbD$liGP$wanS$wanS.OUMO]: callback channel is full, discarding snapshot update event
Feb 20 10:57:09 vie1 zrepl[99401]: 2021-02-20T10:57:09+01:00 [WARN][vie1_to_grz1][snapshot][HHbD$liGP$wanS$wanS.OUMO]: callback channel is full, discarding snapshot update event
Feb 20 09:57:09 vie1 zrepl[99401]: 2021-02-20T09:57:09+01:00 [WARN][vie1_to_grz1][snapshot][HHbD$liGP$wanS$wanS.OUMO]: callback channel is full, discarding snapshot update event
Feb 20 08:57:09 vie1 zrepl[99401]: 2021-02-20T08:57:09+01:00 [WARN][vie1_to_grz1][snapshot][HHbD$liGP$wanS$wanS.OUMO]: callback channel is full, discarding snapshot update event
Feb 20 07:57:09 vie1 zrepl[99401]: 2021-02-20T07:57:09+01:00 [WARN][vie1_to_grz1][snapshot][HHbD$liGP$wanS$wanS.OUMO]: callback channel is full, discarding snapshot update event
Feb 20 06:57:09 vie1 zrepl[99401]: 2021-02-20T06:57:09+01:00 [WARN][vie1_to_grz1][snapshot][HHbD$liGP$wanS$wanS.OUMO]: callback channel is full, discarding snapshot update event
Feb 20 00:57:08 vie1 systemd[1]: Started zrepl daemon.
Feb 20 00:57:08 vie1 systemd[1]: Starting zrepl daemon...

Log server:

Feb 20 13:29:50 grz1 zrepl[1269988]: 2021-02-20T13:29:50+01:00 [ERRO][vie1_sink][transportmux][yrPp$4UEx$4UEx.gDJt]: accept error err="read tcp 5.206.204.32:777->167.248.133.38:55754: read: connection reset by peer" errType="*net.OpError"
Feb 20 13:29:49 grz1 zrepl[1269988]: 2021-02-20T13:29:49+01:00 [ERRO][vie1_sink][transportmux][yrPp$4UEx$4UEx.gDJt]: accept error err="tls: first record does not look like a TLS handshake" errType="tls.RecordHeaderError"
Feb 20 13:29:48 grz1 zrepl[1269988]: 2021-02-20T13:29:48+01:00 [ERRO][vie1_sink][transportmux][yrPp$4UEx$4UEx.gDJt]: accept error err="tls: client didn't provide a certificate" errType="*errors.errorString"
Feb 20 13:29:48 grz1 zrepl[1269988]: 2021-02-20T13:29:48+01:00 [ERRO][vie1_sink][transportmux][yrPp$4UEx$4UEx.gDJt]: accept error err="read tcp 5.206.204.32:777->167.248.133.38:52396: read: connection reset by peer" errType="*net.OpError"
Feb 20 13:29:42 grz1 zrepl[1269988]: 2021-02-20T13:29:42+01:00 [ERRO][vie1_sink][transportmux][yrPp$4UEx$4UEx.gDJt]: accept error errType="*net.OpError" err="read tcp 5.206.204.32:777->74.120.14.56:52332: read: connection reset by peer"
Feb 20 13:29:42 grz1 zrepl[1269988]: 2021-02-20T13:29:42+01:00 [ERRO][vie1_sink][transportmux][yrPp$4UEx$4UEx.gDJt]: accept error errType="tls.RecordHeaderError" err="tls: first record does not look like a TLS handshake"
Feb 20 13:29:41 grz1 zrepl[1269988]: 2021-02-20T13:29:41+01:00 [ERRO][vie1_sink][transportmux][yrPp$4UEx$4UEx.gDJt]: accept error err="tls: client didn't provide a certificate" errType="*errors.errorString"
Feb 20 13:29:41 grz1 zrepl[1269988]: 2021-02-20T13:29:41+01:00 [ERRO][vie1_sink][transportmux][yrPp$4UEx$4UEx.gDJt]: accept error err="read tcp 5.206.204.32:777->74.120.14.56:51506: read: connection reset by peer" errType="*net.OpError"
Feb 20 13:29:37 grz1 zrepl[1269988]: 2021-02-20T13:29:37+01:00 [ERRO][vie1_sink][transportmux][yrPp$4UEx$4UEx.gDJt]: accept error err="read tcp 5.206.204.32:777->162.142.125.39:44678: read: connection reset by peer" errType="*net.OpError"
Feb 20 13:29:37 grz1 zrepl[1269988]: 2021-02-20T13:29:37+01:00 [ERRO][vie1_sink][transportmux][yrPp$4UEx$4UEx.gDJt]: accept error err="tls: first record does not look like a TLS handshake" errType="tls.RecordHeaderError"
Feb 20 13:29:35 grz1 zrepl[1269988]: 2021-02-20T13:29:35+01:00 [ERRO][vie1_sink][transportmux][yrPp$4UEx$4UEx.gDJt]: accept error err="tls: client didn't provide a certificate" errType="*errors.errorString"
Feb 20 13:29:35 grz1 zrepl[1269988]: 2021-02-20T13:29:35+01:00 [ERRO][vie1_sink][transportmux][yrPp$4UEx$4UEx.gDJt]: accept error err="read tcp 5.206.204.32:777->162.142.125.39:42520: read: connection reset by peer" errType="*net.OpError"
Feb 20 00:55:53 grz1 systemd[1]: Started zrepl daemon.
Feb 20 00:55:53 grz1 systemd[1]: Starting zrepl daemon...

zrepl status client:

Job: vie1_to_grz1
    Type: push
    Replication:
        Attempt #1
        Status: fan-out-filesystems
        Progress: [==================================================>] 102.3 MiB / 83.4 MiB @ 0 B/s
          zroot               DONE (step 1/1, 624 B/624 B)
          zroot/ROOT          DONE (step 1/1, 624 B/624 B)
          zroot/ROOT/default  DONE (step 1/1, 21.5 MiB/22.0 MiB)
          zroot/data          DONE (step 1/1, 624 B/624 B)
          zroot/data/docker   DONE (step 1/1, 80.3 MiB/60.9 MiB)
          zroot/data/home     STEPPING (step 0/1, 216.7 KiB/349.5 KiB) next: @zrepl_20210220_035709_000 => @zrepl_20210220_045709_000 (encrypted=no)
          zroot/data/mysql    DONE (step 1/1, 624 B/624 B)
          zroot/data/postgres DONE (step 1/1, 157.2 KiB/208.6 KiB)
    Pruning Sender:
        ...
    Pruning Receiver:
        ...
    Snapshotting:
        Status: Waiting
        Sleep until: 2021-02-21 00:57:09.460089176 +0100 CET
            zroot               SnapDone 0s snap name: "zrepl_20210220_225709_000"
            zroot/ROOT          SnapDone 0s snap name: "zrepl_20210220_225709_000"
            zroot/ROOT/default  SnapDone 0s snap name: "zrepl_20210220_225709_000"
            zroot/data          SnapDone 0s snap name: "zrepl_20210220_225709_000"
            zroot/data/docker   SnapDone 0s snap name: "zrepl_20210220_225709_000"
            zroot/data/home     SnapDone 0s snap name: "zrepl_20210220_225709_000"
            zroot/data/mysql    SnapDone 0s snap name: "zrepl_20210220_225709_000"
            zroot/data/postgres SnapDone 0s snap name: "zrepl_20210220_225709_000"

zrepl status server:

Job: vie1_sink
    Type: sink
    No status representation for job type 'sink', dumping as YAML
    snapper: null

Trying to stop zrepl on the client with systemctl stop zrepl hangs until systemd kills it forcefully. Client log after systemctl restart zrepl:

Feb 21 00:08:19 vie1 systemd[1]: Started zrepl daemon.
Feb 21 00:08:18 vie1 systemd[1]: Starting zrepl daemon...
Feb 21 00:08:18 vie1 systemd[1]: Stopped zrepl daemon.
Feb 21 00:08:18 vie1 systemd[1]: zrepl.service: Failed with result 'timeout'.
Feb 21 00:08:18 vie1 systemd[1]: zrepl.service: Main process exited, code=killed, status=9/KILL
Feb 21 00:08:18 vie1 systemd[1]: zrepl.service: Killing process 99541 (zrepl) with signal SIGKILL.
Feb 21 00:08:18 vie1 systemd[1]: zrepl.service: Killing process 99535 (zrepl) with signal SIGKILL.
Feb 21 00:08:18 vie1 systemd[1]: zrepl.service: Killing process 99409 (zrepl) with signal SIGKILL.
Feb 21 00:08:18 vie1 systemd[1]: zrepl.service: Killing process 99408 (zrepl) with signal SIGKILL.
Feb 21 00:08:18 vie1 systemd[1]: zrepl.service: Killing process 99407 (zrepl) with signal SIGKILL.
Feb 21 00:08:18 vie1 systemd[1]: zrepl.service: Killing process 99406 (n/a) with signal SIGKILL.
Feb 21 00:08:18 vie1 systemd[1]: zrepl.service: Killing process 99405 (zrepl) with signal SIGKILL.
Feb 21 00:08:18 vie1 systemd[1]: zrepl.service: Killing process 99404 (zrepl) with signal SIGKILL.
Feb 21 00:08:18 vie1 systemd[1]: zrepl.service: Killing process 99401 (zrepl) with signal SIGKILL.
Feb 21 00:08:18 vie1 systemd[1]: zrepl.service: State 'stop-sigterm' timed out. Killing.
Feb 21 00:06:48 vie1 systemd[1]: Stopping zrepl daemon...

After restarting it, zrepl doesn’t seem to do anything:

Job: vie1_to_grz1
    Type: push
    Replication:
        ...
    Pruning Sender:
        ...
    Pruning Receiver:
        ...
    Snapshotting:
        Status: SyncUp
        Sleep until: 2021-02-21 00:57:09 +0100 CET

After running zrepl signal wakeup vie1_to_grz1 and waiting a while, it gets stuck again. zrepl status on the client:

Job: vie1_to_grz1
    Type: push
    Replication:
        Attempt #1
        Status: fan-out-filesystems
        Progress: [==============================>--------------------] 1.0 GiB / 1.7 GiB @ 0 B/s
          zroot               STEPPING (step 10/18, 6.1 KiB/11.0 KiB) next: @zrepl_20210220_145709_000 => @zrepl_20210220_155709_000 (encrypted=no)
          zroot/ROOT          STEPPING (step 10/18, 6.1 KiB/11.0 KiB) next: @zrepl_20210220_145709_000 => @zrepl_20210220_155709_000 (encrypted=no)
          zroot/ROOT/default  STEPPING (step 10/18, 321.4 MiB/545.2 MiB) next: @zrepl_20210220_145709_000 => @zrepl_20210220_155709_000 (encrypted=no)
          zroot/data          STEPPING (step 9/18, 5.5 KiB/11.0 KiB) next: @zrepl_20210220_135709_000 => @zrepl_20210220_145709_000 (encrypted=no)
          zroot/data/docker   STEPPING (step 9/18, 709.1 MiB/1.1 GiB) next: @zrepl_20210220_135709_000 => @zrepl_20210220_145709_000 (encrypted=no)
          zroot/data/home     STEPPING (step 10/19, 22.4 MiB/30.2 MiB) next: @zrepl_20210220_135709_000 => @zrepl_20210220_145709_000 (encrypted=no)
          zroot/data/mysql    STEPPING (step 10/18, 328.9 KiB/1.8 MiB) next: @zrepl_20210220_145709_000 => @zrepl_20210220_155709_000 (encrypted=no)
          zroot/data/postgres STEPPING (step 10/18, 1.8 MiB/4.5 MiB) next: @zrepl_20210220_145709_000 => @zrepl_20210220_155709_000 (encrypted=no)
    Pruning Sender:
        ...
    Pruning Receiver:
        ...
    Snapshotting:
        Status: SyncUp
        Sleep until: 2021-02-21 00:57:09 +0100 CET

The client and server logs are empty, and no zfs command is running on the client. On the server, ps ax shows this one: zfs recv -s zmedia/enc/backup/zrepl/vie1/zroot/data/home@zrepl_20210220_145709_000

A ZFS bug? Killing the zfs recv on the server shows STEP-ERROR on the client, which then continues for a while until it gets stuck again on another snapshot…
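(Not from the original report, just for reference:) one way to confirm the state described in the issue title, zfs recv blocked in pipe_read and the zrepl daemon in pipe_write, is to read /proc/<pid>/wchan, which names the kernel function a task is currently sleeping in. A minimal Go sketch of such a check (hypothetical helper, not part of zrepl):

package main

import (
	"fmt"
	"os"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: wchan <pid>")
		os.Exit(1)
	}
	// /proc/<pid>/wchan holds the name of the kernel function the task
	// is currently blocked in ("0" if it is runnable).
	data, err := os.ReadFile("/proc/" + os.Args[1] + "/wchan")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// For the hang described here, the zfs recv pid should show
	// pipe_read and the zrepl daemon pid pipe_write.
	fmt.Printf("pid %s is blocked in: %s\n", os.Args[1], data)
}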

After killing zfs on the server a few more times and running zrepl signal wakeup vie1_to_grz1, everything seems fine again for the time being. However, I’m sure the issue will come back sooner rather than later. This has already happened multiple times and filled up the client with snapshots; back then I removed zrepl together with all snapshots and holds from both client and server until I had time to look at it again.

All of this is on Arch Linux, with both machines running the latest OpenZFS: zfs-2.0.3-1 zfs-kmod-2.0.3-1

$ zrepl version
client: zrepl version=arch-0.3.1 go=go1.15.6 GOOS=linux GOARCH=amd64 Compiler=gc
server: zrepl version=arch-0.3.1 go=go1.15.6 GOOS=linux GOARCH=amd64 Compiler=gc

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 50 (44 by maintainers)

Most upvoted comments

@BotoX @mafredri could you give my pull request (#449) a try to make sure it really works? I tested it with 800 snapshots without issues, but since the problem was always intermittent for me it would be great to have somebody else test it.

Git bisect landed on:

c73be61cede5882f9605a852414db559c0ebedfd is the first bad commit                      
commit c73be61cede5882f9605a852414db559c0ebedfd                           
Author: David Howells <dhowells@redhat.com>                       
Date:   Tue Jan 14 17:07:11 2020 +0000                                                
                                                                                      
    pipe: Add general notification queue support

That commit, by the way, also changes the code around pipe resizing, and tracing the kernel showed that zfs_receive resizes the pipe itself, so maybe that’s what triggers the issue?
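To illustrate what pipe resizing means here (an addition for context, not from the bisect itself): on Linux a process can query and change a pipe’s buffer capacity via fcntl with F_GETPIPE_SZ / F_SETPIPE_SZ, which is the machinery that commit touches. A rough Go sketch:

package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	r, w, err := os.Pipe()
	if err != nil {
		panic(err)
	}
	defer r.Close()
	defer w.Close()

	// Query the pipe's current buffer capacity (Linux-specific fcntl).
	size, err := unix.FcntlInt(w.Fd(), unix.F_GETPIPE_SZ, 0)
	if err != nil {
		panic(err)
	}
	fmt.Println("default pipe capacity:", size)

	// Ask the kernel to grow the buffer; the kernel rounds the value up
	// and caps it at /proc/sys/fs/pipe-max-size.
	newSize, err := unix.FcntlInt(w.Fd(), unix.F_SETPIPE_SZ, 1<<20)
	if err != nil {
		panic(err)
	}
	fmt.Println("resized pipe capacity:", newSize)
}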

Just a quick heads-up: last night I did some more work on this and I found the race condition in the kernel. I submitted a patch to the kernel bugzilla issue.

Looks like closing stdin after the command has started is perfectly fine and is exactly what the stdlib does too; I just didn’t understand the internals. (It is fine because the file descriptor has already been handed to the child process and Go is no longer responsible for consuming it.)
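For context, a sketch of that general pattern (assumptions: a plain os.Pipe plus cat as a stand-in for zfs recv; this is not zrepl’s actual code): after exec.Cmd.Start the child holds its own duplicate of the pipe’s read end, so the parent can close its copy and keep writing into the write end:

package main

import (
	"fmt"
	"io"
	"os"
	"os/exec"
)

func main() {
	// Stand-in for `zfs recv`: any command that consumes stdin.
	cmd := exec.Command("cat")
	cmd.Stdout = os.Stdout

	pr, pw, err := os.Pipe()
	if err != nil {
		panic(err)
	}
	cmd.Stdin = pr

	if err := cmd.Start(); err != nil {
		panic(err)
	}
	// The child inherited its own copy of the read end during fork/exec,
	// so the parent can close its copy right after Start; os/exec does
	// the equivalent when it creates the pipe itself for an io.Reader.
	pr.Close()

	// Feed the child through the write end, then close it to signal EOF.
	if _, err := io.WriteString(pw, "hello over the pipe\n"); err != nil {
		panic(err)
	}
	pw.Close()

	if err := cmd.Wait(); err != nil {
		fmt.Fprintln(os.Stderr, "command failed:", err)
	}
}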

So both of the above approaches were a bust. However, they gave me a clue for an ugly workaround:

diff --git a/zfs/zfs.go b/zfs/zfs.go
index c26bcd1..eaa0243 100644
--- a/zfs/zfs.go
+++ b/zfs/zfs.go
@@ -1179,6 +1179,8 @@ func ZFSRecv(ctx context.Context, fs string, v *ZFSSendArgVersion, stream io.Rea
 	stdin.Close()
 	defer stdinWriter.Close()
 
+	time.Sleep(time.Second)
+
 	pid := cmd.Process().Pid
 	debug := func(format string, args ...interface{}) {
 		debug("recv: pid=%v: %s", pid, fmt.Sprintf(format, args...))

This little change has allowed replication to work without issue for a couple of days now. I’m not sure what’s going on but it seems like either zfs recv has trouble consuming stdin on startup or the pipe needs time to settle, for whatever reason. I’ll try to do some more testing around this and perhaps we’ll be able to create a test that reproduces the issue.

I just wanted to give an update; perhaps some others could confirm whether or not the workaround works? It’ll be a few days before I can look at this further.

It ends just before the “start waiting” line.

Nit: The last line in https://gist.github.com/lukas2511/c3aa7c8b61b0d1af7e03cf3feb0a4eb9#file-receiver-log is start waiting.

On the sending side it’s missing, but I guess that’s just normal at that point in the transfer. I think I was comparing the wrong log snippet, since there are two “starting command” blocks for each zfs send. Sorry for that.

Edit: yeah, the first one is the dry run using CombinedOutput and the second command is the actual send using plain Start.

But yeah, even without stack traces (@lukas2511 are you familiar with Go’s debugger, dlv?) this looks like the situation we observed on @BotoX’s setup.

Unfortunately no. For now I added lots of debug log statements to zfscmd.go, but when trying it out I got lucky again and it sent the remaining ~300 snapshots in one go… As soon as it stalls again I’ll try attaching a debugger in some way.

I poked around procfs and found that zrepl had an open pipe with unread data; reading its contents revealed some changed data from my home directory (Spotify caches… I should probably move those to a different dataset…). So I’m guessing that it’s actually the zfs send content, and maybe zrepl somehow “forgets” to clean up the process, therefore stalling and leaving the zfs process a zombie until it’s reaped by its parent?
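A small sketch (not zrepl code, just an illustration of that procfs poking): pipe file descriptors show up in /proc/<pid>/fd as symlinks to pipe:[inode], so listing them reveals which pipes a stalled process still holds open:

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: listpipes <pid>")
		os.Exit(1)
	}
	fdDir := filepath.Join("/proc", os.Args[1], "fd")
	entries, err := os.ReadDir(fdDir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, e := range entries {
		// Each entry is a symlink; pipe fds resolve to "pipe:[inode]".
		target, err := os.Readlink(filepath.Join(fdDir, e.Name()))
		if err != nil {
			continue
		}
		if strings.HasPrefix(target, "pipe:[") {
			fmt.Printf("fd %s -> %s\n", e.Name(), target)
		}
	}
}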

Just to clarify: on which side was that open pipe? Sender or Receiver?

This was on the sending side, the system with the defunct zfs process.

Of course, now that I finally got the ~1000 missed snapshots synced to the server, it completes totally fine on every run… I’ll have to wait for it to fail and queue up a few days of snapshots again.