restic: restic fails on repo mounted via CIFS/samba on Linux using go 1.14 build
This issue is a summary of https://forum.restic.net/t/prune-fails-on-cifs-repo-using-go-1-14-build/2579 , intended as a reference for the underlying problem.
tl;dr Workaround: Setting the environment variable GODEBUG to asyncpreemptoff=1 restores the pre Go 1.14 behavior and fixes the problem.
Output of restic version
restic 0.9.6 (v0.9.6-137-gc542a509) compiled with go1.14 on linux/amd64
Linux Kernel version: 5.5.9
How did you run restic exactly?
restic prune -r /path/to/repository/on/CIFS/share
Relevant log excerpt:
Load(<data/216ea5f2d2>, 591, 4564169) returned error, retrying after 720.254544ms: open /mnt/nas/redacted/reponame/data/21/216ea5f2d21b458a7b913609ddef2a6ac4788b4bad5481b2916558d2ce1bef04: interrupted system call
Prune failed in the end
Further relevant log excerpts:
Load(<data/2e9db0642e>, 591, 4758136) returned error, retrying after 552.330144ms: read /mnt/nas/redacted/reponame/data/2e/2e9db0642e0fb67b959aa1d91c0d70daa8331ad246c5eeb8582ba2a14f24680f: interrupted system call
List(data) returned error, retrying after 282.818509ms: lstat /mnt/nas/redacted/reponame/data/64: interrupted system call
List(data) returned error, retrying after 492.389441ms: readdirent: interrupted system call
Save(<data/f0f5102554>) returned error, retrying after 552.330144ms: chmod /mnt/nas/redacted/reponame/data/f0/f0f51025542c0287943ef3816e642586be46ae10dc9efbcfa7b305d9e093dbd4: interrupted system call
What backend/server/service did you use to store the repository?
Local backend stored on a CIFS share
Expected behavior
No warnings, prune should complete.
Actual behavior
Prune failed.
Steps to reproduce the behavior
Build restic using Go 1.14 and store the backup repository on a CIFS share.
Do you have any idea what may have caused this?
This issue is a side effect of asynchronous preemption in Go 1.14. The [release notes](https://golang.org/doc/go1.14#runtime) state the following:
This means that programs that use packages like syscall or golang.org/x/sys/unix will see more slow system calls fail with EINTR errors. Those programs will have to handle those errors in some way, most likely looping to try the system call again.
Go configures signal handlers to restart syscalls if possible. The standard library also retries syscalls when necessary. That is, there should only be issues when calling low-level syscalls directly, and in that case the caller should handle EINTR itself. However, restic only uses Go standard library functions that should already handle EINTR where necessary.
The first prune error message points to an os.Open call (via fs.Open) in the Load function of the local backend. So it looks like a Go standard library call fails. However, the manpage for signal (man 7 signal) states that the underlying open syscall is always restarted when SA_RESTART is used, as Go does. So this seems to be a bug in the Linux kernel. Adding a loop around the call to fs.Open that repeats it as long as EINTR is returned fixes that one call. Fixing all problematic calls this way would mean adding lots of ugly loops and playing whack-a-mole.
The manpages of lstat, readdir and chmod don’t even list EINTR as a possible errno.
Do you have an idea how to solve the issue?
Setting the environment variable GODEBUG to asyncpreemptoff=1 restores the pre Go 1.14 behavior and fixes the problem.
Go relies on the assumption that the kernel properly restarts syscalls when told to do so. As the latter is evidently not the case on CIFS mounts, the proper fix would be to submit a bug report to the Linux kernel.
A short-term solution would be to add a note to the restic documentation that mentions the compatibility problem with CIFS mounts.
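When restic is started from another Go program rather than from a shell, the GODEBUG workaround can be applied to the child process only. The following is a rough sketch under that assumption: the launcher and the `withAsyncPreemptOff` helper are hypothetical, and it assumes a `restic` binary on the PATH.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// withAsyncPreemptOff returns a copy of env in which GODEBUG contains
// asyncpreemptoff=1. Other GODEBUG settings are preserved, and any
// existing asyncpreemptoff entry is replaced. Hypothetical helper.
func withAsyncPreemptOff(env []string) []string {
	const prefix = "GODEBUG="
	var settings []string
	out := make([]string, 0, len(env)+1)
	for _, kv := range env {
		if !strings.HasPrefix(kv, prefix) {
			out = append(out, kv)
			continue
		}
		// GODEBUG is a comma-separated list of name=value pairs.
		for _, s := range strings.Split(strings.TrimPrefix(kv, prefix), ",") {
			if s != "" && !strings.HasPrefix(s, "asyncpreemptoff=") {
				settings = append(settings, s)
			}
		}
	}
	settings = append(settings, "asyncpreemptoff=1")
	return append(out, prefix+strings.Join(settings, ","))
}

func main() {
	// Pass all arguments through to restic, for example:
	//   launcher prune -r /path/to/repository/on/CIFS/share
	cmd := exec.Command("restic", os.Args[1:]...)
	cmd.Env = withAsyncPreemptOff(os.Environ())
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "restic:", err)
		os.Exit(1)
	}
}
```

In a shell, setting the variable before invoking restic has the same effect and needs no wrapper.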
About this issue
- State: open
- Created 4 years ago
- Reactions: 2
- Comments: 39 (18 by maintainers)
Commits related to this issue
- doc: Warn about compatibility issues with CIFS and restic On Linux CIFS (SMB) seems to be incompatible with the async preemption implementation of Go 1.14. CIFS seems not to restart syscalls (open, r... — committed to MichaelEischer/restic by MichaelEischer 4 years ago
- doc: Warn about compatibility issues with CIFS and restic On Linux CIFS (SMB) seems to be incompatible with the async preemption implementation of Go 1.14. CIFS seems not to restart syscalls (open, r... — committed to seqizz/restic by MichaelEischer 4 years ago
- Update pkg/xattr to handle EINTR on Linux Updates #2659. This is one of the cases where the stdlib will not handle EINTR for us, even with Go 1.16. That xattr calls are directly affected can be seen ... — committed to greatroar/restic by deleted user 4 years ago
- Update pkg/xattr to handle EINTR on Linux Updates #2659. This is a case where the stdlib will not handle EINTR for us, even with Go 1.16. That xattr calls are directly affected can be seen in the rep... — committed to greatroar/restic by deleted user 4 years ago
- Update pkg/xattr to handle EINTR on Linux Updates #2659. This is a case where the stdlib will not handle EINTR for us, even with Go 1.16. That xattr calls are directly affected can be seen in the rep... — committed to mfrischknecht/restic by deleted user 4 years ago
I created a small (~25GB) repo as a test case and went through a series of backup, forget, and prune operations similar to those where I saw errors originally. Everything worked perfectly.
I performed the testing using restic 0.14.0 compiled with go1.19 on linux/amd64 under Linux 6.06. I did not set GODEBUG.
This wasn’t an exhaustive test by any means, but it was extensive enough that (based on my previous experience) I would have expected to see multiple errors if the issue was still present. So the issue has very likely been resolved.
Relevant: rclone/rclone#2042. If rclone gets SMB support, this issue can be worked around, and it will work on all platforms. I’m not volunteering, but if anyone needs a summer project…
I’ll pin this issue for now
That seems indeed to be the case. I’ve encountered a similar (?) issue with Go 1.16.1 on Ubuntu 20.04 running via WSL (Linux-Kernel 4.19.128) with a local samba share (a plain Linux system/VM should work just as well). For testing I’ve created a new repository and backed up a few gigabytes of data. For me the issues so far were slow operations (at least I had that impression) and the silent corruption of a pack file! The pack file ended up with a few 64kb ranges which just contain null bytes instead of the actual data.
So this looks like either a Go or a kernel problem. So far I haven’t had time to assemble a more compact test case for reproducing it.
The GODEBUG workaround helped me fix my backup. I use restic to back up my NAS to object storage, and this problem kept me from making backups for months.
export GODEBUG=asyncpreemptoff=1