restic: Prune operation fails repeatedly (B2 Bucket)

Output of restic version

/ # restic version
restic 0.9.5 compiled with go1.12.4 on linux/amd64
/ # 

How did you run restic exactly?

restic prune --no-cache --no-lock --limit-upload 5000 --limit-download 10000 --cache-dir /cache -vvvv

Restic is run inside a Docker container; backups use the --hostname flag to supply the correct hostname.

Restic is not being run in parallel; the backup operation is entirely manual.

What backend/server/service did you use to store the repository?

I use a B2 bucket to store the backup repository.

Expected behavior

Restic prunes unneeded snapshots and unreferenced data, deduplicating along the way, without taking locks that block the repository.

Actual behavior

Restic fails partway into the process with an “already locked” error, failing to free over 5 TiB of data. This happens despite the --no-lock flag, and occurs regardless of whether the flag is present. The repository remains locked until restic unlock is run to manually clear the locks.

counting files in repo
building new index for repo
[63:14:32] 100.00%  1661186 / 1661186 packs

repository contains 1661186 packs (10662314 blobs) with 7.803 TiB
processed 10662314 blobs: 844029 duplicate blobs, 280.468 GiB duplicate
load all snapshots
find data that is still in use for 3 snapshots
[17:23:28] 100.00%  3 / 3 snapshots

found 3228835 of 10662314 data blobs still in use, removing 7433479 blobs
will remove 0 invalid files
will delete 1152764 packs and rewrite 73168 packs, this frees 5.845 TiB
[59:38:23] 100.00%  73168 / 73168 packs rewritten

counting files in repo
Fatal: unable to create lock in backend: repository is already locked exclusively by PID 1 on 4d5e617a29db by root (UID 0, GID 0)
lock was created at 2019-11-11 17:15:36 (152h40m7.829255086s ago)
storage ID 50eb6df7

Steps to reproduce the behavior

  1. Create a large B2 Bucket backup repository
  2. Run restic prune (the operation must take over 24 hours; a slow DSL line may be needed to reproduce)
  3. Wait for the error to occur

Do you have any idea what may have caused this?

The exclusive lock restic creates during the operation is not properly deleted. This happens either because B2 does not delete the lock immediately, or because the internet connection (DSL) is renewed multiple times during the operation, which prevents restic from cleaning up its locks properly.

Additionally, restic creates locks even when instructed not to via --no-lock.

Do you have an idea how to solve the issue?

Restic should obey the --no-lock flag and not use locks during the prune operation.

Restic should recognize locks that it has created earlier in the current operation and either ignore them or remove them as stale.

Restic might be failing the unlock silently in some situations. The unlock should verify that the lock was actually deleted and, if it was not, retry the unlock and log the failure.

Did restic help you or make you happy in any way?

Restic makes me very happy by letting me run my backups; it is just getting expensive because I am unable to run prune.

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 26 (7 by maintainers)

Most upvoted comments

Same problem with the B2 repository.

Issue solved by generating new master application keys at B2.