restic: restic backup fails to save blobs and renew locks during backup
Output of restic version
restic 0.11.0 compiled with go1.15.3 on linux/amd64
How did you run restic exactly?
$ restic backup -x --tag tag1 --tag tag2 --tag tag3 dir1 dir2 dirprefix*
repository 41367ccf opened successfully, password is correct
created new cache in redacted/.cache/restic
Save(<data/f6d39f650f>) returned error, retrying after 738.458435ms: client.PutObject: Your socket connection to the server was not read from or written to within the timeout period.
Save(<data/c115994973>) returned error, retrying after 287.145499ms: client.PutObject: Your socket connection to the server was not read from or written to within the timeout period.
Save(<data/5a0b25cb1e>) returned error, retrying after 361.144708ms: client.PutObject: Your socket connection to the server was not read from or written to within the timeout period.
Save(<data/17b2c54475>) returned error, retrying after 590.539156ms: client.PutObject: Your socket connection to the server was not read from or written to within the timeout period.
Save(<data/f354e5187b>) returned error, retrying after 370.757544ms: client.PutObject: Your socket connection to the server was not read from or written to within the timeout period.
Save(<data/3f56ada95c>) returned error, retrying after 615.115739ms: client.PutObject: Your socket connection to the server was not read from or written to within the timeout period.
Save(<data/4e129fb137>) returned error, retrying after 341.462458ms: client.PutObject: Your socket connection to the server was not read from or written to within the timeout period.
Save(<data/93872cbf61>) returned error, retrying after 591.326744ms: client.PutObject: Your socket connection to the server was not read from or written to within the timeout period.
Save(<data/e76945d4a8>) returned error, retrying after 739.464678ms: client.PutObject: Your socket connection to the server was not read from or written to within the timeout period.
Save(<data/47c56f37ec>) returned error, retrying after 283.564177ms: client.PutObject: Your socket connection to the server was not read from or written to within the timeout period.
Save(<data/7444d335c8>) returned error, retrying after 575.230153ms: client.PutObject: Your socket connection to the server was not read from or written to within the timeout period.
Save(<data/44d7613d65>) returned error, retrying after 514.307064ms: client.PutObject: Your socket connection to the server was not read from or written to within the timeout period.
Save(<data/17d3d5a19b>) returned error, retrying after 619.667246ms: client.PutObject: Your socket connection to the server was not read from or written to within the timeout period.
Save(<data/2ac1628a6f>) returned error, retrying after 713.381054ms: client.PutObject: Your socket connection to the server was not read from or written to within the timeout period.
Save(<data/b9d73f1a2e>) returned error, retrying after 575.126373ms: client.PutObject: Your socket connection to the server was not read from or written to within the timeout period.
Save(<data/f1893a13a9>) returned error, retrying after 491.58676ms: client.PutObject: Your socket connection to the server was not read from or written to within the timeout period.
Save(<data/80c4416bb1>) returned error, retrying after 391.203249ms: client.PutObject: Your socket connection to the server was not read from or written to within the timeout period.
Save(<data/2f94e1d972>) returned error, retrying after 570.57463ms: client.PutObject: Your socket connection to the server was not read from or written to within the timeout period.
Save(<data/85af09c76c>) returned error, retrying after 366.273224ms: client.PutObject: Your socket connection to the server was not read from or written to within the timeout period.
[122:02:21] 83.05% 11565236 files 7.909 TiB, total 14151160 files 9.524 TiB, 0 errors ETA 24:54:31
What backend/server/service did you use to store the repository?
s3 swiftstack (less than 2ms away)
Expected behavior
Reliable and predictable backup/upload/restore behavior or a loud failure as early as possible.
Ideally restic backup should retry saving the blob until it succeeds, possibly followed by a read request to verify that each of the small fraction of problematic blobs exists on the backend. The user should be able to specify the count and duration of upload retries. Once the retry limit is reached, backup should exit right away with a non-zero exit code.
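To make the requested behavior concrete, here is a minimal sketch in Python. It is not restic's code or API. The names `save_with_retry`, `UploadFailed`, `max_retries` and `max_elapsed` are hypothetical, and `save` and `exists` stand in for whatever upload and read-back calls the backend offers.

```python
import random
import time


class UploadFailed(Exception):
    """Raised when a blob could not be stored within the retry budget."""


def save_with_retry(save, exists, blob_id, max_retries=5, max_elapsed=60.0,
                    base_delay=0.5, sleep=time.sleep, clock=time.monotonic):
    """Upload blob_id and confirm it with a read-back check.

    save(blob_id) uploads and may raise OSError (this includes timeouts).
    exists(blob_id) returns True if the backend reports the blob as stored.
    Returns the number of attempts used. Raises UploadFailed once the
    retry count or the time budget is exhausted, whichever comes first.
    """
    start = clock()
    attempts = 0
    while True:
        attempts += 1
        try:
            save(blob_id)
            if exists(blob_id):
                return attempts
            last_error = "blob missing on backend after upload"
        except OSError as err:
            last_error = str(err)
        # attempts counts the initial try plus retries, hence the ">".
        if attempts > max_retries or clock() - start >= max_elapsed:
            raise UploadFailed(
                f"{blob_id}: giving up after {attempts} attempts: {last_error}")
        # Exponential backoff with jitter before the next attempt.
        sleep(base_delay * 2 ** (attempts - 1) * random.uniform(0.5, 1.5))
```

A caller would catch `UploadFailed` at the top level and exit with a non-zero status right away, instead of logging a warning and carrying on.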
Actual behavior
A warning message is logged, as above, and the backup continues, possibly for days, even though the blob never reached the backend.
minio-mc find backend/bucket | grep data/17d3d5a19b
returns no entries
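Checking every ID from the log by hand gets tedious, so here is a rough Python sketch of the same check for all of them at once. The function names are hypothetical. It assumes the warnings have the exact `Save(<data/...>) returned error` shape shown above and that the backend listing is available as plain lines of object keys. It compares only the final path component against the short hex ID, because depending on the repository layout the object key may contain an extra subdirectory between `data/` and the file name.

```python
import re

# Matches the short hex ID in warnings like:
#   Save(<data/f6d39f650f>) returned error, retrying after ...
SAVE_ERROR = re.compile(r"Save\(<data/([0-9a-f]+)>\) returned error")


def failed_ids(log_text):
    """Return the short IDs from Save() warnings, in order, without duplicates."""
    seen = []
    for match in SAVE_ERROR.finditer(log_text):
        if match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def missing_from_listing(ids, listing_lines):
    """Return the IDs for which no listed object name starts with that ID."""
    names = [line.strip().rsplit("/", 1)[-1] for line in listing_lines]
    return [i for i in ids if not any(name.startswith(i) for name in names)]
```

Feeding it the backup log and the output of the find command above would list the IDs that had a failed save attempt and have no matching object on the backend. An ID that shows up in the log but also in the listing was presumably stored by a later retry.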
Steps to reproduce the behavior
- start doing backups of sufficient volume
- wait for error message or simulate network congestion/disruptions between restic and the backend
- watch restic log the error and continue as if nothing has happened, possibly for days
Do you have any idea what may have caused this?
insufficient error handling?
Do you have an idea how to solve the issue?
Pause/queue processing on upload errors until they are resolved and validated, or fail the backup then and there according to a user-specified acceptable retry count / time.
Did restic help you today? Did it make you happy in any way?
My first impression is this bug report. There is hope, but not yet
About this issue
- State: closed
- Created 3 years ago
- Comments: 27 (10 by maintainers)
I can understand that. Barring any bugs we are not aware of, restic compiled from master is fully compatible with 0.11.0, and you should be able to go back and forth between the versions.