restic: Fatal: number of used blobs is larger than number of available blobs!

Restic Version

restic 0.9.6 compiled with go1.13.4 on linux/amd64

Command(s)

restic prune restic check --read-data-subset=05/53 (the commands are automated as part of a weekly backup maintenance script)

What did you expect?

An error free prune followed by an error free check.

What happened instead?

Both the prune and the check failed.

Output of prune:

counting files in repo
building new index for repo
incomplete pack file (will be removed): 056c60842c58dd17d452aacbd8a3a2a28b7729a55df17862c0a9715205178ded
[5:30:10] 100.00%  148048 / 148048 packs

incomplete pack file (will be removed): 173d2b9e645bd13eb794149b3f6f7b938940272f5a3b10cc0d48b13b61f0f73d
incomplete pack file (will be removed): 25be7e6fc725d5c5354e56133eda461d4fc2189ef90a458db837f1580eedffe6
incomplete pack file (will be removed): 429b728243cd9bf1777fe87eba0717ec3d7e04708692972db539ec168bdf35a2
incomplete pack file (will be removed): 54dcb5d83bd0ddeecc49a73f1e48e817b4cc0d5999a0663103455a0335a6116f
incomplete pack file (will be removed): 581b0af89c5ea7ba0bf47d8cfe36463001c08c46a534263c44c779185eb8ad80
incomplete pack file (will be removed): 5908682c326af00157838c8335e15414c7a7e79b0450148edbfa9d32b6eb2989
incomplete pack file (will be removed): 745354f809468d971a5488567f73461348fdc92468e8cc6140c9a50b53f8a265
incomplete pack file (will be removed): 79f599e46864df4a3c94a7b515939998045d7c4984d12f5733a91324e22774ed
incomplete pack file (will be removed): 7ee35edaa9c22be3855a54f67fff32ab88e1b90c0f65a3c7c27852fd55236292
incomplete pack file (will be removed): 8da7e3200bb309debf0ad338387c26dafc97de1afc7ba3011c0ba1241c6d0d08
incomplete pack file (will be removed): 8f74dab11a8de782759efd25db340326c3524851a7c46a4cb2df23d175546328
incomplete pack file (will be removed): a044002799ae0a4c8b0caeed64a3c1c2314f3bd5652f52e4c92af1f47a1ecc9f
incomplete pack file (will be removed): a60b3a2a76f225256377731107b3889c924413a1db8ac212db59c5f2b6efb87d
incomplete pack file (will be removed): b0404632536e3fa1851066a783ba08d9a871fa45b6037e1471ac8b0cfc3fa34a
incomplete pack file (will be removed): b492fd1c892f2b5e9ed63df629aa084d220730686de252b3030484989254f8c1
incomplete pack file (will be removed): b5bbee347e45875217ad33288e03899cbee0c2fcc0fcc7edb2892eb50ef89990
incomplete pack file (will be removed): c05b64748d945517114f6fede76c95f9336ddf03e13994c5c04f34b7bcd43d95
incomplete pack file (will be removed): ce95f9d896c44eb9da8717276d3959242edff24fcf7f8c79efc2908d34f26645
incomplete pack file (will be removed): d1a46a0406973a86eceed8314470ed6c3a25642ba810c1af3bb2f0e1e275a469
incomplete pack file (will be removed): d2fa90492bac4743f184dc24137247de74ee56afd122e66318c511e39f43f013
incomplete pack file (will be removed): d4c832432f9addf84646adc3b340853b65dcfc97d49d0d74ed9e3bba2a2c7cc9
incomplete pack file (will be removed): d542d0c8a92b5ad1c37169c8fcfdff9ce810df04a2804c9e0e4e8d4db512246b
incomplete pack file (will be removed): e57d166ccd9b367380139b8e563add2e6069af33f7a6f26d68969623e7b493a1
incomplete pack file (will be removed): f4507052a2d20fbea4db14c8c4ce8d1e68af912f3efa77ae42b6c60a015db480
incomplete pack file (will be removed): f51a455d0dbde2fd452bd607f5d1e5b2cf79cca0bca2ae3d6b996ac7bbaeeb19
incomplete pack file (will be removed): f5abb13ab88935727bf60e3032cf28176897bd2f807e53a19c50a2fc6817dd8e
incomplete pack file (will be removed): fad51e77a344f1cf9e38fc7073b9c638d1d407c2d505e6847620d3577084a865
incomplete pack file (will be removed): faf4d42c802b96b95017b492793d174dc14da7a177c40deb049aa16b24c1aae0
incomplete pack file (will be removed): fe735339b138268beb4aadd0ae1e9fa1bcfc79e1b741fd1fb152613e594f66ed
repository contains 148018 packs (826578 blobs) with 671.196 GiB
processed 826578 blobs: 0 duplicate blobs, 0 B duplicate
load all snapshots
find data that is still in use for 213 snapshots
[0:32] 100.00%  213 / 213 snapshots

Fatal: number of used blobs is larger than number of available blobs!
Please report this error (along with the output of the 'prune' run) at
https://github.com/restic/restic/issues/new

Output of Check

using temporary cache in /tmp/restic-check-cache-061487687
created new cache in /tmp/restic-check-cache-061487687
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
read group #5 of 2874 data packs (out of total 148048 packs in 53 groups)
Pack ID does not match, want 39186bde, got 1da9e122
Pack ID does not match, want 04efce75, got a76690ba
[46:52] 100.00%  2874 / 2874 items
Fatal: repository contains errors

Other details

  • Backend: B2
  • This machine spent at least a week with it’s clock set incorrectly after a power failure/reboot. There were several backups, prunes and checks performed while the local machine had a clock that was set to a time roughly 30 days in the past. Backups, purnes and checks since the clock has been fixed have been successful.

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 30 (10 by maintainers)

Most upvoted comments

After a multi day exploration using a number of our engineers, we believe we determined the root cause of this issue.

When data is uploaded to our infrastructure, each file is erasure coded across 20 storage servers, which we call a Backblaze Vault. The files identified by @kmwoley and other Restic customers were all stored in vault-1129.

This vault happened to have disk errors on a number of hard drives. Therefore, it was possible for a download to intermittently be corrupt, as reported. However, because of the way files are stored across multiple servers, the file is still correct and available. No data has been lost.

Backblaze has an integrity checker which slowly runs over the disks looking for issues. In fact, by the time engineering started looking into the problem last week, some of the corrupt parts where already detected and moved aside and queued to be rebuilt. This led to taking longer to understand the problem. Going forward, our tools to determine the state of files will be improved. Also, there is a project that will be going live in the next month or so to better detect this issue while reading a file for download.

At this point - we believe the issue has been resolved; vault-1129 is healthy again. If this is not the case, please do let us know by opening a support ticket and referencing this post.

It is a mystery why only restic customers reported this issue to Backblaze and further why the data was stored on vault-1129. (For context, each vault has 1,200 drives and as of Dec. 31, 2020, we have 165,530 drives in production, or ~137 vaults. This was probably just a coincidence.

@kmwoley is correct. We deployed a fix that should finally resolve this issue. Thanks to @kmwoley and everyone who reported the issue. If you continue to see a problem, please file a ticket and reference this issue.

Thanks to everyone who helped diagnose this issue and your patience.

Welp. Bad news. I’m still seeing corruption when I run restic check --read-data… got an error within 10 min of running --read-data.

I’ve sent details to Backblaze support to continue the investigation.

Thanks to @kmwoley for reproducing this issue without restic. Our engineering team has the detail he provided and we are currently investigating.

Hey folks - I can confirm that Backblaze deployed a fix on Feb 26, 2021 (around or before 12:29 PST) that resolved the data reading errors I was experiencing. Many thanks to all of you who helped diagnose the issue and get it in front of Backblaze for attention.

@kmwoley

Is it expected that the check does not show progress for a long period of time (i.e. hours) at this stage?

# restic check --read-data | tee 2021.02.20.log
using temporary cache in /tmp/restic-check-cache-740384393
created new cache in /tmp/restic-check-cache-740384393
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
read all data
[1:51] 100.00%  215 / 215 snapshots

I can see from my system monitor that it’s downloading at full-speed and using a core of CPU, and has been for about an hour or more without reporting any progress.

As you pipe the output of check into another command, that is considered as non-interactive usage. And for that case restic doesn’t output any live progress. With restic 0.12.0 it’s now possible to configure that using the RESTIC_PROGRESS_FPS environment variable or by sending restic a SIGUSR1 signal.

Summary

I believe this to be an issue independent of Restic. I have downloaded one pack directly from B2’s web UI that resulted in a checksum mismatch. I have run 2 full data checks from different machines which appear to give random checksum results. Every single manual pack checksum check I have done passes, except for the one documented below.

Manual Checksum Failure - 49791035

In the full --read-data checks, there were 5 packs that showed in error across the two checks. I manually verified each of those packs by downloading the pack from B2’s Web UI. One resulted in a corrupt file for the first time I downloaded the file (on Windows, Chrome). On second download, the file was correct.

Here’s the SHA-1 results (to compare against the Backblaze UI): image

Binary Diff

I was curious if his was a single bit flip, or something else. So I put the corrupt file and the correct file into a binary comparison tool. The result was that about 12% of the files differed… in 5 different locations… not just the start or end.

image

I also downloaded the file through the UI 5 more times… each of those subsequent downloads through the UI was correct.

Checking the single file in a loop

I’m currently running the download / shasum of the single file in a loop - I’ll report back on what I find there.

Summary of --read-data testing

  • I have run two full data checks from two different machines using Restic 0.11. The detailed results are posted below.
  • The rate of errors is different between the two machines. The Linux machine produces ~ 0.84 errors/hour of running check. The Windows machine produced ~0.56 errors/hour.
  • Of the 67 packs in error total (out of 148050), only 5 of the same packs showed up as in error from both machines.

Full Check Results

Windows Machine

Windows 10, Restic 0.11, Wireless Network

PS C:\restic> .\restic check --read-data
using temporary cache in C:\Users\Kevin\AppData\Local\Temp\restic-check-cache-954866003
repository e07618ce opened successfully, password is correct
created new cache in C:\Users\Kevin\AppData\Local\Temp\restic-check-cache-954866003
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
read all data
Pack ID does not match, want 8ddad374, got b4237d17
Pack ID does not match, want f5ef3b30, got 957945b4
Pack ID does not match, want 76512646, got d983233e
Pack ID does not match, want eefb18c5, got 28e6b164
Pack ID does not match, want 202597c6, got 6976c331
Pack ID does not match, want a0047d1a, got 3135cc61
Pack ID does not match, want f72d22f6, got bd0050a0
Pack ID does not match, want f37288d6, got 1caab1ba
Pack ID does not match, want 581b0af8, got 05113ec4
Pack ID does not match, want 8e21bea4, got f6ace65a
Pack ID does not match, want 27b07e9e, got 98c3ded8
Pack ID does not match, want 81870b82, got cdee9d5d
Pack ID does not match, want eb228e74, got 736f0e8f
Pack ID does not match, want af8d336b, got 3e076569
Pack ID does not match, want 99fe9e94, got 70e33c03
Pack ID does not match, want 02080be7, got b2514977
Pack ID does not match, want 4ac9361f, got 5fde1c78
Pack ID does not match, want 7cd68700, got ea422a27
Pack ID does not match, want ba75213e, got be695407
Pack ID does not match, want 49791035, got a1baa971
Pack ID does not match, want ca8e84cc, got a0cee332
Pack ID does not match, want f900df97, got 2960de63
Pack ID does not match, want 6db65d0a, got 2c4792af
Pack ID does not match, want 353ae954, got 87130773
Pack ID does not match, want 1fb4983c, got 4a49100a
Pack ID does not match, want ce95f9d8, got 4ec61790
Pack ID does not match, want 31b6ede2, got c7f7b114
[48:20:01] 100.00%  148050 / 148050 packs
Fatal: repository contains errors
PS C:\restic>

Linux Machine

Unraid Linux (backup host). Restic 0.11. Wired network.

using temporary cache in /tmp/restic-check-cache-339533226
repository e07618ce opened successfully, password is correct
created new cache in /tmp/restic-check-cache-339533226
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
read all data
Pack ID does not match, want c29bc1a4, got b9d5d04b
Pack ID does not match, want bd505916, got 9861462a
Pack ID does not match, want 176ccc63, got 8a69d9fd
Pack ID does not match, want 287d3c84, got 7c6bee66
Pack ID does not match, want 4ac9361f, got 4c2b16cd
Pack ID does not match, want 316d4bf5, got 5ede609f
Pack ID does not match, want df72f5a1, got 6aa414b7
Pack ID does not match, want ad74edf2, got 03632064
Pack ID does not match, want ba2b9894, got 63c0f8cb
Pack ID does not match, want f4f93c0e, got 09f69c2f
Pack ID does not match, want 082c2dcd, got bf4f50d7
Pack ID does not match, want aeadd833, got b21a1f58
Pack ID does not match, want 4199ff0a, got 574624b6
Pack ID does not match, want f5ef3b30, got dff3c6a0
Pack ID does not match, want 35f93620, got f18216a5
Pack ID does not match, want 27b07e9e, got 735d923f
Pack ID does not match, want 1dfbc37c, got d32bdfe4
Pack ID does not match, want 429b7282, got 54890386
Pack ID does not match, want 7cd68700, got cbf88637
Pack ID does not match, want 7ef7ad06, got fde1949d
Pack ID does not match, want e15cdbeb, got 16621489
Pack ID does not match, want 119cd505, got de561fff
Pack ID does not match, want 6f0f9d33, got 537b8924
Pack ID does not match, want 864f3137, got 37590313
Pack ID does not match, want 97bccb43, got cf730317
Pack ID does not match, want aad302e6, got 29f03512
Pack ID does not match, want aab71342, got 24eec66d
Pack ID does not match, want 47336b30, got 8f871085
Pack ID does not match, want 5adc97ac, got 6c59e41a
Pack ID does not match, want 49791035, got 36db95e9
Pack ID does not match, want 201bccd6, got bd0c9dd6
Pack ID does not match, want aa03c2b8, got 4a52996d
Pack ID does not match, want 79f599e4, got 1d5c3fa6
Pack ID does not match, want a463af1a, got b9cc48da
Pack ID does not match, want 83cf9760, got bfc67dca
Pack ID does not match, want 66a803f6, got fb373c39
Pack ID does not match, want c51332bd, got 3be73f0b
Pack ID does not match, want 313d8b37, got 6b2d8184
Pack ID does not match, want 2ebd7b8f, got c40c820f
Pack ID does not match, want eb9c1d4a, got 15322491
[47:54:31] 100.00%  148050 / 148050 packs
Fatal: repository contains errors

Packs with errors from both runs

Want Got Env Size (KB)
49791035 a1baa971 Windows 5,182
49791035 36db95e9 Linux 5,182
27b07e9e 98c3ded8 Windows 4,859
27b07e9e 735d923f Linux 4,859
4ac9361f 5fde1c78 Windows 4,711
4ac9361f 4c2b16cd Linux 4,711
7cd68700 ea422a27 Windows 4,215
7cd68700 cbf88637 Linux 4,215
f5ef3b30 957945b4 Windows 4,300
f5ef3b30 dff3c6a0 Linux 4,300

I downloaded each of these packs from B2 using the GUI to validate their checksums. All of them passed, except 49791035 as noted above.

Other notes

  • I recently had upgraded my router firmware. I downgraded it to the previous version and ran a --read-data check to see if it made a difference. It did not - I still got errors. 😕