bupstash: Data corruption error (possibly from stale caches)
I was testing whether bupstash would work for me, and my restore command hangs partway through. The timer keeps counting and it shows “fetching files…”, but nothing else happens.
I must have gotten exceptionally (un)lucky on my first try. I ran a script on a tmpfs that repeatedly generated, backed up, and restored 1 GiB of random data at a time, covering over a terabyte in total, and the problem never happened again.
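The report does not include the stress-test script. A round trip like the one described might look roughly like the minimal sketch below. It makes several assumptions. bupstash is on PATH. The repository and key are supplied through the BUPSTASH_REPOSITORY and BUPSTASH_KEY environment variables. bupstash put prints the new item's ID on stdout. The helper names make_random_data and round_trip are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of a put/restore round-trip stress test (not the reporter's script).
# Assumes: bupstash on PATH, BUPSTASH_REPOSITORY and BUPSTASH_KEY exported,
# and that `bupstash put` prints the new item's ID on stdout.

# make_random_data DIR MIB: write MIB mebibytes of random bytes to DIR/data.bin.
make_random_data() {
    mkdir -p "$1"
    head -c "$(( $2 * 1024 * 1024 ))" /dev/urandom > "$1/data.bin"
}

# round_trip WORK MIB: back up fresh random data, restore it, and compare the trees.
# Returns non-zero if any step fails or the restored files differ.
round_trip() {
    work=$1
    rm -rf "$work/src" "$work/out"
    make_random_data "$work/src" "$2"
    id=$(bupstash put "$work/src")
    mkdir "$work/out"
    bupstash restore --into "$work/out" id="$id"
    diff -r "$work/src" "$work/out"
}

# Usage (assumes /dev/shm is a tmpfs); loops until a round trip fails:
#   while round_trip /dev/shm/bupstash-stress 1024; do :; done
```

Note that a restore that hangs will stall this loop rather than end it, so a hang shows up as a run that stops making progress.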
I did a manual binary search on the original failure and whittled the 7 GiB down to just 5 data files totaling about 3 MiB.
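The reporter did the binary search by hand. The same halving idea can be automated under a few assumptions. You need a predicate command that exits 0 when a given subset of files still reproduces the failure, for example by backing up only those files and checking whether the restore hangs. The full list must fail to begin with. File names must contain no whitespace or glob characters. The helper name shrink_failing_set is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): shrink a failing set of files by halving.
# Usage: shrink_failing_set PREDICATE FILE...
# PREDICATE is called with a subset of file names and must exit 0 when the
# bug still reproduces with only those files. The full list is assumed to fail.
# File names are word-split, so they must not contain whitespace or glob characters.
shrink_failing_set() {
    pred=$1
    shift
    while [ "$#" -gt 1 ]; do
        half=$(( $# / 2 ))
        first="" second="" i=0
        # Split the current candidate list into two halves.
        for f in "$@"; do
            if [ "$i" -lt "$half" ]; then
                first="$first $f"
            else
                second="$second $f"
            fi
            i=$(( i + 1 ))
        done
        if "$pred" $first; then
            set -- $first        # the first half alone still fails
        elif "$pred" $second; then
            set -- $second       # the second half alone still fails
        else
            break                # the failure needs files from both halves
        fi
    done
    printf '%s\n' "$@"
}
```

Plain halving stops as soon as the failure depends on files from both halves. That may be why a handful of files, rather than one, remained. Delta debugging (ddmin) generalizes this by also trying smaller chunks and their complements.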
Here’s the backup and a script to reproduce it: deadlock-repro.tar.gz. I’ve tested it on multiple machines and it fails every time.
# After downloading deadlock-repro.tar.gz
tar -xzf deadlock-repro.tar.gz
cd bupstash-deadlock-repro
bupstash restore -k test.key -r backup --into restore name=test
Running strace on the hung process shows it waiting in a futex call.
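To detect the hang in a script instead of watching the timer, the restore can be run under a time limit. This is a minimal sketch that assumes GNU coreutils timeout, which exits with status 124 when the limit is reached. The helper name hangs and the 60-second limit in the usage comment are illustrative and not part of the report.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a time limit and report whether it hung.
# Relies on GNU coreutils `timeout`, which exits with status 124 when the limit is hit.
# Returns 0 if the command was killed for exceeding SECONDS, 1 otherwise
# (including when the command finished quickly but failed).
hangs() {
    secs=$1
    shift
    status=0
    timeout "$secs" "$@" >/dev/null 2>&1 || status=$?
    [ "$status" -eq 124 ]
}

# Example, from inside the unpacked repro directory:
#   hangs 60 bupstash restore -k test.key -r backup --into restore name=test && echo "restore hung"
# While a restore is stuck, attach to all of its threads to see where they block:
#   strace -f -p "$(pgrep -n bupstash)"
```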
About this issue
- State: closed
- Created a year ago
- Comments: 20 (13 by maintainers)
I’m mostly convinced that my issues all stem from stale caches, but I’ll continue testing next week to see if I can reproduce any issues without them. I think most people would expect stale caches not to cause problems, but with bupstash’s security model a put sub-key actually has no way to tell whether the cache is valid, which makes things more difficult. Hopefully checking the inode will be enough to keep people from shooting themselves in the foot, but maybe something should be added to the documentation as well? One possible issue is filesystems like sshfs, where inode numbers could change each time the filesystem is mounted. However, the documentation already makes clear that you shouldn’t use sshfs to mount a bupstash repository, and the only consequence there would be a useless cache: no corruption, just slower backups.
I will make sure the cache is invalidated if the repository’s inode changes; this should prevent it from ever being an issue. We can keep investigating if corruption happens again.
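The idea of invalidating the cache when the repository's inode changes can be illustrated with a small shell sketch. This is not bupstash's implementation, which the thread does not show. The cache layout, the repo-inode stamp file, and the helper names repo_inode and check_cache are all hypothetical. The sketch records the inode of the repository directory next to the cache and wipes the cache whenever that inode no longer matches, for example when the repository has been deleted and recreated at the same path.

```shell
#!/bin/sh
# Hypothetical sketch of inode-keyed cache invalidation (not bupstash's code).

# repo_inode DIR: print the inode number of DIR (POSIX `ls -di` prints it first).
repo_inode() {
    ls -di "$1" | awk '{print $1}'
}

# check_cache REPO CACHE_DIR: keep CACHE_DIR only if it was built for REPO's
# current inode; otherwise wipe it and record the new inode.
# Prints "valid" or "reset".
check_cache() {
    repo=$1 cache=$2
    current=$(repo_inode "$repo")
    if [ -f "$cache/repo-inode" ] && [ "$(cat "$cache/repo-inode")" = "$current" ]; then
        echo valid
        return 0
    fi
    rm -rf "$cache"
    mkdir -p "$cache"
    printf '%s\n' "$current" > "$cache/repo-inode"
    echo reset
}
```

As the thread notes, the failure mode of this check is conservative. On a filesystem such as sshfs, where inode numbers may change between mounts, the cache would simply be reset more often, which means slower backups but no corruption.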