restic: Restic backup — OOM-killed on Raspberry Pi after backing up another computer to the same repo

On a Raspberry Pi 3 Model B, which has 1GB of RAM, restic stopped fitting into memory after a 3TB backup was done from another machine to the same repository. I suspect the indexes have grown too big to fit into memory.

Output of restic version

hmage@phmd:~$ restic version
restic 0.9.2 compiled with go1.10.3 on linux/arm

How did you run restic exactly?

export AWS_ACCESS_KEY_ID=secret
export AWS_SECRET_ACCESS_KEY=secret
export RESTIC_PASSWORD=secret
export RESTIC_REPOSITORY=s3:https://s3.wasabisys.com/restic-hmage
source ./restic-excludes.sh
restic backup --exclude-file <(printf "%s\n" "${EXCLUDES[@]}") /

Contents of restic-excludes.sh:

EXCLUDES=(
/dev
/proc
/run
/sys
)
EXCLUDES+=(
$'Icon\r'
$HOME/.bundle
$HOME/.cache
$HOME/.cargo
$HOME/.ccache*
$HOME/.config/chromium
$HOME/.cpan
$HOME/.dropbox
$HOME/.local/share/akonadi
$HOME/.npm
$HOME/Library/Application\ Support/Google/Chrome
$HOME/Library/Application\ Support/Telegram\ Desktop
$HOME/Library/Arq/Cache.noindex
$HOME/norm.*
**/var/cache/apt
**/var/cache/man
**/var/lib/apt/lists
**/var/lib/mlocate
.DS_Store
.DocumentRevisions-V100
.Spotlight-V100
.Trashes
.bzvol
.cache
.dropbox.cache
.fseventsd
/Volumes/Time\ Machine
/media/psf
/private/var/vm/
/srv/piwik/tmp
/srv/www/data/cache
/tmp
/usr/lib/debug
/var/lib/lxcfs
/var/swap
/var/tmp
Cache
Caches
)

What backend/server/service did you use to store the repository?

Wasabi (S3 protocol)

Expected behavior

Restic should not run out of memory no matter how big the indexes are; they should be streamed from disk/repo rather than loaded completely into RAM, since RAM is not infinite.

Actual behavior

Restic allocates a lot of memory depending on the index size. Before I backed up 3TB of data from my Mac, restic on the Pi had no problems backing up; after that backup succeeded, restic gets killed by the kernel's oom-killer:

hmage@phmd:~$ dmesg|fgrep restic
[426681.565821] [15683]  1000 15683     1393      295       8       0        0             0 restic-backup.s
[426681.565827] [15709]  1000 15709   268547   174432     353       0        0             0 restic
[426681.565897] Out of memory: Kill process 15709 (restic) score 664 or sacrifice child
[426681.565959] Killed process 15709 (restic) total-vm:1074188kB, anon-rss:697728kB, file-rss:0kB, shmem-rss:0kB
[426681.766777] oom_reaper: reaped process 15709 (restic), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[511937.088184] [17227]  1000 17227     1393      295       6       0        0             0 restic-backup.s
[511937.088190] [17255]  1000 17255   267651   176005     357       0        0             0 restic
[511937.088205] Out of memory: Kill process 17255 (restic) score 670 or sacrifice child
[511937.088266] Killed process 17255 (restic) total-vm:1070604kB, anon-rss:704020kB, file-rss:0kB, shmem-rss:0kB
[511937.324251] oom_reaper: reaped process 17255 (restic), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[598842.337688] [25297]  1000 25297     1393      295       5       0        0             0 restic-backup.s
[598842.337695] [25324]  1000 25324   201281   129430     264       0        0             0 restic
[598842.337735] Out of memory: Kill process 25324 (restic) score 493 or sacrifice child
[598842.337793] Killed process 25324 (restic) total-vm:805124kB, anon-rss:517720kB, file-rss:0kB, shmem-rss:0kB
[598842.529990] oom_reaper: reaped process 25324 (restic), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[642004.263182] [25392]  1000 25392     1409      314       5       0        0             0 restic-backup.s
[642004.263188] [25412]  1000 25412   201122   123536     252       0        0             0 restic
[642004.263252] Out of memory: Kill process 25412 (restic) score 470 or sacrifice child
[642004.263305] Killed process 25412 (restic) total-vm:804488kB, anon-rss:494144kB, file-rss:0kB, shmem-rss:0kB
[642004.409938] oom_reaper: reaped process 25412 (restic), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

Steps to reproduce the behavior

  1. Back up on the Pi into an empty repo anywhere; this will succeed.
  2. Back up 3TB of data with 2 million files from another machine, which takes several attempts and about 20 hours.
  3. Back up on the Pi again into the same repo; this will OOM.

Do you have any idea what may have caused this?

Restic tries to load all indexes into RAM instead of mmapping them.

Do you have an idea how to solve the issue?

mmap the indexes from disk cache

Did restic help you or make you happy in any way?

So far the best backup solution, and the only one that allows deduplicated backups from multiple machines into a single repo.

About this issue

  • State: open
  • Created 6 years ago
  • Reactions: 10
  • Comments: 22 (7 by maintainers)

Most upvoted comments

To tackle the problem I’d like to summarize the technical details and also start a discussion about possible solutions.

Cause of high memory usage

  • The main problem with most commands is the index held by internal/repository, which keeps the information of all index files in memory. Memory consumption thus grows (approximately) linearly with the number of blobs in the repository (a rough back-of-the-envelope sketch follows this list).
  • The number of blobs in a repository can grow very large for several reasons, e.g. a high number of (small) files, a generally high backup volume (lots of data), or backups from many computers to one repository.
  • The commands rebuild-index and prune additionally use an in-memory index built in internal/index which makes the memory situation even worse (besides performance problems with these commands).
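
To make the linear growth concrete, here is a rough back-of-the-envelope sketch in Go. The struct fields and the blob count are illustrative assumptions, not restic's actual data structures or anyone's real repository numbers:

package main

import (
	"fmt"
	"unsafe"
)

// blobEntry is a stand-in for the minimum a single in-memory index entry has
// to carry: the blob ID, the pack it lives in, and its position in that pack.
type blobEntry struct {
	blobID [32]byte // SHA-256 of the blob
	packID [32]byte // SHA-256 of the pack file containing the blob
	offset uint32   // position of the blob inside the pack
	length uint32   // length of the blob inside the pack
}

func main() {
	perEntry := unsafe.Sizeof(blobEntry{}) // 72 bytes before any map or GC overhead

	// Illustrative assumption: a multi-TB repository holding a few million blobs.
	const blobs = 3_000_000

	fmt.Printf("lower bound: ~%d MiB just for the raw entries\n",
		uintptr(blobs)*perEntry/(1<<20))
	// Hash-map buckets, Go object headers and garbage-collector headroom
	// multiply this several times over, which is how a machine with 1 GiB of
	// RAM ends up in oom-killer territory.
}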

So I see several approaches we can take into consideration:

rebuild-index and prune

rebuild-index and prune should be re-engineered to use the index from ‘internal/repository’, thus removing the duplicate in-memory index. A starting point could be PR #2513, which just uses ‘repository/index’ and already adds a prune-like capability. A capability to compact packs (maybe ‘compact-packs’) is missing, but it should also be possible to implement it using only ‘repository/index’. A repair functionality for missing index data can IMO also be done without the ‘internal/index’ functionality.

Reduce memory usage per blob

Regarding the high memory usage caused by many blobs, I see several possibilities (most of which have already been discussed):

  1. With PR #2338 there is already a proposal optimizing for the case that blobs are usually only stored in one pack.
  2. I would like to discuss whether we can generally omit the information about more than one pack per blob and just use the first pack.
  3. Going more aggressive could mean holding the full information in memory only for tree blobs. For data blobs I would suggest saving only the information that a blob is stored. If the “where” is needed for a specific data blob, re-read all index files until the blob is found (see the sketch after this list). This approach would not affect most commands, but it would slow down the commands which actually need to read data blobs, like ‘restore’.
  4. Not storing the complete index at all and instead reading all index files for every index operation is also an alternative, perhaps with a bounded in-memory cache for blobs already read (I have no idea about the performance impact of this proposal…). This is similar to the storage-based index, see below.
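
A minimal sketch of what option 3 could look like, using hypothetical stand-in types (ID, PackedBlob and rescanIndexFiles are invented for illustration and are not restic code): tree blobs keep their full pack location, data blobs are only recorded as present, and a location lookup for a data blob falls back to re-reading the index files.

package sketch

import (
	"context"
	"errors"
)

// ID and PackedBlob are stand-ins for restic's real types, only here to keep
// the sketch self-contained.
type ID [32]byte

type PackedBlob struct {
	PackID ID
	Offset uint32
	Length uint32
}

// twoTierIndex keeps full location data only for tree blobs; for data blobs it
// merely remembers that they exist, which is all the backup command needs for
// deduplication.
type twoTierIndex struct {
	trees map[ID]PackedBlob // full info, needed for walking snapshot trees
	data  map[ID]struct{}   // existence only: no pack ID, offset or length
}

func (idx *twoTierIndex) HasData(id ID) bool {
	_, ok := idx.data[id]
	return ok
}

func (idx *twoTierIndex) LookupTree(id ID) (PackedBlob, bool) {
	pb, ok := idx.trees[id]
	return pb, ok
}

// LookupData is the slow path: commands that really need the location of a
// data blob (e.g. restore) would re-read the index files from the repository
// or the local cache until the blob is found.
func (idx *twoTierIndex) LookupData(ctx context.Context, id ID) (PackedBlob, error) {
	return rescanIndexFiles(ctx, id)
}

// rescanIndexFiles is only a stub in this sketch.
func rescanIndexFiles(ctx context.Context, id ID) (PackedBlob, error) {
	return PackedBlob{}, errors.New("not implemented in this sketch")
}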

Use storage-based index

This approach can be seen as an alternative to reducing the memory used per blob. Instead of writing the complete index into an in-memory data structure, the index is saved in a database on disk. I think a simple key/value store would do the job; there are several existing implementations.

Backup from many computers to one repository

In order to allow many computers to back up to one repository without memory usage growing accordingly, I would like to propose the following idea:

  • Each computer uses its own list of index files
  • That means the index is not built by reading all index files, but only the ones needed for this computer / the specific restic command
  • The list of index files has to be stored somewhere. Possibilities are: a) an extra path in the repository layout to keep track of computers, or b) saving the list of used index files in the snapshot file (a sketch of option (b) follows this list).
  • This implies a change in the repository data structure
  • This will also most likely “kill” deduplication between computers
  • To regain deduplication, index rebuilds and pack deletion/reorganization have to be done using all index files, e.g. on a separate computer with lots of memory.
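
To illustrate option (b): a snapshot record could carry the list of index files it relies on. This is only a sketch; the Indexes field is hypothetical and not part of restic's current repository format.

package sketch

import "time"

// Snapshot mirrors a few of the fields restic stores in a snapshot file.
type Snapshot struct {
	Time     time.Time `json:"time"`
	Tree     string    `json:"tree"`
	Paths    []string  `json:"paths"`
	Hostname string    `json:"hostname"`

	// Hypothetical addition: the IDs of the index files that are sufficient to
	// resolve every blob referenced by this snapshot, so a later backup from
	// the same host only loads these instead of all index files in the repo.
	Indexes []string `json:"indexes,omitempty"`
}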

Separate index database from computer backing up

This idea means splitting restic into two parts which can run on separate hardware. The “storage backend” part would be responsible for managing index files and for actually reading from and writing to the storage; it could run on a computer with plenty of memory. The “frontend” part would just use an API to check whether blobs exist and to read and write blobs. This is similar to how borg works. As @fd0 already pointed out, this would be a complete redesign of the restic architecture.

Discussion

Are there any ideas I am missing? What do you think about the proposals? Which direction is the most promising one to start with? Are there ideas which should be better discussed in a separate issue?

Looking forward to an open and fruitful discussion!

@pzmarzly I’m sorry if this sounds blunt, but I’m going to be direct here 😃

While I appreciate the suggestion (I’m familiar with what mmap does and what it’s used for), it feels to me that you don’t have the full picture, so a solution is applied before understanding what’s going on exactly. In this case, mmap will not improve the situation at all, in addition to the problems with using mmap in Go you listed yourself.

restic works roughly as follows:

  • It reads index files from the disc. We’ve limited the size of the index files to limit the amount of memory that is used.
  • It verifies the integrity, checks the signature, and decrypts the content (also into memory)
  • The JSON data contained in the file is parsed and its content is inserted into an in-memory data structure which is used for lookups during runtime

So mmap would only help for the very first step of this process. At a minimum, decryption and parsing will still need to happen in memory.
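
A simplified sketch of that loading path (not restic's actual code; decrypt is a placeholder and the JSON schema shown is only a subset) illustrates why mmap would only cover step 1: the decrypted plaintext and the parsed entries are fresh heap allocations either way.

package sketch

import (
	"encoding/json"
	"os"
)

// decrypt stands in for restic's "verify MAC, then decrypt" step. The real
// code uses AES-256-CTR with a Poly1305-AES MAC; either way the plaintext is a
// fresh buffer on the heap and cannot alias an mmapped ciphertext.
func decrypt(key, ciphertext []byte) ([]byte, error) {
	out := make([]byte, len(ciphertext))
	copy(out, ciphertext) // placeholder for the real cipher
	return out, nil
}

// loadIndexFile illustrates where the memory goes when one index file is loaded.
func loadIndexFile(path string, key []byte, dst map[string]string) error {
	// Step 1: read (or mmap) the raw, encrypted index file. This is the only
	// step mmap could replace.
	ciphertext, err := os.ReadFile(path)
	if err != nil {
		return err
	}

	// Step 2: verify and decrypt into a new buffer.
	plaintext, err := decrypt(key, ciphertext)
	if err != nil {
		return err
	}

	// Step 3: parse the JSON and insert every entry into the long-lived
	// in-memory map. This structure is what grows with the repository.
	var idx struct {
		Packs []struct {
			ID    string `json:"id"`
			Blobs []struct {
				ID     string `json:"id"`
				Offset uint32 `json:"offset"`
				Length uint32 `json:"length"`
			} `json:"blobs"`
		} `json:"packs"`
	}
	if err := json.Unmarshal(plaintext, &idx); err != nil {
		return err
	}
	for _, p := range idx.Packs {
		for _, b := range p.Blobs {
			dst[b.ID] = p.ID // stand-in for the real blob -> pack/offset entry
		}
	}
	return nil
}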

All steps described above use memory only for a short amount of time, then the Go garbage collector claims and frees it again. The culprit here is the data structure in memory, which keeps the data in a way that lets it grow very large. That needs to be addressed, everything else will just slightly improve the symptoms.

In addition, it feels to me that something with the way we use our data structures makes it very hard for the runtime’s garbage collector to actually free memory, I haven’t had the time to dig into this yet.

I’ve started hitting this issue with a number of machines that are much larger than a Raspberry Pi. Loading all of the indexes of my backup set requires about 14GB of free RAM at the moment.

I’ve got about 30TB backed up from about 20 machines. Only 3 of the machines have large backup sets (each in the 10TB range). The others are much smaller (each in the 100GB range). Each individual index file is quite small (<7MB). The total size of all index files in my repository is about 7700MB.

I know you’re already aware of the issue, and that you’re working on it. I just wanted to post this in so that other users can find it when they ask about scaling issues.

Also, thank you so much for restic! It is the best backup program I’ve ever used.

You discussed possible solutions up above, but here is another one to consider: running the file scanner part (lstat() and reading the file contents) on the machine being backed up, and running the part that holds the indexes in RAM on a big server that has tons of disk space and RAM available. That would mean that the machines being backed up wouldn’t need lots of RAM, and also wouldn’t need lots of disk space (to store cached versions of the index files).

I’ve recently started using Restic. Once I read the design document I was stunned by the beauty of its simplicity. I really want my restore/backup tool to be as simple as possible - it makes it much easier to trust it and reason about it. Thank you very much for creating such a beautiful thing.

The following is strictly my opinion, please correct me if I’m wrong.

Scaling issues

I really appreciate that Restic is small and works out of the box for simple use cases. However, it’s evident that it doesn’t scale to a large number of blobs. I’m definitely interested in making Restic use a fixed amount of memory for all operations.

Thank you @aawsome for the great summary of the current scaling issues.

The indices are the biggest issue, but there are others I saw:

  • a directory tree blob is composed and encrypted in memory regardless of the number of files in it (#2446)
  • the REST server does not support paging and may perform an unbounded amount of work per request (#2421)

I think we should keep them in mind while discussing scaling solutions but fix them one-by-one.

Scaling and scoping strategy

IMO we really need guidance from the core maintainers here:

  • making Restic scale will inevitably increase its complexity. Some users require scaling while others would prefer simplicity. This feels like 2 different products to me.
  • I got the impression that self-tuning is preferred to allowing users to configure all the things.
  • I saw in many threads that the Restic team wants to limit the scope of what Restic does, e.g. no backup scheduling, no additional backends, no GUI. I fully support that decision, Unix way FTW! It would be great to have clear guidance on what belongs in Restic core and what should be separate. Is scaling to an arbitrary number of blobs and any repo size in scope?

Data collection

I understand that it’s hard to make strategic decisions in the absence of usage data. It’d be good if we could conduct a census of existing Restic repos. Maybe add a command that sends anonymous stats about backups, restores and the repository to the team?

Information I’d like to see:

  • repo size on disk
  • number of blobs: tree & data
  • number of different machines backing up to the repository (is that information even stored anywhere?)
  • history of how many new files were backed up and how many new blobs were created
  • history of memory consumption during backup and restore
  • max number of files in directory
  • max number of blobs in a file
  • max file size
  • histogram of file and blob sizes in a repo

Obviously, collection of these stats should be entirely optional, but it would help to make informed decisions.

Storage-based Index

Storage-based index (whether local or remote) seems like a good long term solution to me.

As I looked at the code, the Index interface is very narrow (which is good!):

type Index interface {
	Has(ID, BlobType) bool
	Lookup(ID, BlobType) ([]PackedBlob, bool)
	Count(BlobType) uint

	Each(ctx context.Context) <-chan PackedBlob
}

I didn’t find any usages of Count outside tests, so we only need point lookups and iteration over all elements. A storage-based index needs to be encrypted to provide the same guarantees as the existing indices.

As a proof of concept of scalable backup command with minimal changes to current code:

  • continue to fetch index files from the backend
  • provide Index implementation that reads index files and stores index entries in the local database instead of memory
  • use the database for lookups - let it handle the caching and what’s kept in memory or on disk.
  • for simplicity, delete the database after the restic command finishes
  • configure the database to use fixed amount of memory

Hopefully users can use this POC and provide feedback on its performance.

As another potential storage solution, I suggest considering Badger. It’s written in Go, embeddable, allows configuring memory usage, and supports encryption at rest.
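
To make that concrete, below is a rough sketch of a disk-backed index covering only the Has/Lookup part of the interface quoted above. It assumes Badger v3’s basic API; the ID/BlobType types and the encoded value format are stand-ins, not restic code.

package badgerindex

import (
	badger "github.com/dgraph-io/badger/v3"
)

// Stand-in types; restic's real ID and BlobType live in internal/restic.
type ID [32]byte
type BlobType uint8

// DiskIndex keeps the blob index in Badger instead of an in-memory map, so
// memory usage is bounded by Badger's caches rather than by repository size.
// Note: to match restic's guarantees, the stored values (and ideally the keys)
// would still have to be encrypted; Badger also offers encryption at rest.
type DiskIndex struct {
	db *badger.DB
}

func Open(dir string) (*DiskIndex, error) {
	db, err := badger.Open(badger.DefaultOptions(dir))
	if err != nil {
		return nil, err
	}
	return &DiskIndex{db: db}, nil
}

func (ix *DiskIndex) Close() error { return ix.db.Close() }

func indexKey(id ID, t BlobType) []byte {
	return append([]byte{byte(t)}, id[:]...)
}

// Store records a blob; in a real implementation the value would be an encoded
// pack ID / offset / length triple.
func (ix *DiskIndex) Store(id ID, t BlobType, value []byte) error {
	return ix.db.Update(func(txn *badger.Txn) error {
		return txn.Set(indexKey(id, t), value)
	})
}

func (ix *DiskIndex) Has(id ID, t BlobType) bool {
	err := ix.db.View(func(txn *badger.Txn) error {
		_, err := txn.Get(indexKey(id, t))
		return err
	})
	return err == nil
}

func (ix *DiskIndex) Lookup(id ID, t BlobType) ([]byte, bool) {
	var value []byte
	err := ix.db.View(func(txn *badger.Txn) error {
		item, err := txn.Get(indexKey(id, t))
		if err != nil {
			return err
		}
		value, err = item.ValueCopy(nil)
		return err
	})
	if err != nil {
		return nil, false
	}
	return value, true
}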

Long term I think it’d be great if we could expose Index as a gRPC or REST API and allow plugging in different implementations (a minimal HTTP sketch follows below). Then the complexity and maintenance costs could live outside the Restic core.
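
Purely to illustrate the shape of such an API, here is a minimal HTTP sketch. The path layout, response codes and in-memory backing store are invented for this example and are not an existing restic or rest-server API.

package main

import (
	"net/http"
	"strings"
	"sync"
)

// memIndex is a trivial in-memory stand-in; a real service would sit in front
// of something like the disk-backed index sketched above.
type memIndex struct {
	mu   sync.RWMutex
	seen map[string]struct{} // key: "<blob type>/<hex blob ID>"
}

func (ix *memIndex) Has(key string) bool {
	ix.mu.RLock()
	defer ix.mu.RUnlock()
	_, ok := ix.seen[key]
	return ok
}

func main() {
	ix := &memIndex{seen: map[string]struct{}{}}

	// GET /index/data/<hex id> answers 200 if the blob is known and 404
	// otherwise; an existence check is all a backup client needs to decide
	// whether a blob must be uploaded.
	http.HandleFunc("/index/", func(w http.ResponseWriter, r *http.Request) {
		key := strings.TrimPrefix(r.URL.Path, "/index/")
		if ix.Has(key) {
			w.WriteHeader(http.StatusOK)
			return
		}
		http.NotFound(w, r)
	})

	// In a real deployment this service would run on the machine with plenty
	// of RAM, while the backup clients stay small.
	_ = http.ListenAndServe("127.0.0.1:8080", nil)
}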

TL;DR: scaling requires trading off simplicity, a scaling strategy is needed, and we should build a storage-based index prototype.

I’m really excited about this project and happy to discuss this further.

You can do that; please let me know if you find anything interesting. I already know that there’s plenty of memory to be garbage collected (I’ve done some tests myself), but I don’t understand why the runtime still seems to use more. So, if you’d like to dig in, please do so! I even have a sample repo you can use, which consists of a few hundred snapshots of Linux kernel source code (so plenty of small files); the repo size is around 50GiB. Please let me know if you’d like to get it.

I had the same issue. I initially ran a very large backup locally on the backup machine itself so that it would complete faster. I then tried to run the same backup (which would be incremental) from an ODROID with about 2GB of RAM and ran out of memory in about 30 seconds. As a workaround, I just created a swapfile. It slows things down a bit, but it got me past the memory issue.

# run as root
fallocate -l 8G /swapfile   # reserve an 8 GiB file for swap
chmod 600 /swapfile         # swap files must only be readable by root
mkswap /swapfile            # format the file as swap space
swapon /swapfile            # enable it now (not persistent across reboots)