restic: runtime: out of memory: cannot allocate 536870912-byte block (556793856 in use)

Likely supersedes #526

Output of restic version

# restic.debug version
debug enabled
restic 0.6.0-rc.1 (v0.6.0-rc.1-1-g37b107b)
compiled with go1.8.3 on linux/arm

How did you start restic exactly? (Include the complete command line)

# /usr/local/sbin/restic.debug -r rest:http://qnap.ch.polyware.nl:8000 backup -x --tag CH / /boot --exclude "/var/log/*,/var/tmp/*,/tmp/*" --mem-profile=/tmp

What backend/server/service did you use?

rest

Expected behavior

Making a backup.

Actual behavior

Failed backup: out of memory.

# /usr/local/sbin/restic.debug -r rest:http://qnap.ch.polyware.nl:8000 backup -x --tag CH / /boot --exclude "/var/log/*,/var/tmp/*,/tmp/*" --mem-profile=/tmp
debug enabled
enter password for repository: 
runtime: out of memory: cannot allocate 536870912-byte block (556793856 in use)
fatal error: out of memory

runtime stack:
runtime.throw(0x4c85ae, 0xd)
	/usr/local/go/src/runtime/panic.go:596 +0x70
runtime.largeAlloc(0x1ffffe00, 0x6ae01, 0x711f50)
	/usr/local/go/src/runtime/malloc.go:809 +0xe0
runtime.mallocgc.func1()
	/usr/local/go/src/runtime/malloc.go:702 +0x30
runtime.systemstack(0x10a1ca80)
	/usr/local/go/src/runtime/asm_arm.s:264 +0x80
runtime.mstart()
	/usr/local/go/src/runtime/proc.go:1132

goroutine 117 [running]:
runtime.systemstack_switch()
	/usr/local/go/src/runtime/asm_arm.s:209 +0x4 fp=0x10ba7a7c sp=0x10ba7a78
runtime.mallocgc(0x1ffffe00, 0x44ea60, 0x10b66e01, 0x10adab18)
	/usr/local/go/src/runtime/malloc.go:703 +0x9e0 fp=0x10ba7ad8 sp=0x10ba7a7c
runtime.makeslice(0x44ea60, 0x1ffffe00, 0x1ffffe00, 0x960, 0x0, 0x0)
	/usr/local/go/src/runtime/slice.go:54 +0x68 fp=0x10ba7afc sp=0x10ba7ad8
bytes.makeSlice(0x1ffffe00, 0x0, 0x0, 0x0)
	/usr/local/go/src/bytes/buffer.go:201 +0x58 fp=0x10ba7b18 sp=0x10ba7afc
bytes.(*Buffer).ReadFrom(0x10ba7b74, 0x6e16f0, 0x10aa3fc0, 0x200, 0x0, 0x10a80400, 0x0)
	/usr/local/go/src/bytes/buffer.go:173 +0x270 fp=0x10ba7b50 sp=0x10ba7b18
io/ioutil.readAll(0x6e16f0, 0x10aa3fc0, 0x200, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
	/usr/local/go/src/io/ioutil/ioutil.go:33 +0xd0 fp=0x10ba7bc8 sp=0x10ba7b50
io/ioutil.ReadAll(0x6e16f0, 0x10aa3fc0, 0x10aa3fc0, 0x6e16f0, 0x10aa3fc0, 0x0, 0x0)
	/usr/local/go/src/io/ioutil/ioutil.go:42 +0x34 fp=0x10ba7bf0 sp=0x10ba7bc8
restic/backend.LoadAll(0x6e5fe0, 0x10ae0150, 0x4c1338, 0x5, 0x10b2dcc0, 0x40, 0x0, 0x0, 0x0, 0x0, ...)
	/tmp/restic-build-847077805/src/restic/backend/utils.go:28 +0x104 fp=0x10ba7c2c sp=0x10ba7bf0
restic/repository.(*Repository).LoadAndDecrypt(0x10adc180, 0x4c1338, 0x5, 0x1daf0f27, 0x39e92cc6, 0x9c59383f, 0x97399bea, 0x21ed0b22, 0xb2d24563, 0xa65b3a67, ...)
	/tmp/restic-build-847077805/src/restic/repository/repository.go:57 +0x120 fp=0x10ba7d50 sp=0x10ba7c2c
restic/repository.LoadIndexWithDecoder(0x6e6e20, 0x10adc180, 0x1daf0f27, 0x39e92cc6, 0x9c59383f, 0x97399bea, 0x21ed0b22, 0xb2d24563, 0xa65b3a67, 0x8d242524, ...)
	/tmp/restic-build-847077805/src/restic/repository/index.go:525 +0xc8 fp=0x10ba7de0 sp=0x10ba7d50
restic/repository.LoadIndex(0x6e6e20, 0x10adc180, 0x1daf0f27, 0x39e92cc6, 0x9c59383f, 0x97399bea, 0x21ed0b22, 0xb2d24563, 0xa65b3a67, 0x8d242524, ...)
	/tmp/restic-build-847077805/src/restic/repository/repository.go:352 +0x48 fp=0x10ba7e34 sp=0x10ba7de0
restic/repository.(*Repository).LoadIndex.func1(0x1daf0f27, 0x39e92cc6, 0x9c59383f, 0x97399bea, 0x21ed0b22, 0xb2d24563, 0xa65b3a67, 0x8d242524, 0x10b62080, 0x8d242524, ...)
	/tmp/restic-build-847077805/src/restic/repository/repository.go:320 +0x4c fp=0x10ba7ec0 sp=0x10ba7e34
restic/repository.ParallelWorkFuncParseID.func1(0x10b622c0, 0x40, 0x10b62080, 0x10b4ff00, 0x1)
	/tmp/restic-build-847077805/src/restic/repository/parallel.go:89 +0x148 fp=0x10ba7f54 sp=0x10ba7ec0
restic/repository.FilesInParallel.func1(0x10b689c0, 0x10b620c0, 0x10b34820, 0x10b62080, 0x10b62380)
	/tmp/restic-build-847077805/src/restic/repository/parallel.go:53 +0xd4 fp=0x10ba7fd4 sp=0x10ba7f54
runtime.goexit()
	/usr/local/go/src/runtime/asm_arm.s:1017 +0x4 fp=0x10ba7fd4 sp=0x10ba7fd4
created by restic/repository.FilesInParallel
	/tmp/restic-build-847077805/src/restic/repository/parallel.go:63 +0x130

goroutine 1 [chan receive]:
restic/repository.(*Repository).LoadIndex(0x10adc180, 0x4e9798, 0x10b2c3c0)
	/tmp/restic-build-847077805/src/restic/repository/repository.go:339 +0x124
main.runBackup(0x0, 0x0, 0x0, 0x10ad8a80, 0x3, 0x3, 0x0, 0x0, 0x0, 0x1, ...)
	/tmp/restic-build-847077805/src/cmds/restic/cmd_backup.go:375 +0x330
main.glob..func2(0x70d1b8, 0x10b397c0, 0x2, 0xa, 0x0, 0x0)
	/tmp/restic-build-847077805/src/cmds/restic/cmd_backup.go:38 +0xcc
github.com/spf13/cobra.(*Command).execute(0x70d1b8, 0x10b396d0, 0xa, 0xa, 0x70d1b8, 0x10b396d0)
	/tmp/restic-build-847077805/src/github.com/spf13/cobra/command.go:631 +0x308
github.com/spf13/cobra.(*Command).ExecuteC(0x70cf88, 0x10ade2b8, 0x44e820, 0x10b34180)
	/tmp/restic-build-847077805/src/github.com/spf13/cobra/command.go:710 +0x220
github.com/spf13/cobra.(*Command).Execute(0x70cf88, 0x9, 0x10b1bf74)
	/tmp/restic-build-847077805/src/github.com/spf13/cobra/command.go:669 +0x1c
main.main()
	/tmp/restic-build-847077805/src/cmds/restic/main.go:60 +0x88

goroutine 5 [syscall]:
os/signal.signal_recv(0x0)
	/usr/local/go/src/runtime/sigqueue.go:116 +0x154
os/signal.loop()
	/usr/local/go/src/os/signal/signal_unix.go:22 +0x14
created by os/signal.init.1
	/usr/local/go/src/os/signal/signal_unix.go:28 +0x30

goroutine 6 [chan receive]:
restic.init.1.func1.1()
	/tmp/restic-build-847077805/src/restic/lock.go:265 +0x9c
created by restic.init.1.func1
	/tmp/restic-build-847077805/src/restic/lock.go:268 +0x24

goroutine 7 [select, locked to thread]:
runtime.gopark(0x4e9cdc, 0x0, 0x4c27d7, 0x6, 0x18, 0x2)
	/usr/local/go/src/runtime/proc.go:271 +0xfc
runtime.selectgoImpl(0x10a26fa0, 0x0, 0xc)
	/usr/local/go/src/runtime/select.go:423 +0x116c
runtime.selectgo(0x10a26fa0)
	/usr/local/go/src/runtime/select.go:238 +0x10
runtime.ensureSigM.func1()
	/usr/local/go/src/runtime/signal_unix.go:434 +0x2bc
runtime.goexit()
	/usr/local/go/src/runtime/asm_arm.s:1017 +0x4

goroutine 8 [chan receive]:
restic.init.2.func1(0x10a12440)
	/tmp/restic-build-847077805/src/restic/progress_unix.go:17 +0x38
created by restic.init.2
	/tmp/restic-build-847077805/src/restic/progress_unix.go:21 +0x90

goroutine 17 [chan receive]:
main.CleanupHandler(0x10ad6040)
	/tmp/restic-build-847077805/src/cmds/restic/cleanup.go:62 +0x38
created by main.init.1
	/tmp/restic-build-847077805/src/cmds/restic/cleanup.go:25 +0x90

goroutine 18 [chan receive]:
github.com/pkg/profile.Start.func7(0x10ad8ae0)
	/tmp/restic-build-847077805/src/github.com/pkg/profile/profile.go:206 +0x90
created by github.com/pkg/profile.Start
	/tmp/restic-build-847077805/src/github.com/pkg/profile/profile.go:212 +0x18c

goroutine 33 [select]:
net/http.(*persistConn).readLoop(0x10a7e280)
	/usr/local/go/src/net/http/transport.go:1599 +0x7b4
created by net/http.(*Transport).dialConn
	/usr/local/go/src/net/http/transport.go:1117 +0x814

goroutine 34 [select]:
net/http.(*persistConn).writeLoop(0x10a7e280)
	/usr/local/go/src/net/http/transport.go:1704 +0x394
created by net/http.(*Transport).dialConn
	/usr/local/go/src/net/http/transport.go:1118 +0x830

goroutine 105 [IO wait]:
net.runtime_pollWait(0x76ee4088, 0x72, 0x10b57000)
	/usr/local/go/src/runtime/netpoll.go:164 +0x44
net.(*pollDesc).wait(0x10b984fc, 0x72, 0x6e20e0, 0x6e0138)
	/usr/local/go/src/net/fd_poll_runtime.go:75 +0x28
net.(*pollDesc).waitRead(0x10b984fc, 0x10b57000, 0x1000)
	/usr/local/go/src/net/fd_poll_runtime.go:80 +0x24
net.(*netFD).Read(0x10b984c0, 0x10b57000, 0x1000, 0x1000, 0x0, 0x6e20e0, 0x6e0138)
	/usr/local/go/src/net/fd_unix.go:250 +0x148
net.(*conn).Read(0x10b648d0, 0x10b57000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
	/usr/local/go/src/net/net.go:181 +0x58
net/http.(*persistConn).Read(0x10b5c460, 0x10b57000, 0x1000, 0x1000, 0x4, 0x67d2c, 0x10b63140)
	/usr/local/go/src/net/http/transport.go:1316 +0x174
bufio.(*Reader).fill(0x10ae0cf0)
	/usr/local/go/src/bufio/bufio.go:97 +0xf4
bufio.(*Reader).Peek(0x10ae0cf0, 0x1, 0x10bcada8, 0x1, 0x0, 0x10b98680, 0x0)
	/usr/local/go/src/bufio/bufio.go:129 +0x58
net/http.(*persistConn).readLoop(0x10b5c460)
	/usr/local/go/src/net/http/transport.go:1474 +0x160
created by net/http.(*Transport).dialConn
	/usr/local/go/src/net/http/transport.go:1117 +0x814

goroutine 106 [select]:
net/http.(*persistConn).writeLoop(0x10b5c460)
	/usr/local/go/src/net/http/transport.go:1704 +0x394
created by net/http.(*Transport).dialConn
	/usr/local/go/src/net/http/transport.go:1118 +0x830

goroutine 44 [select]:
main.refreshLocks(0x711a70, 0x10b0bf00)
	/tmp/restic-build-847077805/src/cmds/restic/lock.go:70 +0x218
created by main.lockRepository
	/tmp/restic-build-847077805/src/cmds/restic/lock.go:47 +0x294

goroutine 45 [semacquire]:
sync.runtime_Semacquire(0x10b689cc)
	/usr/local/go/src/runtime/sema.go:47 +0x24
sync.(*WaitGroup).Wait(0x10b689c0)
	/usr/local/go/src/sync/waitgroup.go:131 +0x94
restic/repository.FilesInParallel(0x76ee6d88, 0x10ae0150, 0x4c1338, 0x5, 0x14, 0x10b34820, 0x0, 0x0)
	/tmp/restic-build-847077805/src/restic/repository/parallel.go:66 +0x14c
restic/repository.(*Repository).LoadIndex.func2(0x10b62040, 0x10b0bfc0, 0x10adc180, 0x10b689b0)
	/tmp/restic-build-847077805/src/restic/repository/repository.go:336 +0x9c
created by restic/repository.(*Repository).LoadIndex
	/tmp/restic-build-847077805/src/restic/repository/repository.go:337 +0x100

goroutine 138 [IO wait]:
net.runtime_pollWait(0x76ee4100, 0x72, 0x12c06000)
	/usr/local/go/src/runtime/netpoll.go:164 +0x44
net.(*pollDesc).wait(0x10a13cfc, 0x72, 0x6e20e0, 0x6e0138)
	/usr/local/go/src/net/fd_poll_runtime.go:75 +0x28
net.(*pollDesc).waitRead(0x10a13cfc, 0x12c06000, 0x1000)
	/usr/local/go/src/net/fd_poll_runtime.go:80 +0x24
net.(*netFD).Read(0x10a13cc0, 0x12c06000, 0x1000, 0x1000, 0x0, 0x6e20e0, 0x6e0138)
	/usr/local/go/src/net/fd_unix.go:250 +0x148
net.(*conn).Read(0x10b34ae8, 0x12c06000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
	/usr/local/go/src/net/net.go:181 +0x58
net/http.(*persistConn).Read(0x10b3e6e0, 0x12c06000, 0x1000, 0x1000, 0x4, 0x67d2c, 0x10b632c0)
	/usr/local/go/src/net/http/transport.go:1316 +0x174
bufio.(*Reader).fill(0x10a16e10)
	/usr/local/go/src/bufio/bufio.go:97 +0xf4
bufio.(*Reader).Peek(0x10a16e10, 0x1, 0x12c57da8, 0x1, 0x0, 0x10b63200, 0x0)
	/usr/local/go/src/bufio/bufio.go:129 +0x58
net/http.(*persistConn).readLoop(0x10b3e6e0)
	/usr/local/go/src/net/http/transport.go:1474 +0x160
created by net/http.(*Transport).dialConn
	/usr/local/go/src/net/http/transport.go:1117 +0x814

goroutine 139 [select]:
net/http.(*persistConn).writeLoop(0x10b3e6e0)
	/usr/local/go/src/net/http/transport.go:1704 +0x394
created by net/http.(*Transport).dialConn
	/usr/local/go/src/net/http/transport.go:1118 +0x830

And the corresponding dmesg output:

May 27 01:32:44 rp1 kernel: [1779339.099132] apache2 invoked oom-killer: gfp_mask=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=0, order=0, oom_score_adj=0
May 27 01:32:47 rp1 kernel: [1779339.099149] apache2 cpuset=/ mems_allowed=0
May 27 01:32:47 rp1 kernel: [1779339.099183] CPU: 2 PID: 776 Comm: apache2 Not tainted 4.9.24-v7+ #993
May 27 01:32:47 rp1 kernel: [1779339.099189] Hardware name: BCM2835
May 27 01:32:47 rp1 kernel: [1779339.099236] [<8010fb3c>] (unwind_backtrace) from [<8010c058>] (show_stack+0x20/0x24)
May 27 01:32:47 rp1 rsyslogd-2007: action 'action 17' suspended, next retry is Sat May 27 01:34:17 2017 [try http://www.rsyslog.com/e/2007 ]
May 27 01:32:50 rp1 kernel: [1779339.099259] [<8010c058>] (show_stack) from [<80455200>] (dump_stack+0xd4/0x118)
May 27 01:32:50 rp1 kernel: [1779339.099281] [<80455200>] (dump_stack) from [<8026c998>] (dump_header+0x9c/0x1f4)
May 27 01:32:50 rp1 kernel: [1779339.099306] [<8026c998>] (dump_header) from [<80210364>] (oom_kill_process+0x3e0/0x4e4)
May 27 01:32:50 rp1 kernel: [1779339.099327] [<80210364>] (oom_kill_process) from [<802107cc>] (out_of_memory+0x124/0x334)
May 27 01:32:50 rp1 kernel: [1779339.099348] [<802107cc>] (out_of_memory) from [<802158f0>] (__alloc_pages_nodemask+0xcf4/0xdd0)
May 27 01:32:50 rp1 kernel: [1779339.099370] [<802158f0>] (__alloc_pages_nodemask) from [<8020e694>] (filemap_fault+0x474/0x674)
May 27 01:32:50 rp1 kernel: [1779339.099390] [<8020e694>] (filemap_fault) from [<8030c74c>] (ext4_filemap_fault+0x3c/0x50)
May 27 01:32:50 rp1 kernel: [1779339.099412] [<8030c74c>] (ext4_filemap_fault) from [<8023d844>] (__do_fault+0x7c/0x100)
May 27 01:32:50 rp1 kernel: [1779339.099430] [<8023d844>] (__do_fault) from [<80241b70>] (handle_mm_fault+0x5fc/0xde4)
May 27 01:32:50 rp1 kernel: [1779339.099452] [<80241b70>] (handle_mm_fault) from [<807193f4>] (do_page_fault+0x33c/0x3b0)
May 27 01:32:50 rp1 kernel: [1779339.099471] [<807193f4>] (do_page_fault) from [<801012a8>] (do_PrefetchAbort+0x44/0xa8)
May 27 01:32:50 rp1 kernel: [1779339.099488] [<801012a8>] (do_PrefetchAbort) from [<80718ea4>] (ret_from_exception+0x0/0x1c)
May 27 01:32:50 rp1 kernel: [1779339.099497] Exception stack(0xb2a89fb0 to 0xb2a89ff8)
May 27 01:32:50 rp1 kernel: [1779339.099509] 9fa0:                                     000f4240 00000000 00000003 00011175
May 27 01:32:50 rp1 kernel: [1779339.099524] 9fc0: 7ea3aad4 7ea3aacc 7ea3aad0 00000000 00000004 76912080 76c44cb0 553e0f60
May 27 01:32:50 rp1 kernel: [1779339.099536] 9fe0: 76ec9274 7ea3aa30 54bcd908 54bc3f68 20000010 ffffffff
May 27 01:32:50 rp1 kernel: [1779339.099543] Mem-Info:
May 27 01:32:51 rp1 kernel: [1779339.099573] active_anon:103001 inactive_anon:102938 isolated_anon:32
May 27 01:32:51 rp1 kernel: [1779339.099573]  active_file:344 inactive_file:665 isolated_file:0
May 27 01:32:51 rp1 kernel: [1779339.099573]  unevictable:0 dirty:0 writeback:0 unstable:0
May 27 01:32:51 rp1 kernel: [1779339.099573]  slab_reclaimable:1659 slab_unreclaimable:4074
May 27 01:32:51 rp1 kernel: [1779339.099573]  mapped:1022 shmem:686 pagetables:772 bounce:0
May 27 01:32:51 rp1 kernel: [1779339.099573]  free:4085 free_pcp:357 free_cma:0
May 27 01:32:51 rp1 kernel: [1779339.099599] Node 0 active_anon:412004kB inactive_anon:411852kB active_file:1376kB inactive_file:2660kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:4088kB dirty:0kB writeback:0kB shmem:2744kB writeback_tmp:0kB unstable:0kB pages_scanned:37542 all_unreclaimable? yes
May 27 01:32:51 rp1 kernel: [1779339.099630] Normal free:16340kB min:16384kB low:20480kB high:24576kB active_anon:412004kB inactive_anon:411752kB active_file:1376kB inactive_file:2660kB unevictable:0kB writepending:0kB present:901120kB managed:880564kB mlocked:0kB slab_reclaimable:6636kB slab_unreclaimable:16296kB kernel_stack:1840kB pagetables:3088kB bounce:0kB free_pcp:1428kB local_pcp:316kB free_cma:0kB
May 27 01:32:51 rp1 kernel: lowmem_reserve[]: 0 0
May 27 01:32:51 rp1 kernel: [1779339.099651] Normal: 129*4kB (UME) 86*8kB (E) 50*16kB (UME) 50*32kB (ME) 26*64kB (UME) 51*128kB (UME) 8*256kB (UME) 3*512kB (ME) 1*1024kB (M) 0*2048kB 0*4096kB = 16404kB
May 27 01:32:51 rp1 kernel: 1747 total pagecache pages
May 27 01:32:51 rp1 kernel: [1779339.099758] 62 pages in swap cache
May 27 01:32:51 rp1 kernel: [1779339.099767] Swap cache stats: add 95904, delete 95842, find 35132/41154
May 27 01:32:51 rp1 kernel: [1779339.099772] Free swap  = 0kB
May 27 01:32:51 rp1 kernel: [1779339.099776] Total swap = 102396kB
May 27 01:32:51 rp1 kernel: [1779339.099782] 225280 pages RAM
May 27 01:32:51 rp1 kernel: [1779339.099787] 0 pages HighMem/MovableOnly
May 27 01:32:51 rp1 kernel: [1779339.099792] 5139 pages reserved
...

Last few messages in the debug log:

2017/05/29 03:12:50 debug/round_tripper_debug.go:87     debug.loggingRoundTripper.RoundTrip     103     ------------  HTTP RESPONSE ----------
HTTP/1.1 206 Partial Content
Content-Length: 449555642
Accept-Ranges: bytes
Content-Range: bytes 0-449555641/449555642
Content-Type: application/octet-stream
Date: Mon, 29 May 2017 01:12:50 GMT


2017/05/29 03:12:50 repository/index.go:454     repository.DecodeIndex  111     Start decoding index
2017/05/29 03:12:50 repository/index.go:486     repository.DecodeIndex  111     done
2017/05/29 03:12:54 repository/index.go:454     repository.DecodeIndex  108     Start decoding index
2017/05/29 03:13:01 repository/index.go:486     repository.DecodeIndex  108     done

Steps to reproduce the behavior

Pick an RP3, run Apache and domotics software (neither of which is a small memory user), try to make a backup, and watch it OOM.

Do you have any idea what may have caused this?

Excessively sized mallocs? A 512 MB allocation is rather steep… runtime: out of memory: cannot allocate 536870912-byte block (556793856 in use). I mean… it did not even start scanning, so it was likely still reading indexes or something like that. Which then confuses me, because there should be no index of that size. <pause> I take that back:

-rw-------    1 admin    administ  428.7M May 28 15:19 270faf1dc62ce9393f38599cea9b3997220bed216345d2b2673a5ba62425248d
-rw-------    1 admin    administ    6.1M May 29 00:33 bef6d99358953f642115c290b1dd312ced2e2f7b34650524ac14156ce2a036f7
-rw-------    1 admin    administ   48.7K May 29 00:34 e91b8b4186b086b8def47025cb88efc12a0ef04ef91f885e72d0c5af801b4724

and this was after a prune earlier today.
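
For what it's worth, the allocation in the trace above comes from ioutil.ReadAll via bytes.(*Buffer).ReadFrom (see the makeSlice frame): the buffer grows by roughly doubling its backing array, so loading the 428.7M index eventually requests a single ~512 MiB block while the previous, smaller array is still live. A minimal illustration of that growth pattern versus a pre-sized read (my own sketch, not restic code; the REST backend does send a Content-Length, so the size is knowable up front):

package main

import (
	"fmt"
	"io"
	"io/ioutil"
	"strings"
)

func main() {
	src := strings.Repeat("x", 3<<20) // small stand-in for the 428.7M index

	// What restic/backend.LoadAll effectively does: grow-as-you-go via a
	// bytes.Buffer, which over-allocates as it doubles.
	grown, _ := ioutil.ReadAll(strings.NewReader(src))
	fmt.Printf("ReadAll:  len=%d cap=%d\n", len(grown), cap(grown))

	// Pre-sized alternative when the length is known: one allocation of
	// exactly the right size, no doubling step.
	fixed := make([]byte, len(src))
	_, _ = io.ReadFull(strings.NewReader(src), fixed)
	fmt.Printf("ReadFull: len=%d cap=%d\n", len(fixed), cap(fixed))
}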

So… does this mean restic is impossible to use on systems with <= 1 GB of memory, like the RP3, or the many small VMs out there? That seems like a showstopper for wider deployment.

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 1
  • Comments: 48 (40 by maintainers)

Most upvoted comments

Ah, I see. I didn’t realize that was a hard and fast rule about verifying the MAC first. I was thinking maybe it could be done as part of closing the stream or something. Sorry that I didn’t read that earlier comment more closely, and I apologize for dragging you down this rabbit hole with me!

Don’t worry about it 🙂 I get to see crypto break regularly during my day job as a pentester (sometimes I’m the one breaking it), so I’m very sensitive to these issues. On the other hand I know that most people don’t know all the details from the top of their heads (which is fine).

Is MAC pre-verification done in order to guard against a particular vulnerability (e.g. executable code during JSON decode), or is it more of a best practice simply to validate everything up front before it flows through the program?

Checking the MAC only after data has been processed enables vulnerabilities. For example, Padding Oracle vulnerabilities rely on the fact that processing is done before a MAC is checked (if there even is one). For these vulnerabilities, leaking one bit of information about the plaintext enables decrypting the data and even encrypting arbitrary other data, without knowing the encryption key. It’s really astonishing. In general, verifying the authenticity of the ciphertext before even attempting to decrypt the data is an established best practice.

For a while there was a discussion about whether encrypt-then-MAC (the MAC is on the outside, so it can be checked before decryption) or MAC-then-encrypt (the MAC is encrypted and needs to be decrypted before it can be checked) is better. From my point of view this has been settled, and the former is clearly better. Here’s a question on Crypto Stack Exchange: https://crypto.stackexchange.com/questions/202/should-we-mac-then-encrypt-or-encrypt-then-mac
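
For readers following along, here is a minimal encrypt-then-MAC sketch in Go. It is purely illustrative (restic's actual scheme is AES-256 in counter mode with a Poly1305-AES MAC, not the AES-CTR/HMAC-SHA-256 combination used here); the point is the ordering: the MAC covers the ciphertext, so the receiver verifies it before decrypting a single byte.

package etm

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"errors"
)

// seal returns iv || ciphertext || tag, with the tag computed over iv || ciphertext.
func seal(encKey, macKey, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(encKey)
	if err != nil {
		return nil, err
	}
	out := make([]byte, aes.BlockSize+len(plaintext))
	iv := out[:aes.BlockSize]
	if _, err := rand.Read(iv); err != nil {
		return nil, err
	}
	cipher.NewCTR(block, iv).XORKeyStream(out[aes.BlockSize:], plaintext)

	mac := hmac.New(sha256.New, macKey)
	mac.Write(out)
	return mac.Sum(out), nil
}

// open verifies the tag over the ciphertext first and only then decrypts.
func open(encKey, macKey, data []byte) ([]byte, error) {
	if len(data) < aes.BlockSize+sha256.Size {
		return nil, errors.New("message too short")
	}
	msg, tag := data[:len(data)-sha256.Size], data[len(data)-sha256.Size:]

	mac := hmac.New(sha256.New, macKey)
	mac.Write(msg)
	if !hmac.Equal(mac.Sum(nil), tag) {
		return nil, errors.New("MAC verification failed") // stop before any decryption
	}

	block, err := aes.NewCipher(encKey)
	if err != nil {
		return nil, err
	}
	plaintext := make([]byte, len(msg)-aes.BlockSize)
	cipher.NewCTR(block, msg[:aes.BlockSize]).XORKeyStream(plaintext, msg[aes.BlockSize:])
	return plaintext, nil
}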

I’m also experiencing this issue using the b2 backend. Repository is right around 1.5TB total, index directory is 6.8GB (2274 files, largest one <30MB). restic rapidly takes up ~23.5GB memory, then it seems to stabilize.

Personally, I think the drive to keep the internal data in JSON format is a mistake. The data is encrypted, so it is not directly readable anyway, and restic cat xxx can marshal it to JSON and show that instead. Doing that would allow restic to use the more performant gob encoder, or whatever else we come up with.

Just to be clear, as I hinted in https://github.com/restic/restic/issues/979#issuecomment-343651588 above, we must verify the MAC before attempting to process the contents in any way. In particular, we must not decrypt or even JSON-decode data for which the MAC has not been verified yet.

I don’t see how we can do that while having a streaming architecture (and I’ve tried, look for DecryptReader in the git history). We can (and should) build a streaming architecture, but after decryption. That’s why I was so interested in the memory usage of the JSON decoder independent of the source of the data (e.g. the []byte).
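
To make that last point concrete, a small benchmark skeleton along these lines could measure the decoder's allocations independent of where the data came from (indexJSON below is just a stand-in for a decrypted index blob; run with go test -bench . -benchmem and compare the two):

// index_bench_test.go
package sketch

import (
	"bytes"
	"encoding/json"
	"testing"
)

var indexJSON = []byte(`{"packs":[{"id":"abc","blobs":[]}]}`) // stand-in for a real decrypted index

func BenchmarkUnmarshal(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		var idx map[string]json.RawMessage
		if err := json.Unmarshal(indexJSON, &idx); err != nil {
			b.Fatal(err)
		}
	}
}

func BenchmarkDecoder(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		var idx map[string]json.RawMessage
		if err := json.NewDecoder(bytes.NewReader(indexJSON)).Decode(&idx); err != nil {
			b.Fatal(err)
		}
	}
}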

I’ve had a quick look at the code in your branch and it looks like the code in there does exactly what we must not do: start decrypting data before verifying the MAC. That’s terribly insecure and violates established best practice in crypto. I won’t merge any code which does that.

My ideas for improving the situation are:

  • Change the code so that the objects are smaller: For indexes, this means having more than one index file (e.g. as produced by rebuild-index), which are then processed sequentially. Each index file can then be limited to < 100MiB.
  • Make the JSON decoder as efficient as possible so that it (ideally) doesn’t allocate/use any additional memory.

I really appreciate your effort!

Yeah, that code still generates a big []byte and then streams from it, so no surprises there.

Using the indexJsonStreamer is not going to help unless the entire workflow from backend -> json can be streamed. To that end, I experimented a bit with a LoadAndDecryptStream() method on repository. You can grab it here.

I had trouble with two things:

  1. The MAC bytes are stored at the end of the file, and these bytes should not be fed to the decryption func. So we need a stream that “reserves” the last n bytes of the stream; I wrote a “ReservedReadCloser” (a sketch of the idea follows after this list). I’m not entirely confident it covers every edge case, but I wrote a quick unit test to cover the obvious ones.

  2. I don’t see a stream-compatible way to do poly1305.Verify(). Maybe I didn’t look hard enough? For the moment, my code just punts on this with a TODO.
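
For reference, here is a rough sketch of the reserve-the-tail idea (my own illustration, not the code from the branch above; a production version would probably want a ring buffer rather than the growing slice used here, and Close/error handling is omitted):

package sketch

import "io"

// reservedReader wraps a reader so that the final `reserve` bytes (the MAC)
// are never handed to the consumer; they can be fetched via Reserved once
// the stream has been drained.
type reservedReader struct {
	src     io.Reader
	reserve int
	buf     []byte // read from src but not yet emitted
	eof     bool
}

func newReservedReader(src io.Reader, reserve int) *reservedReader {
	return &reservedReader{src: src, reserve: reserve}
}

func (r *reservedReader) Read(p []byte) (int, error) {
	// Buffer until we hold enough to satisfy p while still keeping the
	// reserved tail back, or until the source is exhausted.
	tmp := make([]byte, 32*1024)
	for !r.eof && len(r.buf)-r.reserve < len(p) {
		n, err := r.src.Read(tmp)
		r.buf = append(r.buf, tmp[:n]...)
		if err == io.EOF {
			r.eof = true
		} else if err != nil {
			return 0, err
		}
	}
	avail := len(r.buf) - r.reserve
	if avail <= 0 {
		if r.eof {
			return 0, io.EOF
		}
		return 0, nil
	}
	n := copy(p, r.buf[:avail])
	r.buf = r.buf[n:]
	return n, nil
}

// Reserved returns the withheld trailing bytes; it is only complete after
// Read has returned io.EOF.
func (r *reservedReader) Reserved() []byte { return r.buf }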

It’s not exactly apples-to-apples, since poly1305 is not implemented, but here is a quick test with a 256MB index. Note that this does not involve JSON decoding at all; it just writes the data to stdout.

$ du -hs /Volumes/backup1/restic-repo/index/ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58
256M	/Volumes/backup1/restic-repo/index/ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58


$ for i in {1..10}; do xtime ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58  > /dev/null ; done
1.77u 0.31s 8.74r 1088172kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58
1.75u 0.31s 2.26r 1089420kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58
1.74u 0.32s 2.24r 1089384kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58
1.74u 0.31s 2.23r 1089284kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58
1.76u 0.31s 2.26r 1089312kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58
1.81u 0.37s 2.37r 1089368kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58
1.77u 0.33s 2.27r 1090124kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58
1.75u 0.32s 2.25r 1090104kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58
1.77u 0.32s 2.26r 1089292kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58
1.76u 0.32s 2.26r 1089316kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58

$ for i in {1..10}; do xtime ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58 --stream  > /dev/null ; done
1.41u 0.10s 1.72r 40468kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58 --stream
1.41u 0.11s 1.75r 40428kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58 --stream
1.42u 0.10s 1.73r 40468kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58 --stream
1.41u 0.10s 1.72r 40320kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58 --stream
1.42u 0.10s 1.73r 40480kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58 --stream
1.42u 0.10s 1.73r 40504kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58 --stream
1.41u 0.10s 1.72r 40296kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58 --stream
1.42u 0.10s 1.73r 40480kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58 --stream
1.41u 0.10s 1.72r 40268kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58 --stream
1.42u 0.10s 1.73r 40416kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58 --stream

@mholt :

Not sure if helpful, but json.NewDecoder(r).Decode(&thing) will decode the JSON by streaming it, rather than having to buffer the whole string in memory. I think.

@fd0 :

That’s unfortunately not applicable: It loads the whole (large) object into a buffer and then parses it. I’ve benchmarked that.

Hey, I did some experimentation and @mholt’s comment above seems to be correct. You do need to do a tiny bit of manual parsing (to navigate brackets and whatnot). But once you do that, you can call Decode(&thing) on just a small part of the JSON stream and not need to keep the entire JSON in memory all at once.

I wrote a stream-friendly version of LoadIndex and it performs way better on RAM usage than the current code which uses json.Unmarshal. Check the memory logging in TestLoadGiantIndexUnmarshal vs TestLoadGiantIndexStreaming.

Unfortunately this is not a drop-in replacement, because many of restic’s other Load* funcs deal in []byte rather than a Reader. And I know you’re looking into a more general solution (multi-level indices, or whatever). But maybe it’s something to think about for the future.
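
For anyone curious what that “tiny bit of manual parsing” looks like, here is a rough sketch using the standard library’s json.Decoder. The {"packs":[...]} layout is a simplification of restic’s real index schema, and packEntry/handle are placeholder names of mine; the point is that each array element is decoded on its own, so only one entry sits in memory at a time.

package sketch

import (
	"encoding/json"
	"io"
)

type packEntry struct {
	ID    string            `json:"id"`
	Blobs []json.RawMessage `json:"blobs"`
}

// streamIndex walks the top-level object token by token and decodes the
// elements of the "packs" array one at a time.
func streamIndex(r io.Reader, handle func(packEntry) error) error {
	dec := json.NewDecoder(r)
	if _, err := dec.Token(); err != nil { // consume the opening '{'
		return err
	}
	for dec.More() {
		keyTok, err := dec.Token()
		if err != nil {
			return err
		}
		if key, _ := keyTok.(string); key != "packs" {
			var skip json.RawMessage // skip the value of any other field
			if err := dec.Decode(&skip); err != nil {
				return err
			}
			continue
		}
		if _, err := dec.Token(); err != nil { // consume the opening '['
			return err
		}
		for dec.More() {
			var p packEntry
			if err := dec.Decode(&p); err != nil {
				return err
			}
			if err := handle(p); err != nil {
				return err
			}
		}
		if _, err := dec.Token(); err != nil { // consume the closing ']'
			return err
		}
	}
	_, err := dec.Token() // consume the closing '}'
	return err
}

In practice handle would add each entry to the in-memory index as it arrives, instead of accumulating the whole document first.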
