restic: Bad throughput with high latency connections due to low TCP send buffer size

This is quite possibly related to #1383.

Output of restic version

restic 0.9.2 compiled with go1.10.3 on windows/amd64

How did you run restic exactly?

restic.exe -r b2:bucket:/repo --verbose -o b2.connections=10 backup "D:\Eigene Dateien"

What backend/server/service did you use to store the repository?

Backblaze b2, but this should affect all backends with high latency using the http_transport.

Expected behavior

restic should scan my folder and upload data to b2, trying its best to saturate my internet upload (~10MBit/s) since that should definitely be the bottleneck.

Actual behavior

restic scans my folder and uploads data to b2 at about 1.5MBit/s (~750KBit/s per actively used connection).

Steps to reproduce the behavior

Here are environment details that may be relevant:

  • Windows 7 Professional, 64 bit
  • Zyxel NWD6605 USB wireless networking adapter
  • Location in Germany, Deutsche Telekom ISP

Do you have any idea what may have caused this?

First, I tried to see if the issue was with b2 or my internet connection instead of restic. But the python b2 command line client could upload a file and saturate my upload without an issue. Next I fired up Wireshark to see on which end the data flow was restricted. It turned out that

  • the bandwidth of each connection was quite steady
  • there was no significant number of dropped packets or retransmissions, so throughput was probably not limited by congestion control
  • the advertised window in the ACK packets coming back from b2 was sufficiently generous (around 300KB)
  • however, when sending there were only ever around 12kB of data in flight. Given the high latency to the b2 servers, restic spent most of its time waiting for data to be ACKed by b2, then immediately sent out a burst of new packets and went back to waiting for the outstanding bytes to be ACKed.

I jumped to the conclusion that restic was probably using a small TCP send buffer, since that would limit the amount of outstanding bytes the TCP stack could keep track of.
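As a rough sanity check, the observed numbers are consistent with a simple bandwidth-delay estimate (the ~130ms round-trip time below is an assumption I picked to match the figures, not something I measured precisely):

```go
// Rough bandwidth-delay product check (a sketch; the RTT is an assumption).
package main

import "fmt"

func main() {
	const (
		inFlightBytes = 12 * 1024        // ~12 kB observed in flight per connection
		rtt           = 0.130            // assumed round-trip time to B2, in seconds
		uplinkBitsPS  = 10 * 1000 * 1000 // ~10 Mbit/s uplink
	)

	// With a fixed amount of unacknowledged data, each connection can push
	// at most inFlight/RTT bytes per second.
	perConnBitsPS := float64(inFlightBytes) * 8 / rtt
	fmt.Printf("per-connection throughput: ~%.0f kbit/s\n", perConnBitsPS/1000)

	// To saturate the uplink instead, the in-flight window would have to be
	// roughly bandwidth * RTT bytes.
	neededBytes := float64(uplinkBitsPS) / 8 * rtt
	fmt.Printf("window needed to fill %d Mbit/s: ~%.0f kB\n", uplinkBitsPS/1000/1000, neededBytes/1024)
}
```

This comes out at roughly 750 kbit/s per connection with a 12 kB window, and a required window of around 160 kB to fill the uplink, which matches what Wireshark showed.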

Do you have an idea how to solve the issue?

To test this suspicion, I tried increasing the TCP send buffer size to 500KB (see https://github.com/Medo42/restic/commit/3c8bab2b42e638add86e139ec07d6cad12fc6bca), and it indeed made the situation a lot better, with the two active connections now taking most of my bandwidth. There is still quite a bit of fluctuation, but that could be due to other reasons.
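The change boils down to something like the following sketch (this is the general idea, not a copy of the commit above; the wrapper around the transport's dial function and the 5 MiB value are just for illustration):

```go
// Sketch: enlarge the TCP send buffer on every connection the HTTP
// transport opens, so a high-latency link can be kept busy.
package main

import (
	"context"
	"net"
	"net/http"
	"time"
)

func newTransport() *http.Transport {
	dialer := &net.Dialer{Timeout: 30 * time.Second, KeepAlive: 30 * time.Second}

	return &http.Transport{
		DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
			conn, err := dialer.DialContext(ctx, network, addr)
			if err != nil {
				return nil, err
			}
			if tcp, ok := conn.(*net.TCPConn); ok {
				// Allow several MiB of unacknowledged data in flight.
				// The OS may round or cap this value.
				if err := tcp.SetWriteBuffer(5 << 20); err != nil {
					conn.Close()
					return nil, err
				}
			}
			return conn, nil
		},
	}
}

func main() {
	client := &http.Client{Transport: newTransport()}
	_ = client // use this client for the backend's HTTP requests
}
```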

Did restic help you or make you happy in any way?

Thank you so much for this project. I never saw a free/libre backup solution before that clicked with me. After using Crashplan for years in PC <-> PC mode I finally had to look for a new solution, and I was about ready to give my money to Backblaze when I saw restic mentioned on their blog. After watching the video of your talk at the C4 I was sure that this was the right tool for me. And I finally understand now how you can deduplicate data that’s shifting around in files šŸ˜„

Note that I’m not sending this as a pull request but am just including it as a proof of concept, because I’m not sure this is actually a good general solution, or that this is the right place to put the code. I also went at this with zero knowledge of Go, so adding this workaround was an interesting puzzle plus some trial and error, and I’m not sure I got everything right.

If restic without my workaround never sets an explicit send buffer size, I don’t really know how the system selects one, so this might only affect my version of Windows, or might depend on settings hidden behind arcane tools and registry keys.

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 2
  • Comments: 20 (2 by maintainers)

Most upvoted comments

Very slow performance from a FreeBSD host in Australia to b2 US.

[111:00:44] 26.16%  6828 files 53.315 GiB, total 26102 files 203.827 GiB, 0 errors ETA 313:23:49

I’ve been following this since June, hoping to see some movement… If anyone has patches they want tested, send them my way.

Hey, sorry for not coming back to you earlier. I’m just going through all the unlabeled issues in the restic repo now that I have a bit more spare time.

B2 is a bit peculiar, especially for non-US users: the latency from Europe to B2’s servers is very high, at least 800ms from "HTTP headers sent" to "HTTP response header received", so the per-file overhead is very high.

Most files uploaded to B2 are rather small, especially compared to high-bandwidth uplinks: by default, most pack files restic creates and uploads to the backend are ~4MiB in size. This will also limit throughput, as it’s much more efficient to upload a small number of large files than a large number of small files. Our plan is to adapt the file size dynamically to how much upstream bandwidth (and memory) is available at backup time, but we’re not there yet.
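As a rough back-of-the-envelope illustration (the exact figures are assumptions based on the numbers above, not measurements):

```go
// Cost of ~800 ms per-request latency for ~4 MiB pack files on a
// ~10 Mbit/s uplink (rough arithmetic, assumed values).
package main

import "fmt"

func main() {
	const (
		packBytes   = 4 * 1024 * 1024  // default pack size, ~4 MiB
		uplinkBits  = 10 * 1000 * 1000 // ~10 Mbit/s uplink
		overheadSec = 0.8              // assumed per-upload request/response latency
	)

	transferSec := float64(packBytes) * 8 / uplinkBits
	effective := float64(packBytes) * 8 / (transferSec + overheadSec)

	fmt.Printf("transfer time per pack: %.1f s\n", transferSec)
	fmt.Printf("effective throughput with overhead: %.1f Mbit/s\n", effective/1e6)
}
```

That works out to roughly 3.4s of transfer plus 0.8s of overhead per pack, i.e. around 8 Mbit/s for a single connection at best; larger packs and concurrent connections reduce that overhead, but it doesn’t by itself explain the 1.5 Mbit/s reported here.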

I’m wondering though, why didn’t the OS set a good TCP send buffer itself? Isn’t that automatically set? Sure, we can increase the send buffer size for the B2 backend, ideally so that most files will just fit into the buffer. Maybe 6 MiB is a good size?

Btw, in https://github.com/Medo42/restic/commit/3c8bab2b42e638add86e139ec07d6cad12fc6bca you’re setting 5MiB as the TCP send buffer size, not 500KiB. 😃

Before we can merge such a change, I’d love to see some benchmarks and comparisons, so it’d be great to build a small sample program that measures the effects which users can run on different operating systems.
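Something along these lines could serve as a starting point for such a sample program (just a sketch; the target URL and flag defaults are placeholders):

```go
// Sketch of a sample program to compare upload throughput with different
// TCP send buffer sizes. Point -url at any HTTP endpoint that accepts
// uploads (e.g. a test bucket); the default is a placeholder.
package main

import (
	"bytes"
	"context"
	"crypto/rand"
	"flag"
	"fmt"
	"log"
	"net"
	"net/http"
	"time"
)

func transportWithSendBuffer(size int) *http.Transport {
	dialer := &net.Dialer{Timeout: 30 * time.Second}
	return &http.Transport{
		DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
			conn, err := dialer.DialContext(ctx, network, addr)
			if err != nil {
				return nil, err
			}
			if size > 0 {
				if tcp, ok := conn.(*net.TCPConn); ok {
					if err := tcp.SetWriteBuffer(size); err != nil {
						conn.Close()
						return nil, err
					}
				}
			}
			return conn, nil
		},
	}
}

func main() {
	url := flag.String("url", "https://example.com/upload", "upload target (placeholder)")
	size := flag.Int("mb", 16, "payload size in MiB")
	sndbuf := flag.Int("sndbuf", 0, "TCP send buffer in bytes, 0 = OS default")
	flag.Parse()

	// Random payload so compression along the path can't skew the result.
	payload := make([]byte, *size*1024*1024)
	if _, err := rand.Read(payload); err != nil {
		log.Fatal(err)
	}

	client := &http.Client{Transport: transportWithSendBuffer(*sndbuf)}

	start := time.Now()
	resp, err := client.Post(*url, "application/octet-stream", bytes.NewReader(payload))
	if err != nil {
		log.Fatal(err)
	}
	resp.Body.Close()
	elapsed := time.Since(start)

	mbits := float64(len(payload)) * 8 / 1e6 / elapsed.Seconds()
	fmt.Printf("status %s, %d MiB in %s (%.1f Mbit/s)\n", resp.Status, *size, elapsed, mbits)
}
```

Running it once with -sndbuf 0 and once with a few MiB on the same machine and target should show whether the send buffer is the limiting factor on a given OS.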