restic: Bad throughput with high latency connections due to low TCP send buffer size
This is quite possibly related to #1383.
Output of restic version
restic 0.9.2 compiled with go1.10.3 on windows/amd64
How did you run restic exactly?
restic.exe -r b2:…/repo --verbose -o b2.connections=10 backup "D:\Eigene Dateien"
What backend/server/service did you use to store the repository?
Backblaze b2, but this should affect all backends with high latency using the http_transport.
Expected behavior
restic should scan my folder and upload data to b2, trying its best to saturate my internet upload (~10MBit/s) since that should definitely be the bottleneck.
Actual behavior
restic scans my folder and uploads data to b2 at about 1.5MBit/s (~750KBit/s per actively used connection).
Steps to reproduce the behavior
Here are environment details that may be relevant:
- Windows 7 Professional, 64-bit
- Zyxel NWD6605 USB wireless networking adapter
- Location in Germany, Deutsche Telekom ISP
Do you have any idea what may have caused this?
First, I tried to see whether the issue was with b2 or my internet connection rather than restic. But the Python b2 command-line client could upload a file and saturate my upload without an issue. Next I fired up Wireshark to see on which end the data flow was restricted. It turned out that
- the bandwidth of each connection was quite steady
- there was no significant amount of dropped packets / retries, so probably not limited by congestion control
- the advertised window in the ACK packets coming back from b2 was sufficiently generous (around 300KB)
- however, when sending there were always only around 12 kB of data in flight. Given the high latency to the b2 servers, restic spent most of its time waiting for data to be ACKed by b2, then immediately sent out a burst of new packets, and then waited again for enough outstanding bytes to be ACKed.
I jumped to the conclusion that restic was probably using a small TCP send buffer, since this would limit the amount of outstanding bytes that the TCP stack could keep track of.
Do you have an idea how to solve the issue?
To test the suspicion, I tried increasing the TCP send buffer size to 500KB (see https://github.com/Medo42/restic/commit/3c8bab2b42e638add86e139ec07d6cad12fc6bca) and it indeed made the situation a lot better, with the two active connections now taking most of my bandwidth. There is still quite a bit of fluctuation, but that could be due to other reasons.
Did restic help you or make you happy in any way?
Thank you so much for this project. I had never before come across a free/libre backup solution that clicked with me. After using Crashplan for years in PC <-> PC mode I finally had to look for a new solution, and I was about ready to give my money to Backblaze when I saw restic mentioned on their blog. After watching the video of your talk at the C4 I was sure that this was the right tool for me. And I finally understand now how you can deduplicate data that's shifting around in files.
Note that I'm not sending this as a pull request but am just including a proof of concept, because I'm not sure this is actually a good general solution, or that this is the right place to put the code. I also went at this with zero knowledge of Go, so adding this workaround was an interesting puzzle plus some trial and error, and I'm not sure I got everything right.
Without my workaround, restic never sets an explicit send buffer size, and I don't really know how the system selects one, so this might only affect my version of Windows, or might be influenced by settings hidden behind arcane tools and registry keys.
About this issue
- State: closed
- Created 6 years ago
- Reactions: 2
- Comments: 20 (2 by maintainers)
Very slow performance from a FreeBSD host in Australia to b2 US.
I've been following this since June, hoping to see some movement… If anyone has patches they want tested, send them my way.
Hey, sorry for not coming back to you earlier. I'm just going through all the unlabeled issues in the restic repo now that I have a bit more spare time.
B2 is a bit peculiar, especially for non-US users: the latency from Europe to B2's servers is very high, at least 800ms from "HTTP headers sent" to "HTTP response header received", so the per-file overhead is substantial.
Most files uploaded to B2 are rather small, especially compared to high-bandwidth uplinks: by default, most pack files restic creates and uploads to the backend are ~4 MiB in size. This will also limit throughput, as it's much more efficient to upload a small number of large files than a large number of small files. Our plan is to adapt the file size dynamically to how much upstream bandwidth (and memory) is available at backup time, but we're not there yet.
I'm wondering, though: why didn't the OS pick a good TCP send buffer itself? Isn't that set automatically? Sure, we can increase the send buffer size for the B2 backend, ideally so that most files will just fit into the buffer. Maybe 6 MiB is a good size?
Btw, in https://github.com/Medo42/restic/commit/3c8bab2b42e638add86e139ec07d6cad12fc6bca you're setting 5 MiB as the TCP send buffer size, not 500 KiB.
Before we can merge such a change, I'd love to see some benchmarks and comparisons, so it'd be great to build a small sample program that measures the effects, which users can run on different operating systems.