gvisor: network perf regression with RACK when shaping traffic
Description
We have a deployment of gvisor where egress throughput is limited using iptables rules on the host that drop outbound packets until the container has budget for transmission. In our testing the overall throughput dropped significantly, and we bisected this to gvisor PR #6334 (Enable RACK by default in netstack), which changed gvisor’s built-in TCP stack to always enable “Recent Acknowledgement” (RACK). This change first appeared in release-20210726.
We have not yet identified the root cause: it is unclear what in gvisor’s RACK implementation (or in our iptables rules) explains this behavior under our form of egress throughput control.
The RACK implementation depends on Selective Acknowledgment (SACK) being enabled for the transport; disabling tcp_sack (sysctl net.ipv4.tcp_sack=0) is an effective workaround, but it is a blunt tool. Preferably we could get to the root cause and address it, possibly with a config option to disable RACK in the interim so we don’t lose the benefits of tcp_sack.
The associated iptables rules look like this:
# This limit applies per-pod to traffic egressing to the internet.
# Each pod starts with a 600Mbit burst (75MB). Once the burst is consumed traffic is
# limited to 200Mbit (190mbit/s or 23750kbyte/s base + 10mbit/s recharge of the
# burst). If no packets are seen for 60s, the burst buffer should be fully recharged
# and the entry is expired since this is equivalent to the uninitialized state.
iptables -A "${CHAIN_NAME}" -o eth+ \
--match hashlimit \
--hashlimit-mode srcip \
--hashlimit-above 23750kb/s \
--hashlimit-name public_egress_rate_limit \
--hashlimit-burst 75m \
--hashlimit-htable-expire 60000 \
--jump DROP
# This limit (125mbyte/s = 1Gbit) applies per-pod to the sum of all traffic
# egressing to the ingress gateway, other services, and the public internet.
# We don't offer a burst because this already approaches the performance
# limits (2gbit/s egress) of the host.
iptables -A "${CHAIN_NAME}" \
--match hashlimit \
--hashlimit-mode srcip \
--hashlimit-above 125mb/s \
--hashlimit-name internal_egress_rate_limit \
--jump DROP
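The rate figures quoted in the rule comments can be cross-checked with a little arithmetic. This is a minimal sketch that treats the suffixes as decimal units (1 kbyte = 1000 bytes), as the comments do; whether iptables itself interprets `kb` and `mb` as powers of 1000 or 1024 is not verified here, and the helper name `kbyte_to_mbit` is purely illustrative.

```shell
#!/bin/sh
# Convert kbyte/s (or kbytes) to Mbit/s (or Mbit), assuming decimal units:
# 1 kbyte = 1000 bytes and 1 Mbit = 1,000,000 bits.
kbyte_to_mbit() {
    echo $(( $1 * 8 / 1000 ))
}

# Values taken from the two hashlimit rules.
public_rate_mbit=$(kbyte_to_mbit 23750)     # --hashlimit-above 23750kb/s
public_burst_mbit=$(kbyte_to_mbit 75000)    # --hashlimit-burst 75m
internal_rate_mbit=$(kbyte_to_mbit 125000)  # --hashlimit-above 125mb/s

echo "public egress base rate: ${public_rate_mbit} Mbit/s"
echo "public egress burst:     ${public_burst_mbit} Mbit"
echo "internal egress rate:    ${internal_rate_mbit} Mbit/s"
```

Under that assumption the numbers agree with the comments: 190 Mbit/s base rate, a 600 Mbit burst, and a 1000 Mbit/s internal limit.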
Performance with release-20210720 is steady and averages 196Mb/s. With release-20210726, which enables RACK, and our iptables rules in place on the host, throughput when running iperf from a container to an external server is uneven and averages ~15-20Mb/s:
Connecting to host ..., port 5201
[ 5] local ... port 30984 connected to ... port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 37.0 MBytes 311 Mbits/sec 0 0.00 Bytes
[ 5] 1.00-2.00 sec 16.8 MBytes 141 Mbits/sec 0 0.00 Bytes
[ 5] 2.00-3.00 sec 263 KBytes 2.15 Mbits/sec 0 0.00 Bytes
[ 5] 3.00-4.00 sec 15.5 KBytes 127 Kbits/sec 0 0.00 Bytes
[ 5] 4.00-5.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 5.00-6.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 6.00-7.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 7.00-8.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 8.00-9.00 sec 12.6 KBytes 104 Kbits/sec 0 0.00 Bytes
[ 5] 9.00-10.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 10.00-11.00 sec 13.9 KBytes 113 Kbits/sec 0 0.00 Bytes
[ 5] 11.00-12.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 12.00-13.00 sec 9.98 KBytes 81.8 Kbits/sec 0 0.00 Bytes
[ 5] 13.00-14.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 14.00-15.00 sec 13.9 KBytes 113 Kbits/sec 0 0.00 Bytes
[ 5] 15.00-16.00 sec 11.1 KBytes 91.0 Kbits/sec 0 0.00 Bytes
[ 5] 16.00-17.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 17.00-18.00 sec 12.6 KBytes 103 Kbits/sec 0 0.00 Bytes
[ 5] 18.00-19.00 sec 11.1 KBytes 90.9 Kbits/sec 0 0.00 Bytes
[ 5] 19.00-20.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 20.00-21.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 21.00-22.00 sec 32.8 MBytes 275 Mbits/sec 0 0.00 Bytes
[ 5] 22.00-23.00 sec 4.16 KBytes 34.1 Kbits/sec 0 0.00 Bytes
[ 5] 23.00-24.00 sec 12.5 KBytes 102 Kbits/sec 0 0.00 Bytes
[ 5] 24.00-25.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 25.00-26.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 26.00-27.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 27.00-28.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 28.00-29.00 sec 14.3 KBytes 117 Kbits/sec 0 0.00 Bytes
[ 5] 29.00-30.00 sec 32.3 MBytes 271 Mbits/sec 0 0.00 Bytes
[ 5] 30.00-31.00 sec 5.55 KBytes 45.4 Kbits/sec 0 0.00 Bytes
[ 5] 31.00-32.00 sec 16.9 KBytes 138 Kbits/sec 0 0.00 Bytes
[ 5] 32.00-33.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 33.00-34.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 34.00-35.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 35.00-36.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 36.00-37.00 sec 18.0 KBytes 148 Kbits/sec 0 0.00 Bytes
[ 5] 37.00-38.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 38.00-39.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 39.00-40.00 sec 103 KBytes 844 Kbits/sec 0 0.00 Bytes
[ 5] 40.00-41.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 41.00-42.00 sec 11.1 KBytes 90.8 Kbits/sec 0 0.00 Bytes
[ 5] 42.00-43.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 43.00-44.00 sec 19.8 KBytes 162 Kbits/sec 0 0.00 Bytes
[ 5] 44.00-45.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 45.00-46.00 sec 11.1 KBytes 90.9 Kbits/sec 0 0.00 Bytes
[ 5] 46.00-47.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 47.00-48.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 48.00-49.00 sec 12.5 KBytes 102 Kbits/sec 0 0.00 Bytes
[ 5] 49.00-50.00 sec 12.5 KBytes 102 Kbits/sec 0 0.00 Bytes
[ 5] 50.00-51.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 51.00-52.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 52.00-53.00 sec 33.7 MBytes 283 Mbits/sec 0 0.00 Bytes
[ 5] 53.00-54.00 sec 93.3 KBytes 765 Kbits/sec 0 0.00 Bytes
[ 5] 54.00-55.00 sec 35.6 MBytes 299 Mbits/sec 0 0.00 Bytes
[ 5] 55.00-56.00 sec 762 KBytes 6.24 Mbits/sec 0 0.00 Bytes
[ 5] 56.00-57.00 sec 296 KBytes 2.43 Mbits/sec 0 0.00 Bytes
[ 5] 57.00-58.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 58.00-59.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 59.00-60.00 sec 12.9 KBytes 106 Kbits/sec 0 0.00 Bytes
[ 5] 60.00-61.00 sec 31.8 MBytes 267 Mbits/sec 0 0.00 Bytes
[ 5] 61.00-62.00 sec 825 KBytes 6.76 Mbits/sec 0 0.00 Bytes
[ 5] 62.00-63.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 63.00-64.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 64.00-65.00 sec 11.5 KBytes 94.3 Kbits/sec 0 0.00 Bytes
[ 5] 65.00-66.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 66.00-67.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 67.00-68.00 sec 31.4 MBytes 264 Mbits/sec 0 0.00 Bytes
[ 5] 68.00-69.00 sec 592 KBytes 4.85 Mbits/sec 0 0.00 Bytes
[ 5] 69.00-70.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 70.00-71.00 sec 22.6 KBytes 185 Kbits/sec 0 0.00 Bytes
[ 5] 71.00-72.00 sec 18.0 KBytes 148 Kbits/sec 0 0.00 Bytes
[ 5] 72.00-73.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 73.00-74.00 sec 11.1 KBytes 90.9 Kbits/sec 0 0.00 Bytes
[ 5] 74.00-75.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 75.00-76.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 76.00-77.00 sec 11.1 KBytes 90.9 Kbits/sec 0 0.00 Bytes
[ 5] 77.00-78.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 78.00-79.00 sec 12.5 KBytes 102 Kbits/sec 0 0.00 Bytes
[ 5] 79.00-80.00 sec 12.5 KBytes 102 Kbits/sec 0 0.00 Bytes
[ 5] 80.00-81.00 sec 12.9 KBytes 106 Kbits/sec 0 0.00 Bytes
[ 5] 81.00-82.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 82.00-83.00 sec 11.1 KBytes 91.0 Kbits/sec 0 0.00 Bytes
[ 5] 83.00-84.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 84.00-85.00 sec 17.6 MBytes 148 Mbits/sec 0 0.00 Bytes
[ 5] 85.00-86.00 sec 15.6 MBytes 131 Mbits/sec 0 0.00 Bytes
[ 5] 86.00-87.00 sec 295 KBytes 2.42 Mbits/sec 0 0.00 Bytes
[ 5] 87.00-88.00 sec 13.9 KBytes 114 Kbits/sec 0 0.00 Bytes
[ 5] 88.00-89.00 sec 24.7 MBytes 207 Mbits/sec 0 0.00 Bytes
[ 5] 89.00-90.00 sec 5.38 MBytes 45.1 Mbits/sec 0 0.00 Bytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-90.00 sec 319 MBytes 29.7 Mbits/sec 0 sender
[ 5] 0.00-90.00 sec 316 MBytes 29.4 Mbits/sec receiver
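The sender summary line can be reproduced from the transfer size and duration. The sketch below assumes iperf reports MBytes in units of 2^20 bytes and Mbits in units of 10^6 bits, which is consistent with the figures above; the helper names `avg_mbit` and `pct_of` are hypothetical, and the 196 baseline is the release-20210720 average quoted earlier, taken to be in Mbit/s.

```shell
#!/bin/sh
# Average bitrate in Mbit/s for a transfer of $1 MBytes (2^20 bytes) over $2 seconds.
avg_mbit() {
    awk -v mb="$1" -v secs="$2" \
        'BEGIN { printf "%.1f\n", mb * 1048576 * 8 / secs / 1000000 }'
}

# Value $1 as a whole-number percentage of baseline $2.
pct_of() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", a / b * 100 }'
}

sender_mbit=$(avg_mbit 319 90)   # sender: 319 MBytes in 90 s
echo "sender average: ${sender_mbit} Mbit/s"
echo "share of the 196 Mbit/s baseline: $(pct_of "$sender_mbit" 196)%"
```

By this calculation the 90-second run averages 29.7 Mbit/s, roughly 15% of the throughput seen before RACK was enabled.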
Steps to reproduce
Add iptables rules based on those in the description to the host where the iperf test is run, then observe the throughput with RACK enabled.
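For reference, a client invocation matching the shape of the output above (port 5201, 90 seconds, 1-second intervals) might look like the following dry-run sketch. It assumes iperf3 is the tool in use. The server address is elided in this report, so `IPERF_SERVER` and its default are hypothetical placeholders, and the function only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical placeholder: the report does not give the real server address.
IPERF_SERVER="${IPERF_SERVER:-iperf.example.invalid}"

# Print the iperf3 client command for a given server (dry run, nothing is executed).
iperf_cmd() {
    echo "iperf3 --client $1 --port 5201 --time 90 --interval 1"
}

iperf_cmd "$IPERF_SERVER"
```

Run the printed command from inside the gvisor container, once with release-20210720 and once with release-20210726, with the host iptables rules in place in both cases.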
runsc version
release-20210726 was the first version impacted
docker version (if using docker)
No response
uname
4.19.0-17 kernel from Debian 10 Buster
kubectl (if using Kubernetes)
1.21.11
repo state (if built from source)
Not built from source
runsc debug logs (if available)
None available
About this issue
- State: closed
- Created 2 years ago
- Comments: 16 (8 by maintainers)
Thanks for confirming. I will work with Nayana to see what is going haywire with our RACK implementation. I will keep you posted once we find something.
Thanks, I will take a look and see if I can figure out what’s going on.