gvisor: network perf regression with RACK when shaping traffic

Description

We have a gVisor deployment in which egress throughput is limited by iptables rules on the host that drop outbound packets until the container has budget to transmit. In our testing, overall throughput dropped significantly, and we bisected the drop to gVisor PR #6334 (Enable RACK by default in netstack), which changed gVisor’s built-in TCP stack to always enable “Recent Acknowledgement” (RACK). This change first appeared in release-20210726.

We have not identified what in gVisor’s RACK implementation (or in our iptables rules) causes this behavior under our form of egress throughput control.

The RACK implementation depends on Selective Acknowledgment (SACK) being enabled for the transport, so disabling tcp_sack (sysctl net.ipv4.tcp_sack=0) is an effective workaround, but it is a blunt tool. We would prefer to find the root cause and address it, possibly with a config option to disable RACK in the interim so that we don’t lose the benefits of tcp_sack.
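
When testing the workaround, it helps to confirm which state tcp_sack is actually in before comparing throughput numbers. The following is a minimal sketch, assuming the setting is readable at /proc/sys/net/ipv4/tcp_sack in the environment where the sysctl is applied. The function name sack_state and its optional path argument are hypothetical and exist only for illustration.

```shell
# Report the tcp_sack setting as "enabled", "disabled", or "unknown".
# The optional first argument overrides the file to read (illustrative hook,
# not part of any gVisor or kernel interface).
sack_state() {
  sack_file="${1:-/proc/sys/net/ipv4/tcp_sack}"
  if ! sack_value="$(cat "$sack_file" 2>/dev/null)"; then
    echo "unknown"
    return 1
  fi
  case "$sack_value" in
    0) echo "disabled" ;;
    1) echo "enabled" ;;
    *) echo "unknown" ;;
  esac
}
```

Calling sack_state with no arguments before and after running sysctl net.ipv4.tcp_sack=0 should show the transition from "enabled" to "disabled" if the workaround took effect.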

The associated iptables rules look like this:

# This limit applies per-pod to traffic egressing to the internet.
# Each pod starts with a 600Mbit burst (75MB). Once the burst is consumed traffic is
# limited to 200Mbit (190mbit/s or 23750kbyte/s base + 10mbit/s recharge of the
# burst). If no packets are seen for 60s, the burst buffer should be fully recharged
# and the entry is expired since this is equivalent to the uninitialized state.
iptables -A "${CHAIN_NAME}" -o eth+ \
  --match hashlimit \
  --hashlimit-mode srcip \
  --hashlimit-above 23750kb/s \
  --hashlimit-name public_egress_rate_limit \
  --hashlimit-burst 75m \
  --hashlimit-htable-expire 60000 \
  --jump DROP
# This limit (125mbyte/s = 1Gbit/s) applies per-pod to the sum of all traffic
# egressing to the ingress gateway, other services, and the public internet.
# We don't offer a burst because this already approaches the performance
# limits (2gbit/s egress) of the host.
iptables -A "${CHAIN_NAME}" \
  --match hashlimit \
  --hashlimit-mode srcip \
  --hashlimit-above 125mb/s \
  --hashlimit-name internal_egress_rate_limit \
  --jump DROP
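
The unit conversions in the rule comments can be checked with a few lines of shell arithmetic. This is only a sketch of the arithmetic as stated in the comments: it assumes decimal units (1 kbyte = 1000 bytes), while hashlimit itself may interpret kb and mb as multiples of 1024, so the effective limits could differ slightly. The function names are hypothetical.

```shell
# Decimal units assumed, matching the comments on the rules.
kbyte_per_s_to_mbit_per_s() { echo $(( $1 * 8 / 1000 )); }
mbyte_to_mbit() { echo $(( $1 * 8 )); }
# Seconds needed to refill a burst of $1 Mbit at $2 Mbit/s.
seconds_to_recharge() { echo $(( $1 / $2 )); }

kbyte_per_s_to_mbit_per_s 23750   # base rate of the public rule: 190
mbyte_to_mbit 75                  # burst of the public rule: 600
mbyte_to_mbit 125                 # internal rule limit: 1000
seconds_to_recharge 600 10        # burst refill at 10 Mbit/s: 60
```

The last value lines up with the 60000 ms --hashlimit-htable-expire setting: by the comment's own accounting, an idle entry would be fully recharged by the time it expires.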

With release-20210720, throughput is steady and averages 196Mb/s. With release-20210726, which enables RACK, and our iptables rules in place on the host, throughput when running iperf from a container to an external server is uneven and averages ~15-20Mb/s. In the sample run below, short bursts are followed by long stalls at roughly 114 Kbits/sec:

Connecting to host ..., port 5201
[  5] local ... port 30984 connected to ... port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  37.0 MBytes   311 Mbits/sec    0   0.00 Bytes       
[  5]   1.00-2.00   sec  16.8 MBytes   141 Mbits/sec    0   0.00 Bytes       
[  5]   2.00-3.00   sec   263 KBytes  2.15 Mbits/sec    0   0.00 Bytes       
[  5]   3.00-4.00   sec  15.5 KBytes   127 Kbits/sec    0   0.00 Bytes       
[  5]   4.00-5.00   sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]   5.00-6.00   sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]   6.00-7.00   sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]   7.00-8.00   sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]   8.00-9.00   sec  12.6 KBytes   104 Kbits/sec    0   0.00 Bytes       
[  5]   9.00-10.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  10.00-11.00  sec  13.9 KBytes   113 Kbits/sec    0   0.00 Bytes       
[  5]  11.00-12.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  12.00-13.00  sec  9.98 KBytes  81.8 Kbits/sec    0   0.00 Bytes       
[  5]  13.00-14.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  14.00-15.00  sec  13.9 KBytes   113 Kbits/sec    0   0.00 Bytes       
[  5]  15.00-16.00  sec  11.1 KBytes  91.0 Kbits/sec    0   0.00 Bytes       
[  5]  16.00-17.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  17.00-18.00  sec  12.6 KBytes   103 Kbits/sec    0   0.00 Bytes       
[  5]  18.00-19.00  sec  11.1 KBytes  90.9 Kbits/sec    0   0.00 Bytes       
[  5]  19.00-20.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  20.00-21.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  21.00-22.00  sec  32.8 MBytes   275 Mbits/sec    0   0.00 Bytes       
[  5]  22.00-23.00  sec  4.16 KBytes  34.1 Kbits/sec    0   0.00 Bytes       
[  5]  23.00-24.00  sec  12.5 KBytes   102 Kbits/sec    0   0.00 Bytes       
[  5]  24.00-25.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  25.00-26.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  26.00-27.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  27.00-28.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  28.00-29.00  sec  14.3 KBytes   117 Kbits/sec    0   0.00 Bytes       
[  5]  29.00-30.00  sec  32.3 MBytes   271 Mbits/sec    0   0.00 Bytes       
[  5]  30.00-31.00  sec  5.55 KBytes  45.4 Kbits/sec    0   0.00 Bytes       
[  5]  31.00-32.00  sec  16.9 KBytes   138 Kbits/sec    0   0.00 Bytes       
[  5]  32.00-33.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  33.00-34.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  34.00-35.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  35.00-36.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  36.00-37.00  sec  18.0 KBytes   148 Kbits/sec    0   0.00 Bytes       
[  5]  37.00-38.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  38.00-39.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  39.00-40.00  sec   103 KBytes   844 Kbits/sec    0   0.00 Bytes       
[  5]  40.00-41.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  41.00-42.00  sec  11.1 KBytes  90.8 Kbits/sec    0   0.00 Bytes       
[  5]  42.00-43.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  43.00-44.00  sec  19.8 KBytes   162 Kbits/sec    0   0.00 Bytes       
[  5]  44.00-45.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  45.00-46.00  sec  11.1 KBytes  90.9 Kbits/sec    0   0.00 Bytes       
[  5]  46.00-47.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  47.00-48.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  48.00-49.00  sec  12.5 KBytes   102 Kbits/sec    0   0.00 Bytes       
[  5]  49.00-50.00  sec  12.5 KBytes   102 Kbits/sec    0   0.00 Bytes       
[  5]  50.00-51.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  51.00-52.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  52.00-53.00  sec  33.7 MBytes   283 Mbits/sec    0   0.00 Bytes       
[  5]  53.00-54.00  sec  93.3 KBytes   765 Kbits/sec    0   0.00 Bytes       
[  5]  54.00-55.00  sec  35.6 MBytes   299 Mbits/sec    0   0.00 Bytes       
[  5]  55.00-56.00  sec   762 KBytes  6.24 Mbits/sec    0   0.00 Bytes       
[  5]  56.00-57.00  sec   296 KBytes  2.43 Mbits/sec    0   0.00 Bytes       
[  5]  57.00-58.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  58.00-59.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  59.00-60.00  sec  12.9 KBytes   106 Kbits/sec    0   0.00 Bytes       
[  5]  60.00-61.00  sec  31.8 MBytes   267 Mbits/sec    0   0.00 Bytes       
[  5]  61.00-62.00  sec   825 KBytes  6.76 Mbits/sec    0   0.00 Bytes       
[  5]  62.00-63.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  63.00-64.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  64.00-65.00  sec  11.5 KBytes  94.3 Kbits/sec    0   0.00 Bytes       
[  5]  65.00-66.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  66.00-67.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  67.00-68.00  sec  31.4 MBytes   264 Mbits/sec    0   0.00 Bytes       
[  5]  68.00-69.00  sec   592 KBytes  4.85 Mbits/sec    0   0.00 Bytes       
[  5]  69.00-70.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  70.00-71.00  sec  22.6 KBytes   185 Kbits/sec    0   0.00 Bytes       
[  5]  71.00-72.00  sec  18.0 KBytes   148 Kbits/sec    0   0.00 Bytes       
[  5]  72.00-73.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  73.00-74.00  sec  11.1 KBytes  90.9 Kbits/sec    0   0.00 Bytes       
[  5]  74.00-75.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  75.00-76.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  76.00-77.00  sec  11.1 KBytes  90.9 Kbits/sec    0   0.00 Bytes       
[  5]  77.00-78.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  78.00-79.00  sec  12.5 KBytes   102 Kbits/sec    0   0.00 Bytes       
[  5]  79.00-80.00  sec  12.5 KBytes   102 Kbits/sec    0   0.00 Bytes       
[  5]  80.00-81.00  sec  12.9 KBytes   106 Kbits/sec    0   0.00 Bytes       
[  5]  81.00-82.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  82.00-83.00  sec  11.1 KBytes  91.0 Kbits/sec    0   0.00 Bytes       
[  5]  83.00-84.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  84.00-85.00  sec  17.6 MBytes   148 Mbits/sec    0   0.00 Bytes       
[  5]  85.00-86.00  sec  15.6 MBytes   131 Mbits/sec    0   0.00 Bytes       
[  5]  86.00-87.00  sec   295 KBytes  2.42 Mbits/sec    0   0.00 Bytes       
[  5]  87.00-88.00  sec  13.9 KBytes   114 Kbits/sec    0   0.00 Bytes       
[  5]  88.00-89.00  sec  24.7 MBytes   207 Mbits/sec    0   0.00 Bytes       
[  5]  89.00-90.00  sec  5.38 MBytes  45.1 Mbits/sec    0   0.00 Bytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-90.00  sec   319 MBytes  29.7 Mbits/sec    0             sender
[  5]   0.00-90.00  sec   316 MBytes  29.4 Mbits/sec                  receiver
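
To compare runs without reading every interval, the per-second lines can be summarized with awk. The sketch below counts the intervals in iperf text output like the above and how many fell below 1 Mbit/s. The function name summarize_iperf and the 1 Mbit/s "stalled" threshold are arbitrary choices for illustration, and the parsing assumes the text layout shown here rather than any stable iperf output format.

```shell
# Read iperf text output on stdin; print interval and stall counts.
summarize_iperf() {
  awk '
    # Per-interval lines contain a bitrate; skip the sender/receiver summary.
    /bits\/sec/ && $NF != "sender" && $NF != "receiver" {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /bits\/sec$/) {
          rate = $(i - 1) + 0                    # number precedes the unit
          if ($i ~ /^K/) rate /= 1000            # Kbits/sec -> Mbits/sec
          else if ($i ~ /^G/) rate *= 1000       # Gbits/sec -> Mbits/sec
          else if ($i !~ /^M/) rate /= 1000000   # bits/sec  -> Mbits/sec
          total++
          if (rate < 1) stalled++
          break
        }
      }
    }
    END { printf "intervals=%d stalled=%d\n", total, stalled }
  '
}
```

Usage would be along the lines of saving the iperf output to a file and running summarize_iperf < iperf.log, once for each release being compared.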

Steps to reproduce

Add iptables rules based on those in the description to the host, then run the iperf test from a container on that host to an external server. Observe the throughput with RACK enabled.
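
The client side of the test can be sketched as below. This is an assumption-laden illustration, not the exact command used for the run in the description: the server address is a placeholder supplied by the caller, port 5201 and the 90-second duration mirror the output above, and the run_iperf function and its DRY_RUN switch are hypothetical helpers.

```shell
# Run (or, with DRY_RUN=1, just print) an iperf3 client test against $1.
run_iperf() {
  server="$1"
  # -c: client mode against the server, -p: port, -t: duration in seconds.
  set -- iperf3 -c "$server" -p 5201 -t 90
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

Running this inside the sandboxed container, once on release-20210720 and once on release-20210726, with the host rules in place for both, should show the difference described above.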

runsc version

release-20210726 was the first version impacted

docker version (if using docker)

No response

uname

4.19.0-17 kernel from Debian 10 Buster

kubectl (if using Kubernetes)

1.21.11

repo state (if built from source)

Not built from source

runsc debug logs (if available)

None available

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 16 (8 by maintainers)

Most upvoted comments

Thanks for confirming. I will work with Nayana to see what is going haywire with our RACK implementation. I will keep you posted once we find something.

Thanks, I will take a look and see if I can figure out what’s going on.