kubernetes: Network throughput degrading on Debian buster

What happened: We deployed an 8-node Kubernetes cluster with Kubespray on Debian buster nodes. After deploying an application, the business started complaining about poor performance, and some nodes were hitting timeouts despite having plenty of compute capacity.

After realizing the issue was related to poor network performance, we started running periodic iperf tests between all the machines and noticed that network performance was degrading day after day.
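A full-mesh iperf run like the one described can be sketched as follows. This is a hypothetical illustration, not the reporter's actual script; the `NODES` list reuses example addresses and the function only prints the probe commands so they can be reviewed before running:

```shell
#!/bin/sh
# Hypothetical sketch (not the reporter's actual script): build the
# full-mesh iperf3 probe commands for a list of node IPs so they can
# be reviewed, then run and logged periodically over several days.
# The NODES list is an illustrative assumption.
NODES="192.168.197.3 192.168.197.4 192.168.197.5"

mesh_cmds() {
    for target in $NODES; do
        # -t 5: short 5-second run; -J: JSON output for later parsing
        echo "iperf3 -c $target -t 5 -J"
    done
}

mesh_cmds
```

Appending each run's output to a timestamped log makes day-over-day degradation visible when plotted.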

As such, we added 2 more nodes to the k8s cluster, this time running CentOS 7, and also set up a Debian buster machine on the side without k8s.

Network performance has remained stable on the CentOS nodes and on the Debian machine without k8s, but keeps degrading on the other nodes, including the masters.

What you expected to happen: Network performance to remain stable.

How to reproduce it (as minimally and precisely as possible): Deploy k8s using Kubespray and Flannel on Debian buster. Leave it running for a couple of days with a test application that runs a heavy workload 2-3 times a week, and check whether network performance degrades.
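One low-effort way to automate the periodic iperf measurements while waiting for the degradation to show up is a cron entry. The schedule and script path below are illustrative assumptions, not the setup actually used:

```
# Hypothetical crontab entry: run an iperf mesh probe hourly and append
# the results so that day-over-day degradation shows up in the log.
0 * * * * /usr/local/bin/iperf-mesh.sh >> /var/log/iperf-mesh.log 2>&1
```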

Anything else we need to know?: We're running all the nodes in a VMware environment with vmxnet3 (10 Gbps) network cards. We also created issue #6706 on Kubespray.
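Since only the Debian 4.19-kernel nodes with vmxnet3 NICs are affected while the CentOS 3.10-kernel nodes are not, one thing worth comparing between them is the NIC offload configuration. A hypothetical diagnostic sketch follows; the interface names (`ens192` for the vmxnet3 uplink, `flannel.1` for Flannel's VXLAN device) are assumptions about this environment, and the function only prints the commands so they can be reviewed before running as root:

```shell
#!/bin/sh
# Hypothetical diagnostic sketch: print the ethtool commands that would
# inspect offload settings on each interface of interest. Interface
# names are assumptions for this environment, not confirmed by the
# reporter; run the printed commands manually as root.
IFACES="ens192 flannel.1"

offload_cmds() {
    for ifc in $IFACES; do
        echo "ethtool -k $ifc"
    done
}

offload_cmds
```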

Environment:

  • Kubernetes version (use kubectl version):
% kubectl version
Client Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.6-beta.0", GitCommit:"e7f962ba86f4ce7033828210ca3556393c377bcc", GitTreeState:"clean", BuildDate:"2020-01-15T08:26:26Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.8", GitCommit:"ec6eb119b81be488b030e849b9e64fda4caaf33c", GitTreeState:"clean", BuildDate:"2020-03-12T20:52:22Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: VMware virtual machines with 16 GB RAM and 8 vCPUs each.

  • OS (e.g: cat /etc/os-release):

Debian buster:
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

CentOS:
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
  • Kernel (e.g. uname -a): Debian Buster: Linux node1 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux

CentOS: Linux node9 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

  • Install tools: Kubespray

  • Network plugin and version (if this is a network-related bug): Flannel 0.3.1

  • Others: Prometheus graphs (image attached to the original issue) show the network speed dropping over time. "debian-sem-k8s" means "debian-without-k8s".

The 11th (debian buster w/o k8s) machine is not part of the cluster. It’s just a machine we’re using as a “control” node (without anything installed).

% kubectl get nodes -o wide
NAME     STATUS   ROLES    AGE     VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                       KERNEL-VERSION               CONTAINER-RUNTIME
node1    Ready    master   149d    v1.16.8   192.168.197.3    <none>        Debian GNU/Linux 10 (buster)   4.19.0-8-amd64               docker://18.9.7
node10   Ready    <none>   6d16h   v1.16.8   192.168.197.22   <none>        CentOS Linux 7 (Core)          3.10.0-957.21.3.el7.x86_64   docker://18.9.7
node2    Ready    master   149d    v1.16.8   192.168.197.4    <none>        Debian GNU/Linux 10 (buster)   4.19.0-8-amd64               docker://18.9.7
node3    Ready    master   149d    v1.16.8   192.168.197.5    <none>        Debian GNU/Linux 10 (buster)   4.19.0-8-amd64               docker://18.9.7
node4    Ready    <none>   149d    v1.16.8   192.168.197.6    <none>        Debian GNU/Linux 10 (buster)   4.19.0-10-amd64              docker://18.9.7
node5    Ready    <none>   149d    v1.16.8   192.168.197.7    <none>        Debian GNU/Linux 10 (buster)   4.19.0-10-amd64              docker://18.9.7
node6    Ready    <none>   149d    v1.16.8   192.168.197.8    <none>        Debian GNU/Linux 10 (buster)   4.19.0-10-amd64              docker://18.9.7
node7    Ready    <none>   149d    v1.16.8   192.168.197.9    <none>        Debian GNU/Linux 10 (buster)   4.19.0-8-amd64               docker://18.9.7
node8    Ready    <none>   149d    v1.16.8   192.168.197.10   <none>        Debian GNU/Linux 10 (buster)   4.19.0-8-amd64               docker://18.9.7
node9    Ready    <none>   6d16h   v1.16.8   192.168.197.21   <none>        CentOS Linux 7 (Core)          3.10.0-957.21.3.el7.x86_64   docker://18.9.7

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 15 (9 by maintainers)

Most upvoted comments

Right, let me know and I can also start digging here 😃