kubernetes: kube-proxy IPVS updates are slow and seem to be affected by node load.

What happened: kube-proxy in IPVS mode takes a very long time to reflect service endpoint changes.

What you expected to happen: Service endpoint updates applied as quickly as possible.

How to reproduce it (as minimally and precisely as possible): Run kube-proxy in IPVS mode with roughly 4000 service endpoints.

Anything else we need to know?:

Our cluster runs 1.12.3, but we tried kube-proxy binaries from several versions (1.10.7, 1.12.5, 1.13.2) to see if any delivered different results, to no avail.

We have a pretty unusual setup. Our workers are huge bare-metal machines with 800 GB of RAM and 48 CPUs, so we raised max-pods on those nodes to suit the hardware.

The entire cluster has 26 machines:

  • 6 big nodes matching the description above
  • 10 VM nodes similar to GCE n1-standard-16
  • 6 smaller VM nodes similar to GCE n1-standard-4
  • 2 VMs acting as masters (n1-standard-8)
  • nginx as a load balancer in front of the masters

Overall, I find kube-proxy too slow at updating IPVS service endpoints on all nodes (20 s on average).

But one node, the one with the most pods, has disastrous update performance.

I’ve made a shell script to monitor the time between updates of a newly-updated service. The number between the brackets is how many seconds the update took.

node45 [146s]:TCP 10.36.247.8:27017 rr -> 10.79.124.86:27017 Masq 1 0 0
node46 [14s]:TCP 10.36.247.8:27017 rr -> 10.79.124.86:27017 Masq 1 0 0
node47 [16s]:TCP 10.36.247.8:27017 rr -> 10.79.124.86:27017 Masq 1 0 0
node48 [28s]:TCP 10.36.247.8:27017 rr -> 10.79.124.86:27017 Masq 1 0 0
node49 [14s]:TCP 10.36.247.8:27017 rr -> 10.79.124.86:27017 Masq 1 0 0
node50 [12s]:TCP 10.36.247.8:27017 rr -> 10.79.124.86:27017 Masq 1 0 0
node69 [16s]:TCP 10.36.247.8:27017 rr -> 10.79.124.86:27017 Masq 1 0 0

While most nodes were done in around 15 s, one took 28 s and the worst one 146 s.
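A minimal sketch of such a monitoring loop (`wait_for_endpoint` is a made-up helper name; the addresses in the comments are from the output above):

```shell
# Hypothetical sketch of the monitoring loop: poll ipvsadm until the new
# real-server entry appears in the IPVS table, then print how many
# seconds it took to show up.
wait_for_endpoint() {
  rip="$1"                      # real-server address, e.g. 10.79.124.86:27017
  start=$(date +%s)
  while ! ipvsadm -Ln 2>/dev/null | grep -q "$rip"; do
    sleep 1
  done
  echo "$(( $(date +%s) - start ))"
}
# e.g. on each node: echo "$(hostname) [$(wait_for_endpoint 10.79.124.86:27017)s]"
```

Running this on every node right after changing a service's endpoints gives the per-node timings shown above.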

Unfortunately for me, this was measured during a low-activity period. Tomorrow, when the cluster gets really busy, the times explode, and the slowest node can take up to 30 minutes (!) to update.

It doesn’t look like a hardware problem, because this behaviour previously happened on node49. We drained that node, and now, with no pods on it, it still took 14 s to update the service, but that’s no longer a killing time.

I see no suggestive errors in the kube-proxy or apiserver logs; no weird HTTP codes or denied requests.

We tried tuning the qps and burst values to the ones shown below, but it did nothing to improve the situation.

apiVersion: kubeproxy.config.k8s.io/v1alpha1
bindAddress: 0.0.0.0
clientConnection:
  acceptContentTypes: ""
  burst: 500
  contentType: application/vnd.kubernetes.protobuf
  kubeconfig: /etc/kubernetes/node-kubeconfig.yaml
  qps: 200
clusterCIDR: ""
cleanupIPVS: false
configSyncPeriod: 15m0s
conntrack:
  max: 0
  maxPerCore: 32768
  min: 131072
  tcpCloseWaitTimeout: 15m0s
  tcpEstablishedTimeout: 24h0m0s
enableProfiling: false
healthzBindAddress: 0.0.0.0:10256
hostnameOverride: ""
ipvs:
  minSyncPeriod: 0s
  scheduler: ""
  syncPeriod: 30s
kind: KubeProxyConfiguration
metricsBindAddress: 0.0.0.0:10249
mode: ipvs
nodePortAddresses: ["192.168.41.45/32"]
oomScoreAdj: -999
portRange: ""
resourceContainer: /kube-proxy

These are the numbers of pods running on each node:

node45: 337
node46: 270
node47: 301
node48: 271
node49: 3
node50: 283
node69: 3

These are the numbers of IPVS rules on each node:

node45:10518
node46:12228
node47:12233
node48:12237
node49:10515
node50:12232
node69:10514
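The per-node counts above can be gathered with something like this sketch (`count_ipvs_rules` is a made-up helper; the 3-line header is what `ipvsadm -Ln` normally prints before the rule table):

```shell
# Count the IPVS rule lines on this node by stripping the header lines
# that ipvsadm -Ln prints before the virtual/real-server table.
count_ipvs_rules() {
  tail -n +4 | wc -l
}
# e.g. on each node: ipvsadm -Ln | count_ipvs_rules
```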

If there is any interest, I can provide /metrics from kube-proxy, apiserver or whatever else is needed.
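For reference, the rule-sync latency can be scraped from the metrics endpoint configured above (port 10249). This is a sketch; the metric name below is the one used by newer kube-proxy releases and may differ on 1.12:

```shell
# Filter kube-proxy's /metrics output down to the rule-sync latency
# histogram. Metric name is from newer releases (older ones exposed a
# *_latency_microseconds variant instead).
sync_latency_metrics() {
  grep '^kubeproxy_sync_proxy_rules'
}
# e.g.: curl -s http://127.0.0.1:10249/metrics | sync_latency_metrics
```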

Any suggestions on where to look are also welcome, since I’ve exhausted all my tests.

Environment:

  • Kubernetes version (use kubectl version): Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.3", GitCommit:"435f92c719f279a3a67808c80521ea17d5715c66", GitTreeState:"clean", BuildDate:"2018-11-26T12:46:57Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: self-hosted hardware.
  • OS (e.g. from /etc/os-release): CoreOS VERSION_ID=1800.6.0
  • Kernel (e.g. uname -a): Linux spessrvbpkn00045.estaleiro.serpro 4.14.59-coreos-r2 #1 SMP Sat Aug 4 02:49:25 UTC 2018 x86_64 Intel® Xeon® Gold 6126 CPU @ 2.60GHz GenuineIntel GNU/Linux
  • Install tools: systemd/docker units.
  • Others:

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 2
  • Comments: 18 (14 by maintainers)

Most upvoted comments

Hey guys!

Original bug poster here; I’ve come back just to give a warm thank-you to everyone involved with this issue.

I wanted to post the “benchmarks” of the several improvements made to kube-proxy on my setup.

My setup has around 20k lines of ipvsadm -Ln output. I have a machine in my cluster running exactly zero workloads: no pods, basically nothing other than Kubernetes/Calico using CPU.

These are the timings:

  • Old 1.12.2 kube-proxy: 221 s to get from 0 to 19439 lines of ipvsadm output.
  • New 1.16.2 kube-proxy: 4 s to get from 0 to 19438 lines of ipvsadm output.

Thank you very much!