kubernetes: performance regression on ARM: 1.20 apiserver has high cpu usage

What happened: The cluster was upgraded from 1.19.5 to 1.20.1 using kubeadm. After the upgrade, kube-apiserver CPU usage went from 50% to 300%.

What you expected to happen: Expected the apiserver to have roughly the same CPU usage as before the upgrade.

How to reproduce it (as minimally and precisely as possible): Upgrade a cluster from 1.19.5 to 1.20.1 with kubeadm.

Anything else we need to know?: The cluster runs on SBC (single-board computer) hardware, which explains the original 50% CPU usage and why the jump to 300% is noticeable and causes service disruption.

Environment:

  • Kubernetes version (use kubectl version): (the apiserver has been rolled back to 1.19.5)
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.1", GitCommit:"c4d752765b3bbac2237bf87cf0b1c2e307844666", GitTreeState:"clean", BuildDate:"2020-12-18T12:09:25Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.5", GitCommit:"e338cf2c6d297aa603b50ad3a301f761b4173aa6", GitTreeState:"clean", BuildDate:"2020-12-09T11:10:32Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/arm"}
  • Cloud provider or hardware configuration: bare metal on SBC hardware.
  • OS (e.g: cat /etc/os-release):
# cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
  • Kernel (e.g. uname -a):
# uname -a
Linux k8s-master01 5.4.86 #1 SMP Wed Dec 30 17:58:36 UTC 2020 armv7l GNU/Linux
  • Install tools: kubeadm
  • Network plugin and version (if this is a network-related bug): kube-router
  • Others: Please check the attached profiling data for each apiserver version (apiserver.tar.gz); a sketch of one way to capture such a profile follows after this list.
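As a hint for anyone who wants to reproduce the profiles: below is a minimal sketch (mine, not from the issue) of grabbing a 30-second CPU profile from the apiserver in Go. It assumes kubectl proxy is running locally on its default port 8001, that apiserver profiling is enabled (--profiling=true, the default), and that your credentials are authorized for the /debug/pprof/* non-resource URLs; kubectl get --raw '/debug/pprof/profile?seconds=30' achieves the same thing. The resulting file can be inspected with go tool pprof cpu.pprof.

// Hypothetical helper, not part of the issue: download a CPU profile from the
// apiserver through a local `kubectl proxy` and save it for `go tool pprof`.
package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	// 30-second CPU profile; the endpoint is only reachable when apiserver
	// profiling is enabled and the caller is authorized for /debug/pprof/*.
	resp, err := http.Get("http://127.0.0.1:8001/debug/pprof/profile?seconds=30")
	if err != nil {
		log.Fatalf("fetching profile: %v", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		log.Fatalf("unexpected status: %s", resp.Status)
	}

	out, err := os.Create("cpu.pprof")
	if err != nil {
		log.Fatalf("creating output file: %v", err)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		log.Fatalf("writing profile: %v", err)
	}
	log.Println("wrote cpu.pprof; inspect with: go tool pprof cpu.pprof")
}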

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 86 (48 by maintainers)

Most upvoted comments

BTW, I updated #97753 to always log an info message explaining what’s up with the warnings.

@neolit123 Unfortunately, I think I got confused regarding compilers. It now seems I get the same results with go1.15.2, 1.15.5, and 1.15.6, so I no longer think it matters which toolchain I use.

I’ve added a little debugging:

diff --git a/staging/src/k8s.io/apiserver/pkg/util/flowcontrol/metrics/sample_and_watermark.go b/staging/src/k8s.io/apiserver/pkg/util/flowcontrol/metrics/sample_and_watermark.go
index aade70093a7..2073d33bcd8 100644
--- a/staging/src/k8s.io/apiserver/pkg/util/flowcontrol/metrics/sample_and_watermark.go
+++ b/staging/src/k8s.io/apiserver/pkg/util/flowcontrol/metrics/sample_and_watermark.go
@@ -17,6 +17,7 @@ limitations under the License.
 package metrics
 
 import (
+       "fmt"
        "sync"
        "time"
 
@@ -201,6 +202,9 @@ func (saw *sampleAndWaterMarkHistograms) innerSet(updateXOrX1 func()) {
                whenS := when.String()
                klog.Errorf("Time went backwards from %s to %s for labelValues=%#+v", lastSetS, whenS, saw.labelValues)
        }
+
+       fmt.Println("innerSet ", saw.labelValues, " lastSI=", acc.lastSetInt, "when= ", when, " whenI=", whenInt)
+
        for acc.lastSetInt < whenInt {
                saw.samples.WithLabelValues(saw.labelValues...).Observe(acc.relX)
                saw.waterMarks.WithLabelValues(saw.loLabelValues...).Observe(acc.loRelX)

And I got strange results; here is a snippet from stdout:

innerSet  [executing global-default]  lastSI= 69839 when=  2021-01-05 17:57:31.001731921 +0000 UTC m=+70.238844019  whenI= 69843
innerSet  [executing readOnly]  lastSI= 69840 when=  2021-01-05 17:57:31.012695985 +0000 UTC m=+70.249808125  whenI= 69854
innerSet  [executing global-default]  lastSI= 69843 when=  2021-01-05 17:57:31.013749031 +0000 UTC m=+70.250860962  whenI= 69856
innerSet  [waiting workload-high]  lastSI= 61158 when=  2021-01-05 17:57:31.64359886 +0000 UTC m=+70.880710833  whenI= 7135244652114
innerSet  [waiting readOnly]  lastSI= 69820 when=  2021-01-05 17:57:32.440256654 +0000 UTC m=+71.677368627  whenI= 71282
innerSet  [waiting mutating]  lastSI= 60123 when=  2021-01-05 17:57:32.459863283 +0000 UTC m=+71.696975465  whenI= 71302
innerSet  [executing readOnly]  lastSI= 69854 when=  2021-01-05 17:57:32.723558851 +0000 UTC m=+71.960670782  whenI= 71565

Check out the huge whenI on the 4th line (the waiting workload-high entry): it makes the for acc.lastSetInt < whenInt catch-up loop start trillions of iterations behind. And there are more such lines.
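To make the magnitude concrete, here is a tiny self-contained calculation (mine, not from the issue) using the two values from that log line, assuming the loop advances lastSetInt one step per iteration:

// Toy arithmetic, not apiserver code: how far behind the catch-up loop in
// innerSet starts when whenInt is corrupted.
package main

import "fmt"

func main() {
	lastSetInt := int64(61158)      // lastSI from the "waiting workload-high" line
	whenInt := int64(7135244652114) // whenI from the same line

	// Assuming one step (and one set of histogram observations) per iteration
	// of `for acc.lastSetInt < whenInt`, this is the iteration count.
	iters := whenInt - lastSetInt
	fmt.Printf("iterations to catch up: %d (~%.1f trillion)\n", iters, float64(iters)/1e12)
}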

If I refactor the code to remove the inline func() and do the Unlock() at the end of the block instead, this behavior goes away and the apiserver goes back to normal. Could it be a compiler bug?
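For readers unfamiliar with the shape of that change, here is a minimal, self-contained sketch of the two locking patterns being contrasted. It does not reproduce the actual sample_and_watermark.go code; the sampler type, its fields, and the stand-in computation are made up for illustration.

// Simplified illustration only, not the apiserver source.
package main

import (
	"fmt"
	"sync"
	"time"
)

// sampler stands in for the real sampleAndWaterMarkHistograms type.
type sampler struct {
	sync.Mutex
	lastSetInt int64
}

// setWithClosure mirrors the original shape: an inline closure takes the
// lock, computes values, and unlocks via defer when the closure returns.
func (s *sampler) setWithClosure(now time.Time) int64 {
	whenInt := func() int64 {
		s.Lock()
		defer s.Unlock()
		w := now.Unix() // stand-in for the real quantization of `when`
		s.lastSetInt = w
		return w
	}()
	return whenInt
}

// setWithoutClosure mirrors the refactor described above: no inline func(),
// no defer; the work happens in the enclosing block and Unlock() is called
// explicitly at the end of the critical section.
func (s *sampler) setWithoutClosure(now time.Time) int64 {
	s.Lock()
	w := now.Unix()
	s.lastSetInt = w
	s.Unlock()
	return w
}

func main() {
	s := &sampler{}
	now := time.Now()
	fmt.Println(s.setWithClosure(now), s.setWithoutClosure(now))
}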