kubernetes: kube-controller-manager memory leak

What happened?

Our cluster has 5600 nodes, and kube-controller-manager was using 197.5GiB of memory. After restarting kube-controller-manager, memory usage dropped to only 15GiB, so it looks like the controller manager is leaking memory.

Related issues reporting the same problem: #102718 and #102565

What did you expect to happen?

kube-controller-manager memory usage should stay at a stable level.

How can we reproduce it (as minimally and precisely as possible)?

NONE

Anything else we need to know?

NONE

Kubernetes version

Client Version: version.Info{
Major:"1", 
Minor:"17+", 
GitVersion:"v1.17.4", 
GitCommit:"f769ba94a8435eb3b446c5d39d7504823224a6f4", 
GitTreeState:"clean", 
BuildDate:"2020-06-22T02:50:15Z", 
GoVersion:"go1.14.2", 
Compiler:"gc", 
Platform:"linux/amd64"}

Cloud provider

NONE

OS version

# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Install tools

NONE

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, …) and versions (if applicable)

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 25 (24 by maintainers)

Most upvoted comments

@sxllwx Thank you very much. The memory usage of kube-controller-manager is normal now, but it is still not clear what triggered the increase. We may not be able to upgrade the cluster's Kubernetes version in the short term. If kube-controller-manager memory usage becomes abnormal again, we will try the second suggestion.

@xigang

According to the information provided so far (heap pprof, inuse_space), we can draw the following conclusions:

  1. 2.77GB of memory comes from List. Issues #102718 and #113305 report this memory usage problem in the reflector (client-go) when it performs List operations.
  2. 135.14GB of memory comes from Watch, of which 129.88GB is used by Node.Unmarshal.

Compared with the 2.77GB, I think the 135.14GB deserves more attention.

I noticed that your kube-controller-manager is built with GoVersion:"go1.14.2". Starting with go1.16, the Go runtime on Linux changed the default option it uses to release memory from MADV_FREE to MADV_DONTNEED. With MADV_FREE the kernel reclaims the released pages lazily, so the resident memory reported for the process can stay high even after Go has freed it. For the difference between the two options, I recommend reading: https://github.com/golang/go/issues/42894

Suggestions:

  • (Highly recommended) Upgrade Kubernetes to a version that includes this change; you will also get better community support.
  • If upgrading soon is really not possible, try setting the environment variable GODEBUG=madvdontneed=1 to change the default memory management behavior of go1.14.2, then check whether that fixes the problem.
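
To check whether the second suggestion makes a difference on a given host, a small standalone Go program can compare what the Go runtime believes it has released with what the kernel still counts as resident. The following is only a minimal sketch, not part of kube-controller-manager: the helper names (heapSummary, readHeap, parseVmRSSKiB, report) and the 256MiB allocation size are made up for illustration, and the RSS reading assumes a Linux /proc filesystem.

```go
package main

import (
	"fmt"
	"io/ioutil" // used instead of os.ReadFile so the sketch also builds with go1.14
	"os"
	"runtime"
	"runtime/debug"
	"strconv"
	"strings"
)

// heapSummary keeps the runtime.MemStats fields relevant to this comparison.
type heapSummary struct {
	InUse    uint64 // bytes in spans that currently hold objects
	Idle     uint64 // bytes in spans that hold no objects
	Released uint64 // idle bytes the runtime has returned to the OS
}

// readHeap takes a snapshot of the current heap statistics.
func readHeap() heapSummary {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return heapSummary{InUse: m.HeapInuse, Idle: m.HeapIdle, Released: m.HeapReleased}
}

// parseVmRSSKiB extracts the VmRSS value (in kB) from /proc/<pid>/status text.
func parseVmRSSKiB(status string) (uint64, bool) {
	for _, line := range strings.Split(status, "\n") {
		if !strings.HasPrefix(line, "VmRSS:") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 2 {
			return 0, false
		}
		v, err := strconv.ParseUint(fields[1], 10, 64)
		return v, err == nil
	}
	return 0, false
}

// report prints the runtime's view and, where available, the kernel's RSS.
func report(label string) {
	h := readHeap()
	fmt.Printf("%-12s inuse=%dMiB idle=%dMiB released=%dMiB",
		label, h.InUse>>20, h.Idle>>20, h.Released>>20)
	if data, err := ioutil.ReadFile("/proc/self/status"); err == nil {
		if kib, ok := parseVmRSSKiB(string(data)); ok {
			fmt.Printf(" rss=%dMiB", kib>>10)
		}
	}
	fmt.Println()
}

func main() {
	fmt.Printf("GODEBUG=%q go=%s\n", os.Getenv("GODEBUG"), runtime.Version())

	// Allocate 256 chunks of 1MiB and touch every page so they become resident.
	chunks := make([][]byte, 0, 256)
	for i := 0; i < 256; i++ {
		b := make([]byte, 1<<20)
		for j := 0; j < len(b); j += 4096 {
			b[j] = 1
		}
		chunks = append(chunks, b)
	}
	fmt.Println("allocated chunks:", len(chunks))
	report("after alloc")

	// Drop the references, then force a GC and ask the runtime to release memory.
	chunks = nil
	debug.FreeOSMemory()
	report("after free")
}
```

Running the program once as-is and once with GODEBUG=madvdontneed=1 set, using the same Go version as the affected binary, gives two "after free" lines to compare. If "released" grows in both runs but "rss" only drops in the second, the high memory figure is more likely lazily reclaimed pages than a true leak. This only illustrates the runtime behavior; it does not explain why the Watch path retained so much memory in the first place.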