kubernetes: Memory leak in controller manager
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
Apologies for this not being more specific than “in the controller manager” – I’ve attached a heap dump which will hopefully clarify things, but I haven’t yet figured out how to track this down to a specific controller. Right now we’re restarting the controller manager every hour to mitigate this issue.
What happened:
The controller manager leaked 32GB of memory before being OOM killed. Here’s a screenshot from our monitoring tool:
![screen shot 2017-09-18 at 9 00 18 am](https://user-images.githubusercontent.com/23065472/30551658-ff5e29b2-9c4f-11e7-8551-40e304bd7d69.png)
We have quite a lot of pod churn in our cluster because we run cron jobs, which I believe is related to this. There are not very many active pods in our cluster at any given time – at most ~300.
Here’s a heap profile from pprof:
- PDF image of the heap profile: profile004.pdf
- raw data: pprof.com_github_kubernetes_hyperkube.static.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz
I’m not super experienced with reading pprof files but this looks to me like pods are being leaked somewhere. There are about 1.2GB of pods, which is way more than the 300 active pods in the cluster should account for.
Here’s a summary of the pprof file above:
```
(pprof) top
Showing nodes accounting for 1914.61MB, 96.89% of 1976.01MB total
Dropped 328 nodes (cum <= 9.88MB)
Showing top 10 nodes out of 49
      flat  flat%   sum%        cum   cum%
  778.09MB 39.38% 39.38%   778.09MB 39.38%  runtime.rawstringtmp
  597.51MB 30.24% 69.62%  1211.27MB 61.30%  k8s.io/kubernetes/pkg/api/v1.(*PodSpec).Unmarshal
  183.11MB  9.27% 78.88%   494.25MB 25.01%  k8s.io/kubernetes/pkg/api/v1.(*Container).Unmarshal
  160.57MB  8.13% 87.01%   170.78MB  8.64%  runtime.mapassign
   88.07MB  4.46% 91.46%    88.07MB  4.46%  reflect.unsafe_New
      34MB  1.72% 93.19%       56MB  2.83%  k8s.io/kubernetes/pkg/api/v1.(*VolumeSource).Unmarshal
   27.01MB  1.37% 94.55%    66.01MB  3.34%  k8s.io/kubernetes/pkg/api/v1.(*PodStatus).Unmarshal
   19.05MB  0.96% 95.52%    19.05MB  0.96%  runtime.makemap
      17MB  0.86% 96.38%   541.01MB 27.38%  k8s.io/apimachinery/pkg/apis/meta/v1.(*ObjectMeta).Unmarshal
   10.21MB  0.52% 96.89%    10.21MB  0.52%  runtime.hashGrow
```
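For anyone trying to reproduce this kind of data: a heap profile like the one above can be pulled from the controller manager's pprof endpoint. Here's a minimal sketch, assuming `--profiling` is left at its default of true and the insecure port (10252 on 1.7) is reachable from wherever this runs – adjust the URL for your setup:

```go
// Minimal sketch: grab a heap profile from kube-controller-manager so it can be
// inspected with `go tool pprof`. The port and address are assumptions; adjust
// them for your cluster.
package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	// /debug/pprof/heap is served by net/http/pprof, which the controller
	// manager registers when profiling is enabled.
	resp, err := http.Get("http://127.0.0.1:10252/debug/pprof/heap")
	if err != nil {
		log.Fatalf("fetching heap profile: %v", err)
	}
	defer resp.Body.Close()

	out, err := os.Create("heap.pb.gz")
	if err != nil {
		log.Fatalf("creating output file: %v", err)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		log.Fatalf("writing profile: %v", err)
	}
	log.Println("wrote heap.pb.gz; run `go tool pprof heap.pb.gz` and `top` to get a summary like the one above")
}
```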
Environment:
- Kubernetes version: 1.7.3
- Cloud provider: AWS
- OS: Ubuntu 16.04
- Kernel: 4.4
About this issue
- State: closed
- Created 7 years ago
- Reactions: 6
- Comments: 36 (25 by maintainers)
If you are rapidly creating and deleting pods containing tolerations, a memory leak was just found in the node lifecycle taint manager. See https://github.com/kubernetes/kubernetes/pull/65339 for details
Running a DaemonSet with an unhealthy node will trigger this issue: the kubelet rejects/deletes the pod because the node doesn’t have capacity, then the DaemonSet controller immediately creates a replacement pod for that node.
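To make the trigger concrete, here is a minimal sketch (the names and the specific toleration are hypothetical, not taken from this issue) of the kind of short-lived pod with `.spec.tolerations` whose rapid create/delete churn exercises the taint-manager leak fixed by https://github.com/kubernetes/kubernetes/pull/65339:

```go
// Sketch of a short-lived, tolerating pod; the taint manager tracked per-pod
// state for pods with tolerations and did not release it on delete.
package main

import (
	"encoding/json"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{GenerateName: "cron-worker-"}, // hypothetical name
		Spec: corev1.PodSpec{
			RestartPolicy: corev1.RestartPolicyNever,
			// The toleration is the relevant detail for this leak.
			Tolerations: []corev1.Toleration{{
				Key:      "dedicated", // hypothetical taint key
				Operator: corev1.TolerationOpEqual,
				Value:    "batch",
				Effect:   corev1.TaintEffectNoSchedule,
			}},
			Containers: []corev1.Container{{
				Name:    "job",
				Image:   "busybox",
				Command: []string{"sh", "-c", "exit 0"},
			}},
		},
	}
	b, _ := json.MarshalIndent(pod, "", "  ")
	fmt.Println(string(b))
}
```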
v1.9.8 -> v1.9.9 seems to have fixed `kube-controller-manager` RSS (latest ~flat green line since upgrade). Our use case creates on the order of ~1k Pods per hour from app-driven scheduled Kubernetes Jobs, with tolerations indeed (to land these on specific nodes).

@julia-stripe can you confirm if the pods that were being created/deleted contained `.spec.tolerations`?

Closing as resolved by https://github.com/kubernetes/kubernetes/pull/65339
/close
@justinsb How can we put resource restrictions on the controller manager using kops? We are also facing this leak.
As a workaround for this we put a 16Gi memory limit in the `kube-controller-manager` config and that has worked well for over a month now. Hope this helps.
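For reference, a minimal sketch of what that limit amounts to, expressed with the Kubernetes Go API types; in practice the change goes into the `kube-controller-manager` static pod manifest (or whatever tool generates it) rather than into Go code:

```go
// Sketch of the 16Gi memory limit described above. When the leak pushes the
// container past this limit it is OOM-killed and restarted, which bounds the
// damage in the same way as the hourly restarts mentioned in the report.
package main

import (
	"encoding/json"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	reqs := corev1.ResourceRequirements{
		Limits: corev1.ResourceList{
			corev1.ResourceMemory: resource.MustParse("16Gi"),
		},
	}
	out, _ := json.MarshalIndent(reqs, "", "  ")
	fmt.Println(string(out))
}
```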
@dims We do have `successfulJobsHistoryLimit` and `failedJobsHistoryLimit` set. If the problem were due to the actual number of jobs in the cluster, I’d expect the memory usage to immediately go back up once the controller manager is restarted. The fact that restarting it seems to fix the problem makes me think it’s a leak. Does that reasoning make sense to you?
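In case it helps anyone reading later, here is a minimal sketch of what setting those two fields looks like, written against the current `batch/v1` Go types (a 1.7-era cluster like the one in this issue would have used `batch/v2alpha1` or `batch/v1beta1` for CronJobs); the name and schedule are hypothetical:

```go
// Sketch of a CronJob that caps retained Job history so completed Jobs are
// garbage collected instead of accumulating.
package main

import (
	"encoding/json"
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func int32Ptr(i int32) *int32 { return &i }

func main() {
	cj := batchv1.CronJob{
		ObjectMeta: metav1.ObjectMeta{Name: "example-cron"}, // hypothetical name
		Spec: batchv1.CronJobSpec{
			Schedule: "*/5 * * * *",
			// Keep only a handful of finished Jobs around.
			SuccessfulJobsHistoryLimit: int32Ptr(3),
			FailedJobsHistoryLimit:     int32Ptr(1),
			JobTemplate: batchv1.JobTemplateSpec{
				Spec: batchv1.JobSpec{
					Template: corev1.PodTemplateSpec{
						Spec: corev1.PodSpec{
							RestartPolicy: corev1.RestartPolicyNever,
							Containers: []corev1.Container{{
								Name:    "job",
								Image:   "busybox",
								Command: []string{"sh", "-c", "exit 0"},
							}},
						},
					},
				},
			},
		},
	}
	out, _ := json.MarshalIndent(cj, "", "  ")
	fmt.Println(string(out))
}
```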