kubernetes: Memory leak in controller manager
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
Apologies for this not being more specific than “in the controller manager” – I’ve attached a heap dump which will hopefully clarify things, but I haven’t yet figured out how to track this down to a specific controller. Right now we’re restarting the controller manager every hour to mitigate this issue.
What happened:
The controller manager leaked 32GB of memory before being OOM killed. Here’s a screenshot from our monitoring tool:
![screen shot 2017-09-18 at 9 00 18 am](https://user-images.githubusercontent.com/23065472/30551658-ff5e29b2-9c4f-11e7-8551-40e304bd7d69.png)
We have quite a lot of pod churn in our cluster because we run cron jobs, which I believe is related to this. There are not very many active pods in our cluster at any given time – at most ~300.
Here’s a heap profile from pprof:
- PDF image of the heap profile: profile004.pdf
- raw data: pprof.com_github_kubernetes_hyperkube.static.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz
I’m not super experienced with reading pprof files but this looks to me like pods are being leaked somewhere. There are about 1.2GB of pods, which is way more than the 300 active pods in the cluster should account for.
Here’s a summary of the pprof file above:
```
(pprof) top
Showing nodes accounting for 1914.61MB, 96.89% of 1976.01MB total
Dropped 328 nodes (cum <= 9.88MB)
Showing top 10 nodes out of 49
      flat  flat%   sum%        cum   cum%
  778.09MB 39.38% 39.38%   778.09MB 39.38%  runtime.rawstringtmp
  597.51MB 30.24% 69.62%  1211.27MB 61.30%  k8s.io/kubernetes/pkg/api/v1.(*PodSpec).Unmarshal
  183.11MB  9.27% 78.88%   494.25MB 25.01%  k8s.io/kubernetes/pkg/api/v1.(*Container).Unmarshal
  160.57MB  8.13% 87.01%   170.78MB  8.64%  runtime.mapassign
   88.07MB  4.46% 91.46%    88.07MB  4.46%  reflect.unsafe_New
      34MB  1.72% 93.19%       56MB  2.83%  k8s.io/kubernetes/pkg/api/v1.(*VolumeSource).Unmarshal
   27.01MB  1.37% 94.55%    66.01MB  3.34%  k8s.io/kubernetes/pkg/api/v1.(*PodStatus).Unmarshal
   19.05MB  0.96% 95.52%    19.05MB  0.96%  runtime.makemap
      17MB  0.86% 96.38%   541.01MB 27.38%  k8s.io/apimachinery/pkg/apis/meta/v1.(*ObjectMeta).Unmarshal
   10.21MB  0.52% 96.89%    10.21MB  0.52%  runtime.hashGrow
```
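For anyone trying to reproduce this kind of data: a heap profile like the one above can be pulled from the controller manager's pprof endpoint. Here's a minimal sketch, assuming `--profiling` is left at its default of true and the insecure port (10252 on 1.7) is reachable from wherever this runs – adjust the URL for your setup:

```go
// Minimal sketch: grab a heap profile from kube-controller-manager so it can be
// inspected with `go tool pprof`. The port and address are assumptions; adjust
// them for your cluster.
package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	// /debug/pprof/heap is served by net/http/pprof, which the controller
	// manager registers when profiling is enabled.
	resp, err := http.Get("http://127.0.0.1:10252/debug/pprof/heap")
	if err != nil {
		log.Fatalf("fetching heap profile: %v", err)
	}
	defer resp.Body.Close()

	out, err := os.Create("heap.pb.gz")
	if err != nil {
		log.Fatalf("creating output file: %v", err)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		log.Fatalf("writing profile: %v", err)
	}
	log.Println("wrote heap.pb.gz; run `go tool pprof heap.pb.gz` and `top` to get a summary like the one above")
}
```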
Environment:
- Kubernetes version: 1.7.3
- Cloud provider: AWS
- OS: Ubuntu 16.04
- Kernel: 4.4
About this issue
- State: closed
- Created 7 years ago
- Reactions: 6
- Comments: 36 (25 by maintainers)
If you are rapidly creating and deleting pods containing tolerations, a memory leak was just found in the node lifecycle taint manager. See https://github.com/kubernetes/kubernetes/pull/65339 for details
Running a DaemonSet with an unhealthy node will trigger this issue: the kubelet rejects/deletes the pod because the node doesn’t have capacity, then the DaemonSet controller immediately creates a replacement pod for that node.
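To make the trigger concrete, here is a minimal sketch (the names and the specific toleration are hypothetical, not taken from this issue) of the kind of short-lived pod with `.spec.tolerations` whose rapid create/delete churn exercises the taint-manager leak fixed by https://github.com/kubernetes/kubernetes/pull/65339:

```go
// Sketch of a short-lived, tolerating pod; the taint manager tracked per-pod
// state for pods with tolerations and did not release it on delete.
package main

import (
	"encoding/json"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{GenerateName: "cron-worker-"}, // hypothetical name
		Spec: corev1.PodSpec{
			RestartPolicy: corev1.RestartPolicyNever,
			// The toleration is the relevant detail for this leak.
			Tolerations: []corev1.Toleration{{
				Key:      "dedicated", // hypothetical taint key
				Operator: corev1.TolerationOpEqual,
				Value:    "batch",
				Effect:   corev1.TaintEffectNoSchedule,
			}},
			Containers: []corev1.Container{{
				Name:    "job",
				Image:   "busybox",
				Command: []string{"sh", "-c", "exit 0"},
			}},
		},
	}
	b, _ := json.MarshalIndent(pod, "", "  ")
	fmt.Println(string(b))
}
```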
v1.9.8 -> v1.9.9 seems to have fixed `kube-controller-manager` RSS (latest ~flat green line since upgrade). Our use case creates on the order of ~1k Pods per hour from app-driven scheduled Kubernetes Jobs, with tolerations indeed (to land these on specific nodes).

@julia-stripe can you confirm if the pods that were being created/deleted contained `.spec.tolerations`?

Closing as resolved by https://github.com/kubernetes/kubernetes/pull/65339
/close
@justinsb How can we put resource restrictions on the controller manager using kops? We are also facing this leak.
As a workaround for this we put a 16Gi memory limit in the `kube-controller-manager` config and that has worked well for over a month now. Hope this helps.
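For reference, a minimal sketch of what that limit amounts to, expressed with the Kubernetes Go API types; in practice the change goes into the `kube-controller-manager` static pod manifest (or whatever tool generates it) rather than into Go code:

```go
// Sketch of the 16Gi memory limit described above. When the leak pushes the
// container past this limit it is OOM-killed and restarted, which bounds the
// damage in the same way as the hourly restarts mentioned in the report.
package main

import (
	"encoding/json"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	reqs := corev1.ResourceRequirements{
		Limits: corev1.ResourceList{
			corev1.ResourceMemory: resource.MustParse("16Gi"),
		},
	}
	out, _ := json.MarshalIndent(reqs, "", "  ")
	fmt.Println(string(out))
}
```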
@dims We do have `successfulJobsHistoryLimit` and `failedJobsHistoryLimit` set. If the problem were due to the actual number of jobs in the cluster, I’d expect the memory usage to immediately go back up once the controller manager is restarted. The fact that restarting it seems to fix the problem makes me think it’s a leak. Does that reasoning make sense to you?
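In case it helps anyone reading later, here is a minimal sketch of what setting those two fields looks like, written against the current `batch/v1` Go types (a 1.7-era cluster like the one in this issue would have used `batch/v2alpha1` or `batch/v1beta1` for CronJobs); the name and schedule are hypothetical:

```go
// Sketch of a CronJob that caps retained Job history so completed Jobs are
// garbage collected instead of accumulating.
package main

import (
	"encoding/json"
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func int32Ptr(i int32) *int32 { return &i }

func main() {
	cj := batchv1.CronJob{
		ObjectMeta: metav1.ObjectMeta{Name: "example-cron"}, // hypothetical name
		Spec: batchv1.CronJobSpec{
			Schedule: "*/5 * * * *",
			// Keep only a handful of finished Jobs around.
			SuccessfulJobsHistoryLimit: int32Ptr(3),
			FailedJobsHistoryLimit:     int32Ptr(1),
			JobTemplate: batchv1.JobTemplateSpec{
				Spec: batchv1.JobSpec{
					Template: corev1.PodTemplateSpec{
						Spec: corev1.PodSpec{
							RestartPolicy: corev1.RestartPolicyNever,
							Containers: []corev1.Container{{
								Name:    "job",
								Image:   "busybox",
								Command: []string{"sh", "-c", "exit 0"},
							}},
						},
					},
				},
			},
		},
	}
	out, _ := json.MarshalIndent(cj, "", "  ")
	fmt.Println(string(out))
}
```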