kubernetes: memory leak in kubelet 1.12.5

What happened: After upgrading to Kubernetes 1.12.5 we observe failing nodes, caused by kubelet consuming all of the node's memory over time.

I use the image k8s.gcr.io/hyperkube:v1.12.5 to run kubelet on 102 clusters, and for about a week we have seen some nodes leaking memory, caused by kubelet.

I investigated some of these kubelets with strace and pprof. Within 3 seconds of running strace I saw >= 50 openat() calls to the same file from the same thread ID (pid) in kubelet:

    50 [pid 12416] openat(10, "__pycache__", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 8
    54 [pid 12368] openat(6, "__pycache__", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 4
    72 [pid 12368] openat(8, "__pycache__", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 4
    80 [pid 12368] openat(4, "__pycache__", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 3
    85 [pid 12416] openat(8, "__pycache__", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 6

When I profile kubelet with pprof, it shows that client-go metrics and compression are taking up most of the compute time.

root@ip-172-31-10-50:~# go tool pprof http://localhost:10248/debug/pprof/heap
Fetching profile from http://localhost:10248/debug/pprof/heap
Saved profile in /root/pprof/pprof.hyperkube.localhost:10248.alloc_objects.alloc_space.inuse_objects.inuse_space.002.pb.gz
Entering interactive mode (type "help" for commands)
(pprof) top
2406.93MB of 2451.55MB total (98.18%)
Dropped 2863 nodes (cum <= 12.26MB)
Showing top 10 nodes out of 34 (cum >= 2411.39MB)
      flat  flat%   sum%        cum   cum%
 2082.07MB 84.93% 84.93%  2082.07MB 84.93%  k8s.io/kubernetes/vendor/github.com/beorn7/perks/quantile.newStream (inline)
  311.65MB 12.71% 97.64%  2398.72MB 97.84%  k8s.io/kubernetes/vendor/github.com/prometheus/client_golang/prometheus.newSummary
   10.71MB  0.44% 98.08%  2414.43MB 98.49%  k8s.io/kubernetes/vendor/github.com/prometheus/client_golang/prometheus.(*MetricVec).getOrCreateMetricWithLabelValues
    2.50MB   0.1% 98.18%  2084.57MB 85.03%  k8s.io/kubernetes/vendor/github.com/beorn7/perks/quantile.NewTargeted
         0     0% 98.18%  2412.06MB 98.39%  k8s.io/kubernetes/cmd/kubelet/app.startKubelet.func1
         0     0% 98.18%  2412.06MB 98.39%  k8s.io/kubernetes/pkg/kubelet.(*Kubelet).HandlePodAdditions
         0     0% 98.18%  2412.06MB 98.39%  k8s.io/kubernetes/pkg/kubelet.(*Kubelet).Run
         0     0% 98.18%  2412.06MB 98.39%  k8s.io/kubernetes/pkg/kubelet.(*Kubelet).syncLoop
         0     0% 98.18%  2412.06MB 98.39%  k8s.io/kubernetes/pkg/kubelet.(*Kubelet).syncLoopIteration
         0     0% 98.18%  2411.39MB 98.36%  k8s.io/kubernetes/pkg/kubelet/pod.(*basicManager).AddPod

[memory profile image: mem_profile001]

The reflector metrics seem to be the problem:

curl -s localhost:10255/metrics | sed 's/{.*//' | sort | uniq -c | sort -nr
  25749 reflector_watch_duration_seconds
  25749 reflector_list_duration_seconds
  25749 reflector_items_per_watch
  25749 reflector_items_per_list
   8583 reflector_watches_total
   8583 reflector_watch_duration_seconds_sum
   8583 reflector_watch_duration_seconds_count
   8583 reflector_short_watches_total
   8583 reflector_lists_total
   8583 reflector_list_duration_seconds_sum
   8583 reflector_list_duration_seconds_count
   8583 reflector_last_resource_version
   8583 reflector_items_per_watch_sum
   8583 reflector_items_per_watch_count
   8583 reflector_items_per_list_sum
   8583 reflector_items_per_list_count
    165 storage_operation_duration_seconds_bucket
     51 kubelet_runtime_operations_latency_microseconds
     44 rest_client_request_latency_seconds_bucket
     33 kubelet_docker_operations_latency_microseconds
     17 kubelet_runtime_operations_latency_microseconds_sum
     17 kubelet_runtime_operations_latency_microseconds_count
     17 kubelet_runtime_operations
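
To put these numbers in perspective: the 25749:8583 ratio matches three quantile objectives per summary, so there are roughly 8583 per-reflector summary children in each of the four reflector summary families. Every Summary with quantile objectives keeps its own perks/quantile streams, which lines up with the quantile.newStream / prometheus.newSummary allocations dominating the heap profile above. The following is only a minimal standalone sketch of that mechanism, not kubelet code; the metric name, objectives, and the 8583 count are modelled on the output above.

    package main

    import (
        "fmt"

        "github.com/prometheus/client_golang/prometheus"
    )

    func main() {
        // One SummaryVec keyed by reflector name, roughly how the reflector
        // metrics provider exposes reflector_list_duration_seconds.
        listDuration := prometheus.NewSummaryVec(
            prometheus.SummaryOpts{
                Name:       "reflector_list_duration_seconds",
                Help:       "Sketch: per-reflector list duration.",
                Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
            },
            []string{"name"},
        )
        prometheus.MustRegister(listDuration)

        // Every distinct reflector name creates a new child Summary, and each
        // child allocates its own quantile streams. Children are never
        // unregistered, so the heap grows with the number of names.
        for i := 0; i < 8583; i++ {
            name := fmt.Sprintf("reflector-%d", i) // stands in for the randomly suffixed reflector name
            listDuration.WithLabelValues(name).Observe(0.1)
        }
        fmt.Println("8583 summary children are now held alive by the registry")
    }

Since registered children are never removed, the registry keeps all of them alive, and the heap only grows as new reflectors are created, which would fit the HandlePodAdditions / AddPod frames in the profile above.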

What you expected to happen:

I expect that kubelet does not need this much memory.

How to reproduce it (as minimally and precisely as possible):

I don’t know

Anything else we need to know?:

One of the affected clusters has only 120 Pods, 3 of which are in CrashLoopBackOff; one of those pods has been crash-looping for 6 days on the affected node that was investigated.

Environment:

  • Kubernetes version (use kubectl version): v1.12.5
  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release):
# cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1800.7.0
VERSION_ID=1800.7.0
BUILD_ID=2018-08-15-2254
PRETTY_NAME="Container Linux by CoreOS 1800.7.0 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"

Most upvoted comments

For everyone who finds this issue and needs a patch to disable the reflector metrics:

diff --git c/pkg/util/reflector/prometheus/prometheus.go i/pkg/util/reflector/prometheus/prometheus.go
index 958a0007cd..63657e9c55 100644
--- c/pkg/util/reflector/prometheus/prometheus.go
+++ i/pkg/util/reflector/prometheus/prometheus.go
@@ -85,8 +85,6 @@ func init() {
 	prometheus.MustRegister(watchDuration)
 	prometheus.MustRegister(itemsPerWatch)
 	prometheus.MustRegister(lastResourceVersion)
-
-	cache.SetReflectorMetricsProvider(prometheusMetricsProvider{})
 }
 
 type prometheusMetricsProvider struct{}

Han points out that, since the metric has a random suffix each time, it’s not very useful for monitoring anyway.

> explicitly enable the metrics by calling a function instead of having this happen as a side effect of an import statement.

LGTM. @wojtek-t @derekwaynecarr @yujuhong WDYT? We should probably do this for both kubelet and control plane services (apiserver and controller-manager).

No, I suggest changing the code so that the end users (kubelet, apiserver, etc) would have to explicitly enable the metrics by calling a function instead of having this happen as a side effect of an import statement.
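
A rough sketch of what that explicit opt-in could look like, using hypothetical names (this is not the actual Kubernetes change, just the shape of the proposal): the package still defines its collectors, but nothing is registered and client-go is not wired up until a component deliberately calls the enable function.

    package reflectormetrics

    import (
        "sync"

        "github.com/prometheus/client_golang/prometheus"
    )

    var (
        registerOnce sync.Once

        listDuration = prometheus.NewSummaryVec(
            prometheus.SummaryOpts{
                Name: "reflector_list_duration_seconds",
                Help: "How long a reflector list call takes.",
            },
            []string{"name"},
        )
    )

    // EnableMetrics is the explicit opt-in suggested above. Importing this
    // package no longer has side effects; kubelet, apiserver, etc. have to
    // call this function deliberately to get reflector metrics.
    func EnableMetrics() {
        registerOnce.Do(func() {
            prometheus.MustRegister(listDuration)
            // The cache.SetReflectorMetricsProvider(...) call that used to
            // live in init() (see the diff above) would move here as well.
        })
    }

The sync.Once makes repeated calls harmless, so kubelet, apiserver, and controller-manager could each call it unconditionally during startup if they actually want the metrics.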