prometheus: Prometheus Release 2.2.0 Memory Leak?

Bug Report

What did you do? Nothing special; Prometheus was simply left running with the configuration below.

What did you expect to see? Memory being used and then released again over time.

What did you see instead? Under which circumstances? Memory usage keeps growing over time and is not released.
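
For reference, a minimal sketch of how the growth can be observed, assuming Prometheus is reachable at localhost:9090 (replace with the actual pod address) and scrapes itself under the 'prometheus-server' job from the configuration below:

# Resident memory (RSS) of the Prometheus process, via its own metrics
curl -s -G --data-urlencode 'query=process_resident_memory_bytes{job="prometheus-server"}' http://localhost:9090/api/v1/query

# Heap actually held by the Go runtime, for comparison with RSS
curl -s -G --data-urlencode 'query=go_memstats_heap_inuse_bytes{job="prometheus-server"}' http://localhost:9090/api/v1/query

# Interactive heap profile from the built-in pprof endpoint
go tool pprof http://localhost:9090/debug/pprof/heap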

Environment

  • System information:
#uname -srm
Linux 3.10.0-327.ali2010.rc7.alios7.x86_64 x86_64
  • Prometheus version:
/prometheus $ prometheus --version
prometheus, version 2.2.1 (branch: HEAD, revision: 94e4a4321761f83207ad11542ee971e7d5220e80)
  build user:       root@XXX-XXX-aXXX
  build date:       20180508-12:56:09
  go version:       go1.9.4
  • Alertmanager version:
/alertmanager $ alertmanager --version
alertmanager, version 0.14.0 (branch: HEAD, revision: 30af4d051b37ce817ea7e35b56c57a0e2ec9dbb0)
  build user:       root@37b6a49ebba9
  build date:       20180213-08:16:42
  go version:       go1.9.2
  • Prometheus configuration file:
# Source: prometheus/templates/server-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-server-0
  namespace: monitoring
  labels:
    component: "server"
    addonmanager.kubernetes.io/mode: Reconcile
data:
  prometheus.yml: |
    global:
      scrape_interval: 30s
      #scrape_timeout: 10s
      external_labels:
        zone: xxx

    #remote_write:
     #- url: http://remote-storage-adapter-service.monitoring.xxxx.local:9201/write

    alerting:
      alertmanagers:
      - scheme: http
        static_configs:
        - targets:
          - prometheus-alertmanager-service.monitoring.xxxx.local:9093

    rule_files:
      - "/etc/rules/*.yml"
      - "/etc/rules/alerts"

    scrape_configs:

      - job_name: 'prometheus-server'
        static_configs:
        - targets:
          - xxxx:9090
          - xxxx:9090

      - job_name: 'kubernetes-apiservers'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        static_configs:
        - targets:
          - xxxx:6443
          - xxxx:6443
          - xxxx:6443
        metric_relabel_configs:
          - source_labels: ['__name__']
            regex: 'apiserver_request.*'
            action: keep

      - job_name: 'kubernetes-etcds'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        static_configs:
        - targets:
          - xxxx:2379
          - xxxx:2379
          - xxxx:2379

      - job_name: 'kubernetes-controller-manager'
        scheme: http
        static_configs:
        - targets:
          - xxxx:10252
          - xxxx:10252
          - xxxx:10252

      - job_name: 'kubernetes-kube-scheduler'
        scheme: http
        static_configs:
        - targets:
          - xxxx:10251
          - xxxx:10251
          - xxxx:10251

      - job_name: 'kubernetes-state-metrics'
        scheme: http
        static_configs:
        - targets:
          - prometheus-kube-state-metrics.monitoring.xxxx.local:8085
        metric_relabel_configs:
          - action: labeldrop
            regex: instance
          - action: labeldrop
            regex: container_id
          - action: labeldrop
            regex: image_id
          - source_labels: ['__name__']
            action: drop
            regex: '(kube_persistentvolume_status_phase|kube_pod_status_phase)'

      - job_name: 'kubernetes-nodes'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - api_server: xxxx.local:6443
          role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: xxxx.local:6443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}:4194/proxy/metrics
        metric_relabel_configs:
          - source_labels: ['__name__']
            action: keep
            regex: '(container_cpu_usage_seconds_total|container_memory.*|container_network_.*|container_fs.*|machine.*)'
          - source_labels: ['__name__']
            action: drop
            regex: container_memory_failures_total
          - source_labels: [pod_name]
            replacement: "$1"
            target_label: pod
          - action: labeldrop
            regex: id
          - action: labeldrop
            regex: name
          - action: labeldrop
            regex: beta_kubernetes_io_arch
          - action: labeldrop
            regex: beta_kubernetes_io_os

      - job_name: 'kubernetes-cadvisor'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - api_server: xxxx.local:6443
          role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: xxxx.local:6443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}:10255/proxy/metrics
        metric_relabel_configs:
          - source_labels: ['__name__']
            regex: kubelet_volume_stats_.*
            action: keep


      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
        - api_server: xxxx.local:6443
          role: endpoints
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: (.+)(?::\d+);(\d+)
            replacement: ${1}:${2}
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_name
        metric_relabel_configs:
          - source_labels: ['__name__']
            regex: node_hwmon_.*
            action: drop
          - source_labels: ['__name__']
            regex: http_.*
            action: drop
          - source_labels: ['__name__']
            regex: node_vmstat_.*
            action: drop

      - job_name: 'kubernetes-services'
        metrics_path: /probe
        params:
          module: [http_2xx]
        kubernetes_sd_configs:
        - api_server: xxxx.local:6443
          role: service
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
            action: keep
            regex: true
          - source_labels: [__address__]
            target_label: __param_target
          - target_label: __address__
            replacement: blackbox
          - source_labels: [__param_target]
            target_label: instance
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            target_label: kubernetes_name

      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
        - api_server: xxxx.local:6443
          role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: (.+):(?:\d+);(\d+)
            replacement: ${1}:${2}
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name
  rules: ""

  • Alertmanager configuration file:
(not provided)
  • Logs:
(not provided)

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 30 (14 by maintainers)

Most upvoted comments

@dfredell thanks for the report, but can you open a new issue, as this one is getting too long?

@piaoyu can you please confirm if 2.3 fixed the problem for you so we can close this one?

Hello, we are also affected on three different clusters; our Prometheus pods are getting OOM-killed every few hours. What information would be helpful for the investigation?
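
For anyone collecting data for a follow-up report, a rough sketch of queries that tend to narrow this down, assuming the pod is reachable at localhost:9090 (adjust the address for your cluster):

# Active series in the TSDB head block, the main driver of Prometheus memory
curl -s -G --data-urlencode 'query=prometheus_tsdb_head_series' http://localhost:9090/api/v1/query

# Samples scraped per job, to see which targets contribute the most
curl -s -G --data-urlencode 'query=sum by (job) (scrape_samples_scraped)' http://localhost:9090/api/v1/query

# Heap profile that can later be inspected with `go tool pprof heap.pprof`
curl -s http://localhost:9090/debug/pprof/heap > heap.pprof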