prometheus: Prometheus Release 2.2.0 Memory Leak?
Bug Report
What did you do? Nothing special; just ran Prometheus with the configuration below.
What did you expect to see? Memory being used and then released.
What did you see instead? Under which circumstances? Memory being used but not released; resident memory keeps growing.
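To make "using memory and not releasing it" measurable, it helps to track Prometheus's self-reported memory gauges from its own /metrics endpoint. A minimal sketch parsing two such metrics from exposition-format text (the sample values below are made up for illustration; in practice you would fetch http://&lt;prometheus&gt;:9090/metrics):

```python
# Sketch: extract Prometheus's self-reported memory gauges from
# /metrics exposition text. SAMPLE is hand-written example output.

SAMPLE = """\
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 8.589934592e+09
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 6.442450944e+09
"""

def parse_gauges(text, names):
    """Return {metric_name: float_value} for the requested gauge metrics."""
    values = {}
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        parts = line.split()
        if len(parts) == 2 and parts[0] in names:
            values[parts[0]] = float(parts[1])
    return values

mem = parse_gauges(SAMPLE, {"process_resident_memory_bytes",
                            "go_memstats_heap_inuse_bytes"})
# A large, persistent gap between RSS and the in-use Go heap can mean
# memory the runtime freed but has not yet returned to the OS, which
# looks like a leak from outside but is not one.
rss_gib = mem["process_resident_memory_bytes"] / 2**30
heap_gib = mem["go_memstats_heap_inuse_bytes"] / 2**30
print(f"RSS: {rss_gib:.1f} GiB, Go heap in use: {heap_gib:.1f} GiB")
# → RSS: 8.0 GiB, Go heap in use: 6.0 GiB
```

Graphing these two gauges over a few hours distinguishes steady growth (a real leak or rising cardinality) from sawtooth behavior around TSDB head compactions.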
Environment
- System information:
# uname -srm
Linux 3.10.0-327.ali2010.rc7.alios7.x86_64 x86_64
- Prometheus version:
/prometheus $ prometheus --version
prometheus, version 2.2.1 (branch: HEAD, revision: 94e4a4321761f83207ad11542ee971e7d5220e80)
build user: root@XXX-XXX-aXXX
build date: 20180508-12:56:09
go version: go1.9.4
- Alertmanager version:
/alertmanager $ alertmanager --version
alertmanager, version 0.14.0 (branch: HEAD, revision: 30af4d051b37ce817ea7e35b56c57a0e2ec9dbb0)
build user: root@37b6a49ebba9
build date: 20180213-08:16:42
go version: go1.9.2
- Prometheus configuration file:
# Source: prometheus/templates/server-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-server-0
  namespace: monitoring
  labels:
    component: "server"
    addonmanager.kubernetes.io/mode: Reconcile
data:
  prometheus.yml: |
    global:
      scrape_interval: 30s
      #scrape_timeout: 10s
      external_labels:
        zone: xxx
    #remote_write:
    #- url: http://remote-storage-adapter-service.monitoring.xxxx.local:9201/write
    alerting:
      alertmanagers:
      - scheme: http
        static_configs:
        - targets:
          - prometheus-alertmanager-service.monitoring.xxxx.local:9093
    rule_files:
    - "/etc/rules/*.yml"
    - "/etc/rules/alerts"
    scrape_configs:
    - job_name: 'prometheus-server'
      static_configs:
      - targets:
        - xxxx:9090
        - xxxx:9090
    - job_name: 'kubernetes-apiservers'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      static_configs:
      - targets:
        - xxxx:6443
        - xxxx:6443
        - xxxx:6443
      metric_relabel_configs:
      - source_labels: ['__name__']
        regex: 'apiserver_request.*'
        action: keep
    - job_name: 'kubernetes-etcds'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      static_configs:
      - targets:
        - xxxx:2379
        - xxxx:2379
        - xxxx:2379
    - job_name: 'kubernetes-controller-manager'
      scheme: http
      static_configs:
      - targets:
        - xxxx:10252
        - xxxx:10252
        - xxxx:10252
    - job_name: 'kubernetes-kube-scheduler'
      scheme: http
      static_configs:
      - targets:
        - xxxx:10251
        - xxxx:10251
        - xxxx:10251
    - job_name: 'kubernetes-state-metrics'
      scheme: http
      static_configs:
      - targets:
        - prometheus-kube-state-metrics.monitoring.xxxx.local:8085
      metric_relabel_configs:
      - action: labeldrop
        regex: instance
      - action: labeldrop
        regex: container_id
      - action: labeldrop
        regex: image_id
      - source_labels: ['__name__']
        action: drop
        regex: '(kube_persistentvolume_status_phase|kube_pod_status_phase)'
    - job_name: 'kubernetes-nodes'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - api_server: xxxx.local:6443
        role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: xxxx.local:6443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}:4194/proxy/metrics
      metric_relabel_configs:
      - source_labels: ['__name__']
        action: keep
        regex: '(container_cpu_usage_seconds_total|container_memory.*|container_network_.*|container_fs.*|machine.*)'
      - source_labels: ['__name__']
        action: drop
        regex: container_memory_failures_total
      - source_labels: [pod_name]
        replacement: "$1"
        target_label: pod
      - action: labeldrop
        regex: id
      - action: labeldrop
        regex: name
      - action: labeldrop
        regex: beta_kubernetes_io_arch
      - action: labeldrop
        regex: beta_kubernetes_io_os
    - job_name: 'kubernetes-cadvisor'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - api_server: xxxx.local:6443
        role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: xxxx.local:6443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}:10255/proxy/metrics
      metric_relabel_configs:
      - source_labels: ['__name__']
        regex: kubelet_volume_stats_.*
        action: keep
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - api_server: xxxx.local:6443
        role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: (.+)(?::\d+);(\d+)
        replacement: ${1}:${2}
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
      metric_relabel_configs:
      - source_labels: ['__name__']
        regex: node_hwmon_.*
        action: drop
      - source_labels: ['__name__']
        regex: http_.*
        action: drop
      - source_labels: ['__name__']
        regex: node_vmstat_.*
        action: drop
    - job_name: 'kubernetes-services'
      metrics_path: /probe
      params:
        module: [http_2xx]
      kubernetes_sd_configs:
      - api_server: xxxx.local:6443
        role: service
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        target_label: kubernetes_name
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - api_server: xxxx.local:6443
        role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: (.+):(?:\d+);(\d+)
        replacement: ${1}:${2}
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
  rules: ""
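A side note on the many keep/drop rules above: Prometheus fully anchors relabel regexes, so a rule like `regex: http_.*` must match the whole metric name, not a substring. A small Python sketch of that semantic (my own re-implementation for illustration, not Prometheus code):

```python
import re

def keep(names, pattern):
    """Mimic a metric_relabel_configs keep rule on __name__:
    Prometheus anchors the regex, so it must match the whole name."""
    rx = re.compile(f"^(?:{pattern})$")
    return [n for n in names if rx.match(n)]

names = ["apiserver_request_count", "apiserver_request_latencies",
         "etcd_request_count"]
print(keep(names, "apiserver_request.*"))
# → ['apiserver_request_count', 'apiserver_request_latencies']

# Full-match semantics: a bare "request" keeps nothing.
print(keep(names, "request"))
# → []
```

This matters for memory debugging because metric_relabel_configs are applied after the scrape: dropped series still cost memory during parsing, so aggressive drop rules do not fully bound ingestion cost.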
- Alertmanager configuration file: not provided
- Logs: not provided
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Comments: 30 (14 by maintainers)
@dfredell thanks for the report, but can you open a new issue, as this one is getting too long.
@piaoyu can you please confirm if 2.3 fixed the problem for you so we can close this one?
Hello, we are also affected on three different clusters; our pods are getting OOM-killed every few hours. What information would help you with the investigation?
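For investigating this kind of growth, Prometheus exposes Go's pprof endpoints, and a heap profile usually shows whether memory is held by live TSDB structures or is heap the runtime has freed but not returned to the OS. A sketch, assuming a server reachable at localhost:9090 and a Go toolchain installed:

```shell
# Summarize the heap profile of a (hypothetical) local Prometheus at :9090.
go tool pprof -top http://localhost:9090/debug/pprof/heap

# Or save the raw profile to a file to attach to an issue:
curl -s http://localhost:9090/debug/pprof/heap > heap.pprof
```

Attaching such a profile (plus a graph of process_resident_memory_bytes) to a report makes it far easier for maintainers to tell a genuine leak from high series cardinality.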