prometheus: External Labels not available in alert Annotations
What did you do?
Defined an external label in the Prometheus config:
global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s
  external_labels:
    monitor: acme-logs-dev
In a rule I had:
ALERT container_eating_memory
  IF sum(container_memory_rss{container_label_com_docker_swarm_task_name=~".+"}) BY (instance, name) > 2500000000
  FOR 5m
  ANNOTATIONS {
    description = "{{ $labels.container_label_com_docker_swarm_task_name }} is eating up a LOT of memory. Memory consumption of {{ $labels.container_label_com_docker_swarm_task_name }} is at {{ humanize $value }}.",
    summary = "{{ $labels.monitor }} - HIGH MEMORY USAGE WARNING: TASK '{{ $labels.container_label_com_docker_swarm_task_name }}' on '{{ $labels.instance }}'"
  }
What did you expect to see?
The templated alert printing the monitor label. But both the alert in the Prometheus alert view and the alert that ended up in Slack were missing the external label.
What did you see instead? Under which circumstances?
No monitor label listed on the Prometheus alerts page, and an empty string in Slack.
Environment
- System information:
- Prometheus version: 1.7.1
- Alertmanager version: 0.8.0
- Prometheus configuration file:
global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s
  external_labels:
    monitor: acme-logs-dev
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - alertmanager.service.acme:9093
    scheme: http
    timeout: 10s
rule_files:
- /etc/prometheus/tasks.rules
- /etc/prometheus/host.rules
- /etc/prometheus/containers.rules
scrape_configs:
- job_name: cadvisor
  scrape_interval: 5s
  scrape_timeout: 5s
  metrics_path: /metrics
  scheme: http
  dns_sd_configs:
  - names:
    - tasks.cadvisor
    refresh_interval: 30s
    type: A
    port: 8080
- job_name: node-exporter
  scrape_interval: 5s
  scrape_timeout: 5s
  metrics_path: /metrics
  scheme: http
  dns_sd_configs:
  - names:
    - tasks.nodeexporter
    refresh_interval: 30s
    type: A
    port: 9100
- job_name: prometheus
  scrape_interval: 10s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - localhost:9090
- job_name: blackbox-http
  params:
    module:
    - http_2xx
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /probe
  scheme: http
  dns_sd_configs:
  - names:
    - tasks.kibana.service.acme
    refresh_interval: 30s
    type: A
    port: 5601
  relabel_configs:
  - source_labels: [__address__]
    separator: ;
    regex: (.*)(:80)?
    target_label: __param_target
    replacement: ${1}
    action: replace
  - source_labels: [__param_target]
    separator: ;
    regex: (.*)
    target_label: instance
    replacement: ${1}
    action: replace
  - source_labels: []
    separator: ;
    regex: .*
    target_label: __address__
    replacement: blackboxexporter:9115
    action: replace
  - source_labels: [__meta_dns_name]
    separator: ;
    regex: tasks\.(.*)
    target_label: job
    replacement: ${1}
    action: replace
- job_name: blackbox-tcp
  params:
    module:
    - tcp_connect
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /probe
  scheme: http
  dns_sd_configs:
  - names:
    - tasks.elasticsearch.service.acme
    refresh_interval: 30s
    type: A
    port: 9200
  - names:
    - tasks.logstash.service.acme
    refresh_interval: 30s
    type: A
    port: 5000
  relabel_configs:
  - source_labels: [__address__]
    separator: ;
    regex: (.*)(:80)?
    target_label: __param_target
    replacement: ${1}
    action: replace
  - source_labels: [__param_target]
    separator: ;
    regex: (.*)
    target_label: instance
    replacement: ${1}
    action: replace
  - source_labels: []
    separator: ;
    regex: .*
    target_label: __address__
    replacement: blackboxexporter:9115
    action: replace
  - source_labels: [__meta_dns_name]
    separator: ;
    regex: tasks\.(.*)
    target_label: job
    replacement: ${1}
    action: replace
- Alertmanager configuration file:
route:
  receiver: 'slack'
  repeat_interval: 3h
  group_interval: 5m
  group_wait: 1m
  routes:
  #- receiver: 'logstash'
  #  continue: true
  - receiver: 'slack'
receivers:
- name: 'slack'
  slack_configs:
  - send_resolved: true
    api_url: 'xxx'
    username: 'Prometheus - Alerts'
    channel: '#service-alerts'
    title: "{{ range .Alerts }}{{ .Annotations.summary }}\n{{ end }}"
    text: "{{ range .Alerts }}{{ .Annotations.description }}\n{{ end }}"
    icon_emoji: ':dart:'
- name: 'logstash'
  webhook_configs:
  # Whether or not to notify about resolved alerts.
  - send_resolved: true
    # The endpoint to send HTTP POST requests to.
    url: 'http://logstash:8080/'
- Logs:
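One detail worth noting: external labels are attached to alerts when Prometheus pushes them to Alertmanager, so while rule annotation templates cannot see them, notification templates can. A minimal workaround sketch for the Slack receiver above, pulling the monitor external label out of .CommonLabels instead of the annotation:

receivers:
- name: 'slack'
  slack_configs:
  - send_resolved: true
    api_url: 'xxx'
    username: 'Prometheus - Alerts'
    channel: '#service-alerts'
    # .CommonLabels holds the labels shared by all alerts in the group,
    # which includes external labels added by the sending Prometheus.
    title: "{{ .CommonLabels.monitor }} - {{ range .Alerts }}{{ .Annotations.summary }}\n{{ end }}"
    text: "{{ range .Alerts }}{{ .Annotations.description }}\n{{ end }}"

The trade-off, which drives the discussion below, is that this moves the label into the Alertmanager notification layout instead of keeping the whole message in the alert definition.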
Having just implemented another workaround for this for the _n_th time at SoundCloud triggered me to do a quick straw poll among the Prometheus people here. Everybody wants this feature.
I don’t know if we should continue the discussion. My suspicion is that all arguments have been tabled already. If there is still no consensus on doing this, we could do a formal vote in prometheus-team@.
I think the point, at least for us, is that humans do care about the Prometheus that alerts are generated from, even if you don’t or don’t think they should. Even if we ignore this specific scenario, the fact remains that there can be external labels or alert label rewrites that add contextual information for the humans receiving the alerts, which can be helpful. If you are taking a quick glance at multiple alerts, the summary or description annotations seem to be the two best places to put this type of useful information.
Some, but not all, labels being available is confusing at best. If the human decides that combining this information in a specific way is useful to them and their organization, then isn’t that something worth considering? If you can get this done in a single place (the alert definition) rather than multiple places, why wouldn’t you? It’s generally what the user would expect.
Finally fixed by #5463.
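For reference, #5463 (released in Prometheus 2.10) exposes the evaluating server's external labels to alert and console templates as $externalLabels. A sketch of the original alert in 2.x rule-file syntax, assuming Prometheus 2.10 or later (the group name here is arbitrary):

groups:
- name: containers
  rules:
  - alert: container_eating_memory
    expr: sum(container_memory_rss{container_label_com_docker_swarm_task_name=~".+"}) by (instance, name) > 2500000000
    for: 5m
    annotations:
      description: "{{ $labels.container_label_com_docker_swarm_task_name }} is eating up a LOT of memory. Memory consumption of {{ $labels.container_label_com_docker_swarm_task_name }} is at {{ humanize $value }}."
      # $externalLabels carries the external_labels of the evaluating server,
      # so the monitor label is now available at rule-evaluation time.
      summary: "{{ $externalLabels.monitor }} - HIGH MEMORY USAGE WARNING: TASK '{{ $labels.container_label_com_docker_swarm_task_name }}' on '{{ $labels.instance }}'"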
As said, my suggestion is that the label set that includes external labels be accessed under a different name. That should cover the case where a template depends on the external label not being there.
The case where an alerting expression creates a label, which is then removed by the labels section in the alert so that it can be added again via the external labels, would indeed be changed by my suggestion. In case that’s of any practical relevance, we can still go down the road of a feature flag.
We could define an order if that helps getting a useful feature in.
BTW: In which form would people make labels depend on annotations? Was that a request you deemed reasonable or are you just bringing it up to needlessly complicate the discussion?
Some have one, some have none, some use PromDash, some might use something completely different.
I don’t have to add it. It’s in the external_labels of the Prometheus server. I just want to access it to generate descriptions that read nicely.

For both topics above, I declare my discussion quota exhausted. I guess there are not many I need to convince here; perhaps Brian is the only one. My ambition to do so is limited. And so are my resources.
A different story are actual technical issues, which I will continue to discuss, see next comment.
Which still doesn’t change that it has very little weight in this discussion.
I think there are some valid use cases where doing this with alert notification templates isn’t desirable. For instance, consider dashboard URLs that might contain the datacenter (an external label). Some alerts might include dashboard URLs with the datacenter, some might not. And it would be super useful for the URLs that appear in Prometheus to contain the correct datacenters, and not placeholders.
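As a concrete illustration of that use case (the URL is hypothetical, and "datacenter" is assumed to be an external label of the evaluating server), $externalLabels lets the rule author bake the correct value into the annotation itself:

annotations:
  # Hypothetical dashboard link; "datacenter" is assumed to be an
  # external label of this Prometheus server.
  dashboard: "https://grafana.example.com/d/memory?var-dc={{ $externalLabels.datacenter }}"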
Is there a way of currently doing this, or should this issue be reopened?