rancher: when no notifiers are configured, Prometheus can't find Alertmanager and constantly logs errors

rancher/rancher:v2.2.2. No notifiers are configured in the cluster, so I assume that no Alertmanager instances are deployed. If that is the case, we should not set the Alertmanager config in Prometheus: it tries to reach Alertmanager every 6 minutes or so. It would be nice to have a clean log.

level=info ts=2019-04-17T22:28:47.410721056Z caller=main.go:695 msg="Loading configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
level=info ts=2019-04-17T22:28:47.466680554Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-04-17T22:28:47.467915185Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-04-17T22:28:47.46872607Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-04-17T22:28:47.469568383Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-04-17T22:28:47.47274569Z caller=main.go:722 msg="Completed loading of configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
level=error ts=2019-04-17T22:28:47.595097871Z caller=notifier.go:481 component=notifier alertmanager=http://alertmanager-operated.cattle-prometheus:9093/api/v1/alerts count=0 msg="Error sending alert" err="Post http://alertmanager-operated.cattle-prometheus:9093/api/v1/alerts: dial tcp: lookup alertmanager-operated.cattle-prometheus on 10.31.0.10:53: no such host"
level=info ts=2019-04-17T22:34:47.597693835Z caller=main.go:695 msg="Loading configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
level=info ts=2019-04-17T22:34:47.89329663Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-04-17T22:34:47.894429793Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-04-17T22:34:47.895314721Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-04-17T22:34:47.89624308Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-04-17T22:34:47.899885093Z caller=main.go:722 msg="Completed loading of configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
level=error ts=2019-04-17T22:34:52.615082709Z caller=notifier.go:481 component=notifier alertmanager=http://alertmanager-operated.cattle-prometheus:9093/api/v1/alerts count=0 msg="Error sending alert" err="Post http://alertmanager-operated.cattle-prometheus:9093/api/v1/alerts: dial tcp: lookup alertmanager-operated.cattle-prometheus on 10.31.0.10:53: no such host"
level=info ts=2019-04-17T22:40:47.410780055Z caller=main.go:695 msg="Loading configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
level=info ts=2019-04-17T22:40:47.486244501Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-04-17T22:40:47.487459958Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-04-17T22:40:47.488555538Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-04-17T22:40:47.489660993Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-04-17T22:40:47.4932466Z caller=main.go:722 msg="Completed loading of configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
level=error ts=2019-04-17T22:40:52.615534267Z caller=notifier.go:481 component=notifier alertmanager=http://alertmanager-operated.cattle-prometheus:9093/api/v1/alerts count=0 msg="Error sending alert" err="Post http://alertmanager-operated.cattle-prometheus:9093/api/v1/alerts: dial tcp: lookup alertmanager-operated.cattle-prometheus on 10.31.0.10:53: no such host"
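
The repeated error above is a DNS lookup failure for the alertmanager-operated Service in the cattle-prometheus namespace, which prometheus-operator normally only creates once an Alertmanager is actually deployed. A minimal sketch of how to confirm the diagnosis (assumes kubectl access to the cluster; the pod name is taken from a log further down in this thread, and "prometheus" is the default container name used by prometheus-operator):

# The Service Prometheus keeps trying to resolve should simply not exist
kubectl -n cattle-prometheus get svc alertmanager-operated

# Dump the config rendered for Prometheus; if it still contains an alerting:
# section pointing at alertmanager-operated, that section produces the errors
# above and could be left out when no notifiers are configured.
kubectl -n cattle-prometheus exec prometheus-cluster-monitoring-0 -c prometheus -- \
  cat /etc/prometheus/config_out/prometheus.env.yaml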

Most upvoted comments

We placed the values in Rancher -> Cluster -> Tools -> Monitoring.

The values themselves depend on your workload.
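
Assuming "the values" here means the CPU and memory limits for the monitoring components, it helps to compare the current limits with actual usage before raising them. A sketch (statefulset and pod names follow the prometheus-cluster-monitoring naming visible in the logs below; kubectl top requires metrics-server):

# Current requests/limits of the Prometheus container
kubectl -n cattle-prometheus get statefulset prometheus-cluster-monitoring \
  -o jsonpath='{.spec.template.spec.containers[?(@.name=="prometheus")].resources}'

# Actual usage of the Prometheus pod
kubectl -n cattle-prometheus top pod prometheus-cluster-monitoring-0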

Rancher v2.4.5: this problem still exists.

2020/08/25 09:26:12 [ERROR] Failed to send alert: Post http://alertmanager-operated.cattle-prometheus.svc.cluster.local:9093/api/v1/alerts: dial tcp: lookup alertmanager-operated.cattle-prometheus.svc.cluster.local on 10.204.3.168:53: no such host
E0825 09:26:20.495601 33 watcher.go:214] watch chan error: etcdserver: mvcc: required revision has been compacted
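
The v2.4.5 error uses the fully qualified cluster-local name, but the cause looks the same: there is no Service behind that name. The lookup can be reproduced from any throwaway pod in the cluster (a sketch; the busybox image and pod name are only illustrative):

kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup alertmanager-operated.cattle-prometheus.svc.cluster.local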

Same here. Cluster monitoring runs fine for a couple of days, then “Monitoring API is not ready” appears and the Prometheus container logs:

level=warn ts=2019-09-25T12:39:49.038781441Z caller=main.go:295 deprecation_notice="\"storage.tsdb.retention\" flag is deprecated use \"storage.tsdb.retention.time\" instead."
level=info ts=2019-09-25T12:39:49.03889983Z caller=main.go:302 msg="Starting Prometheus" version="(version=2.7.1, branch=HEAD, revision=62e591f928ddf6b3468308b7ac1de1c63aa7fcf3)"
level=info ts=2019-09-25T12:39:49.038925503Z caller=main.go:303 build_context="(go=go1.11.5, user=root@f9f82868fc43, date=20190131-11:16:59)"
level=info ts=2019-09-25T12:39:49.03894996Z caller=main.go:304 host_details="(Linux 3.10.0-862.14.4.el7.x86_64 #1 SMP Fri Sep 21 09:07:21 UTC 2018 x86_64 prometheus-cluster-monitoring-0 (none))"
level=info ts=2019-09-25T12:39:49.038974301Z caller=main.go:305 fd_limits="(soft=65536, hard=65536)"
level=info ts=2019-09-25T12:39:49.038996007Z caller=main.go:306 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2019-09-25T12:39:49.039956656Z caller=main.go:620 msg="Starting TSDB ..."
level=info ts=2019-09-25T12:39:49.040070764Z caller=web.go:416 component=web msg="Start listening for connections" address=127.0.0.1:9090
level=info ts=2019-09-25T12:39:49.043700213Z caller=repair.go:48 component=tsdb msg="found healthy block" mint=1569355200000 maxt=1569362400000 ulid=01DNJR1HE89GCQXK0XPS7QGN9Q
level=info ts=2019-09-25T12:39:49.045527787Z caller=repair.go:48 component=tsdb msg="found healthy block" mint=1569362400000 maxt=1569369600000 ulid=01DNJYYW3J6VGNB2NHQVTT919B
level=info ts=2019-09-25T12:39:49.047415328Z caller=repair.go:48 component=tsdb msg="found healthy block" mint=1569369600000 maxt=1569376800000 ulid=01DNK6034ETSZHJ9MK0036HN9N
level=info ts=2019-09-25T12:39:49.049159101Z caller=repair.go:48 component=tsdb msg="found healthy block" mint=1569376800000 maxt=1569384000000 ulid=01DNKCMQ9P9Y4Z9ZNF0X2BHM1K
level=info ts=2019-09-25T12:39:49.05096651Z caller=repair.go:48 component=tsdb msg="found healthy block" mint=1569384000000 maxt=1569391200000 ulid=01DNKKHHD4563MQRWAMTTQG77N
level=info ts=2019-09-25T12:39:49.052796539Z caller=repair.go:48 component=tsdb msg="found healthy block" mint=1569391200000 maxt=1569398400000 ulid=01DNKTC5VWTTRYYXJ5VVDNEWBZ
level=info ts=2019-09-25T12:39:49.054617459Z caller=repair.go:48 component=tsdb msg="found healthy block" mint=1569398400000 maxt=1569405600000 ulid=01DNM18PDZ0WYB4B1Y2M017DNC
level=warn ts=2019-09-25T12:39:49.059296468Z caller=wal.go:116 component=tsdb msg="last page of the wal is torn, filling it with zeros" segment=/prometheus/wal/00000022
level=warn ts=2019-09-25T12:40:05.623939925Z caller=head.go:440 component=tsdb msg="unknown series references" count=257
level=info ts=2019-09-25T12:40:06.331713625Z caller=main.go:635 msg="TSDB started"
level=info ts=2019-09-25T12:40:06.331781676Z caller=main.go:695 msg="Loading configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
level=info ts=2019-09-25T12:40:06.338060633Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-09-25T12:40:06.339485399Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-09-25T12:40:06.340583012Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-09-25T12:40:06.341829041Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-09-25T12:40:06.342957358Z caller=main.go:722 msg="Completed loading of configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
level=info ts=2019-09-25T12:40:06.342979664Z caller=main.go:589 msg="Server is ready to receive web requests."
level=error ts=2019-09-25T12:40:09.055629106Z caller=notifier.go:481 component=notifier alertmanager=http://alertmanager-operated.cattle-prometheus:9093/api/v1/alerts count=0 msg="Error sending alert" err="Post http://alertmanager-operated.cattle-prometheus:9093/api/v1/alerts: dial tcp: lookup alertmanager-operated.cattle-prometheus on 10.43.0.10:53: no such host"
runtime: failed to create new OS thread (have 14 already; errno=12)
fatal error: newosproc
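
The last two lines are the interesting part here: errno=12 is ENOMEM, so the Go runtime could not allocate memory for a new OS thread, which usually means the container ran into its memory limit rather than a DNS or configuration problem. That matches the earlier advice to raise the monitoring resource values. A quick way to check whether the pod was terminated for that reason (a sketch, using the pod name from the log above):

kubectl -n cattle-prometheus get pod prometheus-cluster-monitoring-0
kubectl -n cattle-prometheus describe pod prometheus-cluster-monitoring-0 | grep -A 6 'Last State'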

In addition, the prometheus-agent logs the following and restarts over and over:

INFO[2019-09-26T06:48:46Z] listening on 10.42.9.213:9090, proxying to http://127.0.0.1:9090 with ignoring 'remote reader' labels [prometheus,prometheus_replica], only allow maximum 512 connections with 5m0s read timeout .
INFO[2019-09-26T06:48:46Z] Start listening for connections on 10.42.9.213:9090
2019/09/26 06:48:59 http: proxy error: context canceled
2019/09/26 06:49:05 http: proxy error: context canceled
2019/09/26 06:49:09 http: proxy error: context canceled
2019/09/26 06:49:15 http: proxy error: context canceled
2019/09/26 06:49:19 http: proxy error: context canceled
2019/09/26 06:49:25 http: proxy error: context canceled
2019/09/26 06:49:29 http: proxy error: context canceled
2019/09/26 06:49:35 http: proxy error: context canceled
2019/09/26 06:49:39 http: proxy error: context canceled
2019/09/26 06:49:45 http: proxy error: context canceled
2019/09/26 06:49:49 http: proxy error: context canceled
2019/09/26 06:49:55 http: proxy error: context canceled