alertmanager: AlertManager not sending all alerts to Webhook endpoint.
Hi,
I’m using a webhook receiver with Alertmanager to store alerts (for pagination, etc.). For the most part the webhook works fine, but for some alerts Alertmanager never seems to send a POST request to it at all.
Is there any way to troubleshoot this? For example, a way to trace Alertmanager’s outgoing HTTP calls to the webhook receiver?
The webhook endpoint is a Rails application server that logs all incoming traffic, and after investigating, the missing alerts never show up in its logs: no POST request is ever received for them.
- What I expect: all alerts reach the webhook endpoint.
- What I see: only some alerts make it through to the webhook endpoint (a Rails application that logs incoming requests).
I’ve attached a partial configuration, omitting redundant receivers, etc.; they are almost all the same.
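One way to isolate where the alerts go missing is to temporarily point the `webhook_configs` URL at a trivial receiver that does nothing but print whatever Alertmanager POSTs. If the "missing" alerts never show up here either, the problem is on the Alertmanager side rather than in the Rails app. This is a minimal sketch; the port and path are arbitrary choices, not anything from the original config:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookHandler(BaseHTTPRequestHandler):
    """Accepts Alertmanager webhook POSTs and prints each alert."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        # Alertmanager sends a JSON body with a top-level "alerts" list.
        for alert in payload.get("alerts", []):
            print(alert.get("status"), alert.get("labels", {}).get("alertname"))
        self.send_response(200)
        self.end_headers()

def serve(port=9095):
    # Port 9095 is an assumption; Alertmanager simply POSTs to whatever
    # url the webhook_config specifies, path included.
    HTTPServer(("127.0.0.1", port), WebhookHandler).serve_forever()
```

Run `serve()` and swap the webhook `url` in the config over to this port while debugging.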
Thanks,
Environment

- System information:

```
Linux 4.14.186-146.268.amzn2.x86_64 x86_64
```

- Alertmanager version:

```
alertmanager, version 0.21.0 (branch: HEAD, revision: 4c6c03ebfe21009c546e4d1e9b92c371d67c021d)
  build user:       root@dee35927357f
  build date:       20200617-08:54:02
  go version:       go1.14.4
```

- Prometheus version:

```
prometheus, version 2.22.0 (branch: HEAD, revision: 0a7fdd3b76960808c3a91d92267c3d815c1bc354)
  build user:       root@6321101b2c50
  build date:       20201015-12:29:59
  go version:       go1.15.3
  platform:         linux/amd64
```

- Alertmanager configuration file:

```yaml
global:
  resolve_timeout: 5m
  http_config: {}
  smtp_from: no-reply@testsite.com
  smtp_hello: localhost
  smtp_smarthost: smtp.office365.com:587
  smtp_auth_username: no-reply@testsite.com
  smtp_auth_password: <secret>
  smtp_require_tls: true
  pagerduty_url: https://events.pagerduty.com/v2/enqueue
  opsgenie_api_url: https://api.opsgenie.com/
  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/
  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
route:
  receiver: device-alerts.hook
  group_by:
  - alertname
  - uid
  - group_id
  - stack_name
  - tenant_id
  - tenant_name
  - rule_stack
  - rule_tenant
  routes:
  - receiver: Test Presence Offline Notification Name
    match_re:
      alertname: ^(Test Presence Offline Alert Name)$
      group_id: 460599d4-3c4a-4311-a7d6-bdce6058672a
      tenant_name: ^(vle)$
    continue: true
    repeat_interval: 10y
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 30m
receivers:
- name: device-alerts.hook
  webhook_configs:
  - send_resolved: true
    http_config: {}
    url: http://127.0.0.1/v1/webhook
    max_alerts: 0
- name: Test Presence Offline Notification Name
  email_configs:
  - send_resolved: false
    to: testuser@testsite.com
    from: no-reply@testsite.com
    hello: localhost
    smarthost: smtp.office365.com:587
    auth_username: no-reply@testsite.com
    auth_password: <secret>
    headers:
      From: no-reply@testsite.com
      Smtp_from: no-reply@testsite.com
      Subject: 'Alert: {{ range .Alerts }}{{ .Labels.device_name }}{{ end }} | {{ range .Alerts }}{{ .Annotations.description }}{{ end }} | {{ range .Alerts }}{{ .Labels.uid }}{{ end }}'
      To: Test.dooling@testsite.onmicrosoft.com
      X-SES-CONFIGURATION-SET: ses-kibana
    html: '{{ template "email.default.html" . }}'
    text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}Rule: {{ range .Alerts }}{{ .Labels.alertname }}{{ end }}Group: {{ range .Alerts }}{{ .Labels.group_name }}{{ end }}Device Name: {{ range .Alerts }}{{ .Labels.device_name }}{{ end }}Serial Number: {{ range .Alerts }}{{ .Labels.uid }}{{ end }}'
    require_tls: true
templates:
- /etc/alertmanager/templates/default.tmpl
```
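For the `match_re` block in the config above, note that Alertmanager anchors each regex, so a pattern must match the entire label value, and a missing label is matched against the empty string. A simplified sketch of that matching behavior (this is an illustration, not Alertmanager's actual code) can help check whether a given alert's labels would enter the sub-route at all:

```python
import re

def route_matches(match_re, labels):
    # Alertmanager anchors match_re patterns, i.e. fullmatch semantics;
    # a label absent from the alert is treated as the empty string.
    return all(
        re.fullmatch(pattern, labels.get(name, ""))
        for name, pattern in match_re.items()
    )

# Patterns copied from the route in the config above.
match_re = {
    "alertname": "^(Test Presence Offline Alert Name)$",
    "group_id": "460599d4-3c4a-4311-a7d6-bdce6058672a",
    "tenant_name": "^(vle)$",
}
```

An alert whose labels satisfy every pattern enters the sub-route; an alert missing any of the three labels, or carrying a different value, does not.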
About this issue
- Original URL
- State: open
- Created 4 years ago
- Reactions: 1
- Comments: 38 (7 by maintainers)
In case this helps anyone: I was running Alertmanager through prometheus-operator and hit the exact same problem.
In my case the cause was that Alertmanager was only matching alerts that carried the right namespace label. There is an issue about that at https://github.com/prometheus-operator/prometheus-operator/issues/3737