alertmanager: AlertManager not sending all alerts to Webhook endpoint.

Hi,

I’m using a webhook receiver with Alertmanager to store alerts (for pagination, etc.). For the most part the webhook works fine, but for some alerts the webhook never receives a POST call from Alertmanager at all.

Is there any way to troubleshoot this? For example, is there a way to trace Alertmanager’s outgoing HTTP calls to the webhook receiver?
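The only handle I have so far is Alertmanager’s own logging and its notification counters; a rough sketch of what I mean is below (the config path and the 127.0.0.1:9093 address are just assumptions about my setup):

```sh
# Run Alertmanager with debug logging so individual notification attempts
# show up in the log (the config path below is an assumption about my setup).
alertmanager --config.file=/etc/alertmanager/alertmanager.yml --log.level=debug

# Watch the notification counters on Alertmanager's own /metrics endpoint.
# If webhook notifications are attempted but fail, the failed_total counter
# should grow; if no attempt is made at all, neither counter moves for
# integration="webhook". (Assumes the default listen address, 127.0.0.1:9093.)
curl -s http://127.0.0.1:9093/metrics \
  | grep -E 'alertmanager_notifications(_failed)?_total.*integration="webhook"'
```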

The webhook endpoint is a Rails application server that logs all incoming traffic, and after investigating I found that the missing alerts never show up in those logs (no POST request is ever received for them).
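To rule out the receiver side, my plan is to hand-POST a request shaped roughly like Alertmanager’s webhook payload at the same URL and confirm it shows up in the Rails logs. A minimal sketch; the body below is hand-written to approximate the version-4 webhook format, and every value in it is made up:

```sh
# Hand-deliver a webhook-style payload to the Rails endpoint to confirm that
# POSTs to this path are received and logged (all values below are made up).
curl -s -X POST http://127.0.0.1/v1/webhook \
  -H 'Content-Type: application/json' \
  -d '{
    "version": "4",
    "status": "firing",
    "receiver": "device-alerts.hook",
    "groupLabels": {"alertname": "ManualWebhookTest"},
    "commonLabels": {"alertname": "ManualWebhookTest"},
    "commonAnnotations": {"description": "manual delivery test"},
    "alerts": [{
      "status": "firing",
      "labels": {"alertname": "ManualWebhookTest", "uid": "test-uid"},
      "annotations": {"description": "manual delivery test"},
      "startsAt": "2021-01-01T00:00:00Z",
      "endsAt": "0001-01-01T00:00:00Z"
    }]
  }'
```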

  • What I expect: All alerts go through to the webhook endpoint
  • What I see: Only some alerts make it through to the webhook endpoint (a Rails application that logs incoming requests)

I’ve attached a partial configuration below, omitting the redundant receivers and so on; they’re almost all the same.
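If it helps, the routing itself can be exercised against that config with amtool’s route test; a sketch below, assuming I’m reading amtool’s flags right, with label values copied from the child route as placeholders:

```sh
# Ask amtool which receiver(s) a given label set is routed to, using the same
# config file Alertmanager runs with (path is an assumption about my setup).
amtool config routes test \
  --config.file=/etc/alertmanager/alertmanager.yml \
  alertname="Test Presence Offline Alert Name" \
  group_id="460599d4-3c4a-4311-a7d6-bdce6058672a" \
  tenant_name="vle"

# A label set that matches no child route should fall through to the
# default receiver, device-alerts.hook.
amtool config routes test \
  --config.file=/etc/alertmanager/alertmanager.yml \
  alertname="SomeOtherAlert"
```

That at least shows which receivers Alertmanager thinks an alert should end up at, before any notification is attempted.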

Thanks,

Environment

  • System information:

    Linux 4.14.186-146.268.amzn2.x86_64 x86_64

  • Alertmanager version:

```
alertmanager, version 0.21.0 (branch: HEAD, revision: 4c6c03ebfe21009c546e4d1e9b92c371d67c021d)
  build user:       root@dee35927357f
  build date:       20200617-08:54:02
  go version:       go1.14.4
```

  • Prometheus version:

```
prometheus, version 2.22.0 (branch: HEAD, revision: 0a7fdd3b76960808c3a91d92267c3d815c1bc354)
  build user:       root@6321101b2c50
  build date:       20201015-12:29:59
  go version:       go1.15.3
  platform:         linux/amd64
```

  • Alertmanager configuration file:

```yaml
global:
  resolve_timeout: 5m
  http_config: {}
  smtp_from: no-reply@testsite.com
  smtp_hello: localhost
  smtp_smarthost: smtp.office365.com:587
  smtp_auth_username: no-reply@testsite.com
  smtp_auth_password: <secret>
  smtp_require_tls: true
  pagerduty_url: https://events.pagerduty.com/v2/enqueue
  opsgenie_api_url: https://api.opsgenie.com/
  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/
  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
route:
  receiver: device-alerts.hook
  group_by:
  - alertname
  - uid
  - group_id
  - stack_name
  - tenant_id
  - tenant_name
  - rule_stack
  - rule_tenant
  routes:
  - receiver: Test Presence Offline Notification Name
    match_re:
      alertname: ^(Test Presence Offline Alert Name)$
      group_id: 460599d4-3c4a-4311-a7d6-bdce6058672a
      tenant_name: ^(vle)$
    continue: true
    repeat_interval: 10y

  group_wait: 30s
  group_interval: 5m
  repeat_interval: 30m
receivers:
- name: device-alerts.hook
  webhook_configs:
  - send_resolved: true
    http_config: {}
    url: http://127.0.0.1/v1/webhook
    max_alerts: 0
- name: Test Presence Offline Notification Name
  email_configs:
  - send_resolved: false
    to: testuser@testsite.com
    from: no-reply@testsite.com
    hello: localhost
    smarthost: smtp.office365.com:587
    auth_username: no-reply@testsite.com
    auth_password: <secret>
    headers:
      From: no-reply@testsite.com
      Smtp_from: no-reply@testsite.com
      Subject: 'Alert: {{ range .Alerts }}{{ .Labels.device_name }}{{ end }} | {{ range .Alerts }}{{ .Annotations.description }}{{ end }} | {{ range .Alerts }}{{ .Labels.uid }}{{ end }}'
      To: Test.dooling@testsite.onmicrosoft.com
      X-SES-CONFIGURATION-SET: ses-kibana
    html: '{{ template "email.default.html" . }}'
    text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}Rule: {{ range .Alerts }}{{ .Labels.alertname }}{{ end }}Group: {{ range .Alerts }}{{ .Labels.group_name }}{{ end }}Device Name: {{ range .Alerts }}{{ .Labels.device_name }}{{ end }}Serial Number: {{ range .Alerts }}{{ .Labels.uid }}{{ end }}'
    require_tls: true
templates:
- /etc/alertmanager/templates/default.tmpl
```

About this issue

  • State: open
  • Created 4 years ago
  • Reactions: 1
  • Comments: 38 (7 by maintainers)

Most upvoted comments

In case this helps anyone, I was running AlertManager through prometheus-operator, and I experienced the exact same problem.

In my case the cause was that Alertmanager was only matching alerts that contained the right namespace label. There is an issue about that at https://github.com/prometheus-operator/prometheus-operator/issues/3737
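For anyone wanting to check whether they are hitting the same thing, one way is to pull the config the operator actually generated out of the running Alertmanager and look at the route matchers. A rough sketch, assuming a typical kube-prometheus-style install; the namespace, service name, and port below are assumptions and will differ per cluster:

```sh
# Port-forward to the operator-managed Alertmanager, pull the running config
# through the v2 status API, and look for namespace matchers in the routes.
kubectl -n monitoring port-forward svc/alertmanager-operated 9093:9093 &
sleep 2

curl -s http://127.0.0.1:9093/api/v2/status \
  | jq -r '.config.original' \
  | grep -n 'namespace'
```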