alertmanager: Regression: Resolving unqualified DNS names fails

What did you do? Running AM in Kubernetes as a StatefulSet with a headless service, which gives each AM a name like alertmanager-0.alertmanager.default.svc.cluster.local. Among other search domains, the pod gets default.svc.cluster.local configured in /etc/resolv.conf:

# cat /etc/resolv.conf 
nameserver 10.35.240.10
search default.svc.cluster.local svc.cluster.local cluster.local c.latency-at.internal google.internal

This allows the name alertmanager-0.alertmanager to be resolved unqualified, like this:

# nslookup alertmanager-0.alertmanager
Server:    10.35.240.10
Address 1: 10.35.240.10 kube-dns.kube-system.svc.cluster.local

Name:      alertmanager-0.alertmanager
Address 1: 10.32.4.32 alertmanager-0.alertmanager.default.svc.cluster.local
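The lookup above succeeds because the stub resolver expands the name using the search list. The order of candidates can be sketched as follows. This is a simplified model of the resolv.conf `search`/`ndots` rules (roughly what glibc and Go's own resolver do), not code from Alertmanager; `candidateNames` is a hypothetical helper, and since the file above sets no `options ndots`, the resolv.conf default of 1 is assumed.

```go
package main

import (
	"fmt"
	"strings"
)

// candidateNames returns the fully qualified names a stub resolver would try,
// in order, for name given the resolv.conf search list and ndots setting.
// Simplified model: options such as attempts, rotate and timeout are ignored.
func candidateNames(name string, search []string, ndots int) []string {
	// A trailing dot marks the name as absolute: no search list is applied.
	if strings.HasSuffix(name, ".") {
		return []string{name}
	}
	absolute := name + "."
	var searched []string
	for _, domain := range search {
		searched = append(searched, name+"."+strings.TrimSuffix(domain, ".")+".")
	}
	// Names with at least ndots dots are tried as-is first, then with each
	// search domain; names with fewer dots try the search list first.
	if strings.Count(name, ".") >= ndots {
		return append([]string{absolute}, searched...)
	}
	return append(searched, absolute)
}

func main() {
	search := []string{"default.svc.cluster.local", "svc.cluster.local", "cluster.local"}
	for _, n := range candidateNames("alertmanager-0.alertmanager", search, 1) {
		fmt.Println(n)
	}
}
```

With ndots 1, the dotted name is first tried as-is (alertmanager-0.alertmanager., which has no record) and then with default.svc.cluster.local appended, which is the name nslookup returned. A resolver that only ever queries the first, absolute form never reaches the second candidate.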

Alertmanager, however, can’t resolve this unqualified name (this worked in 0.11.0 at least) and logs this error:

level=warn ts=2018-03-30T13:25:15.016042032Z caller=cluster.go:129 component=cluster msg="failed to join cluster" err="2 errors occurred:\n\n* Failed to resolve alertmanager-0.alertmanager:6783: lookup alertmanager-0.alertmanager on 10.35.240.10:53: no such host\n* Failed to join 10.32.6.23: dial tcp 10.32.6.23:6783: connect: connection refused"

Environment

  • Alertmanager version:
alertmanager, version 0.15.0-rc.1 (branch: HEAD, revision: acb111e812530bec1ac6d908bc14725793e07cf3)

About this issue

  • State: open
  • Created 6 years ago
  • Comments: 15 (12 by maintainers)

Most upvoted comments

As a workaround, I’m using the FQDN. So it’s not urgent, but it should get fixed because others will trip over this too.
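For a StatefulSet pod behind a headless service, the FQDN follows the pattern `<pod>.<service>.<namespace>.svc.<cluster-domain>`, as the nslookup output in the report shows. A minimal sketch of building such a peer address, assuming the default `cluster.local` cluster domain and the port 6783 seen in the log; `peerAddr` is a hypothetical helper, not part of Alertmanager.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// peerAddr builds the fully qualified host:port of a StatefulSet pod behind a
// headless service: <pod>.<service>.<namespace>.svc.<clusterDomain>:<port>.
// Hypothetical helper for illustration only.
func peerAddr(pod, service, namespace, clusterDomain string, port int) string {
	host := pod + "." + service + "." + namespace + ".svc." + clusterDomain
	// JoinHostPort formats the pair as host:port.
	return net.JoinHostPort(host, strconv.Itoa(port))
}

func main() {
	fmt.Println(peerAddr("alertmanager-0", "alertmanager", "default", "cluster.local", 6783))
}
```

The resulting string is what would be passed as a peer address (for example via Alertmanager’s `--cluster.peer` flag) instead of the short alertmanager-0.alertmanager:6783 form, so no search-list expansion is needed.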

WTH… I just read the memberlist code, and it implements its own resolver, falling back to the stdlib only when that fails: https://github.com/hashicorp/memberlist/blob/9f5b38f1dc837733754bf57f4ea62726a509c0fc/memberlist.go#L247

I’m going to file an upstream issue.

@xkfen Someone would have to fix the upstream issue, https://github.com/hashicorp/memberlist/issues/147, but nothing has happened since I filed it. I’m still using the workaround I described above.

Hi there, any news about this?