external-dns: Route 53 Failover setup issues - TXT record type

I’m hitting issues using the failover annotation with Route53.

These are the annotations on the ingress:

    external-dns.alpha.kubernetes.io/set-identifier: us-east-1
    external-dns.alpha.kubernetes.io/aws-failover: PRIMARY

I added a bit of debugging to dump the interface; here is the output:

ChangeBatch:
[{
  Action: "CREATE",
  ResourceRecordSet: {
    AliasTarget: {
      DNSName: "01234567-default-mysite-0123456789.us-east-1.elb.amazonaws.com",
      EvaluateTargetHealth: true,
      HostedZoneId: "ZZZZZDOTZZZZZZ"
    },
    Failover: "PRIMARY",
    Name: "hostname.mysite.com",
    SetIdentifier: "us-east-1",
    Type: "A"
  }
} {
  Action: "CREATE",
  ResourceRecordSet: {
    Failover: "PRIMARY",
    Name: "prefix.hostname.mysite.com",
    ResourceRecords: [{
        Value: "\"heritage=external-dns,external-dns/owner=mysite-com-prod-us-east-1,external-dns/resource=ingress/default/mysite-prod\""
      }],
    SetIdentifier: "us-east-1",
    TTL: 300,
    Type: "TXT"
  }
}]

The error I get is:

A non-alias primary ResourceRecordSet must have an associated health check. No changes made.

From looking into this, the issue is that the TXT record can't carry a failover routing policy unless it has an associated health check or is an ALIAS record. A health check shouldn't be needed in this case, since the A record is an ALIAS and already evaluates the target's health (EvaluateTargetHealth: true).

For external-dns to be able to store multiple TXT ownership records for this failover A/ALIAS record (one per set identifier), I think the TXT records should be stored with a multivalue answer routing policy instead; multivalue answer doesn't require a health check and gives more flexibility.
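
To make the proposal concrete, here is a sketch of how the TXT change from the dump above could look with multivalue answer routing (MultiValueAnswer is the suggestion here; the other values are copied from the output above):

{
  Action: "CREATE",
  ResourceRecordSet: {
    MultiValueAnswer: true,
    Name: "prefix.hostname.mysite.com",
    ResourceRecords: [{
        Value: "\"heritage=external-dns,external-dns/owner=mysite-com-prod-us-east-1,external-dns/resource=ingress/default/mysite-prod\""
      }],
    SetIdentifier: "us-east-1",
    TTL: 300,
    Type: "TXT"
  }
}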

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 6
  • Comments: 36 (5 by maintainers)

Most upvoted comments

In case it benefits others, I think I discovered a workaround for the error:

A non-alias primary ResourceRecordSet must have an associated health check. No changes made.

It seems the AWS call just needs some UUID to be supplied for the call to be accepted – it doesn’t even matter if the UUID refers to a real health check or not. So I’m able to apply the following Ingress, and external-dns will apply the Route53 record:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/alias: "true"
    external-dns.alpha.kubernetes.io/aws-failover: PRIMARY
    external-dns.alpha.kubernetes.io/aws-health-check-id: 00000000-0000-0000-0000-000000000000
    external-dns.alpha.kubernetes.io/set-identifier: usw2

What’s important is that aws-health-check-id is a valid UUID. The same call will fail if the ID is not a UUID, e.g.:

metadata:
  annotations:
    external-dns.alpha.kubernetes.io/aws-health-check-id: this-will-fail

Hope this helps!

don’t close a bug that isn’t fixed

Please reopen this issue, as it’s not solved. The workaround above works, and right now I’m forced to use it, but it’s dirty and not the right way to proceed.

In my case it worked with these annotations. Please note that I am using an Istio Gateway and the health check is managed outside of external-dns.

Failover PRIMARY. I prefer to use the external-dns.alpha.kubernetes.io/hostname annotation, just in case there is more than one host defined for the Gateway:

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/alias: "true"
    external-dns.alpha.kubernetes.io/aws-failover: PRIMARY
    external-dns.alpha.kubernetes.io/aws-health-check-id: 0919762a-c10d-4a89-9dad-16f7638599cd
    external-dns.alpha.kubernetes.io/hostname: myapp.example.com
    external-dns.alpha.kubernetes.io/set-identifier: myapp-nonprod
  labels:
    app.kubernetes.io/name: myapp
  name: myapp
  namespace: test
spec:
  selector:
    istio: ingressgateway
  servers:
  - hosts:
    - myapp.example.com
    port:
      name: http-80
      number: 80
      protocol: HTTP
    tls:
      httpsRedirect: true
  - hosts:
    - myapp.example.com
    port:
      name: https-443
      number: 443
      protocol: HTTPS
    tls:
      credentialName: wildcard-nonprod-example-com-crt-secret
      mode: SIMPLE

Failover Secondary, running in another cluster, has a similar definition:

  • the health check is removed, to create an active-passive failover.
  • the set-identifier is different, to avoid a name collision in the same zone.

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/alias: "true"
    external-dns.alpha.kubernetes.io/aws-failover: SECONDARY
    external-dns.alpha.kubernetes.io/hostname: myapp.example.com
    external-dns.alpha.kubernetes.io/set-identifier: myapp-nonprod-sec
  labels:
    app.kubernetes.io/name: myapp
  name: myapp
  namespace: test
spec:
  selector:
    istio: ingressgateway
  servers:
  - hosts:
    - myapp.example.com
    port:
      name: http-80
      number: 80
      protocol: HTTP
    tls:
      httpsRedirect: true
  - hosts:
    - myapp.example.com
    port:
      name: https-443
      number: 443
      protocol: HTTPS
    tls:
      credentialName: wildcard-nonprod-example-com-crt-secret
      mode: SIMPLE

I think similar annotations could be applied to Ingresses and Services as well. I hope this helped.
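
For example, here is a rough, untested sketch of a LoadBalancer Service carrying the same annotations (the names, hostname, and health-check ID below are all placeholders):

apiVersion: v1
kind: Service
metadata:
  annotations:
    # all values below are placeholders; adjust them to your environment
    external-dns.alpha.kubernetes.io/alias: "true"
    external-dns.alpha.kubernetes.io/aws-failover: PRIMARY
    external-dns.alpha.kubernetes.io/aws-health-check-id: 00000000-0000-0000-0000-000000000000
    external-dns.alpha.kubernetes.io/hostname: myapp.example.com
    external-dns.alpha.kubernetes.io/set-identifier: myapp-nonprod
  labels:
    app.kubernetes.io/name: myapp
  name: myapp
  namespace: test
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: myapp
  ports:
  - name: https
    port: 443
    targetPort: 8443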

Has anyone tried out the new health-check-id annotation released in external-dns:v0.7.4? By associating the failover TXT records with a health check ID, it should no longer complain that it cannot create the record.

I have created a Route 53 health check and specified its ID in an annotation on the Ingress:

  • external-dns.alpha.kubernetes.io/health-check-id: "health-check-id"

but it looks like it is ignoring it and falling back to the same TXT error.

@gonzalobarbitta sure; due to this bug, I wasn’t able to have external-dns create the failover records directly. Instead, it creates standard A (as alias) records for each service, but with the region in the hostname. Then, on top of that, I have failover records that I manage separately that point to these regional records. external-dns is not in this flow. I still get the benefits of failover here, but it requires additional setup. If that doesn’t answer your question, let me know and I’ll try to elaborate.
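
To sketch the shape of that setup: external-dns owns a plain alias record per region (with the region in the name), and the separately managed failover records sit on top and point at those regional names, roughly like this (all names and the hosted zone ID are placeholders, not values from my setup):

{
  Action: "UPSERT",
  ResourceRecordSet: {
    AliasTarget: {
      DNSName: "us-east-1.myapp.example.com",
      EvaluateTargetHealth: true,
      HostedZoneId: "Z0000000000000"
    },
    Failover: "PRIMARY",
    Name: "myapp.example.com",
    SetIdentifier: "us-east-1",
    Type: "A"
  }
}

The SECONDARY record has the same shape and points at the other region's name.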

Still a bug, I tested with 0.7.3.

time="2020-09-08T11:37:48Z" level=error msg="InvalidChangeBatch: [A non-alias primary ResourceRecordSet must have an associated health check. No changes made., A non-alias primary ResourceRecordSet must have an associated health check. No changes made., A non-alias primary ResourceRecordSet must have an associated health check. No changes made., A non-alias primary ResourceRecordSet must have an associated health check. No changes made., A non-alias primary ResourceRecordSet must have an associated health check. No changes made.]\n\tstatus code: 400, request id: 217f681f-5edd-4389-8150-c7c6bce67833"
time="2020-09-08T11:37:48Z" level=error msg="failed to submit all changes for the following zones:

Were you able to solve this? I’m running into the same issue.