external-dns: failed to sync v1.Ingress: context deadline exceeded

What happened: I tried installing the controller on my cluster but it keeps failing with this error:

time="2021-11-01T10:55:49Z" level=info msg="Instantiating new Kubernetes client"
time="2021-11-01T10:55:49Z" level=debug msg="apiServerURL: "
time="2021-11-01T10:55:49Z" level=debug msg="kubeConfig: "
time="2021-11-01T10:55:49Z" level=info msg="Using inCluster-config based on serviceaccount-token"
time="2021-11-01T10:55:49Z" level=info msg="Created Kubernetes client https://172.20.0.1:443"
time="2021-11-01T10:56:49Z" level=fatal msg="failed to sync *v1.Ingress: context deadline exceeded"

What you expected to happen: For external-dns to start and sync its sources (services and ingresses) without errors

How to reproduce it (as minimally and precisely as possible): These are the flags I ran it with:

 args:
   - --log-level=trace
   - --log-format=text
   - --interval=3m
   - --events
   - --source=service
   - --source=ingress
   - --policy=upsert-only
   - --registry=txt
   - --txt-owner-id=cl-dev
   - --provider=aws

Anything else we need to know?: I made sure that there’s a service account with a ClusterRoleBinding to cluster-admin (to rule out RBAC issues) and that the token is mounted. I also ran:

$ for i in get list watch;do for l in nodes ingresses services;do kubectl auth can-i $i $l --as=system:serviceaccount:kube-system:external-dns -n kube-system;done;done
Warning: resource 'nodes' is not namespace scoped
yes
yes
yes
Warning: resource 'nodes' is not namespace scoped
yes
yes
yes
Warning: resource 'nodes' is not namespace scoped
yes
yes
yes
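
For completeness, the same check can be extended to the endpoints and pods resources, and to the networking.k8s.io ingresses, which the comments below point to as permissions external-dns also needs; a minimal sketch using the same service account as above:

$ for r in endpoints pods ingresses.networking.k8s.io; do kubectl auth can-i list $r --as=system:serviceaccount:kube-system:external-dns -n kube-system; done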

And the service account token is mounted:

$ kubectl get pod external-dns-bf97b7c6c-n99lr -oyaml | grep -A10 -B10 serviceAccount
  serviceAccount: external-dns
  serviceAccountName: external-dns
  volumes:
  - name: aws-iam-token
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          audience: sts.amazonaws.com
          expirationSeconds: 86400
          path: token
  - name: external-dns-token-tl4qj
    secret:
      defaultMode: 420
      secretName: external-dns-token-tl4qj

Environment: kubernetes (EKS v1.18.20-eks-8c579e)

  • External-DNS version (use external-dns --version): k8s.gcr.io/external-dns/external-dns:v0.10.1
  • DNS provider: route53
  • Others:

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 24
  • Comments: 30 (2 by maintainers)

Most upvoted comments

We had the same error message (using EKS v1.21.2-eks-0389ca3) and the error was gone after updating the ClusterRole definition to the following:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: external-dns
rules:
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["get","watch","list"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get","watch","list"]
  - apiGroups: ["networking","networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["get","watch","list"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get","watch","list"]
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get","watch","list"]
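
To pick up the updated rules, applying the manifest and restarting the pod should be enough; the file name, namespace and deployment name below are assumptions and may differ in your setup:

$ kubectl apply -f external-dns-clusterrole.yaml
$ kubectl -n kube-system rollout restart deployment external-dns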

I had the same error, and it turned out I had the wrong namespace (default) in the ClusterRoleBinding, whereas I had deployed external-dns in its own namespace. Changing the namespace in the ClusterRoleBinding fixed it for me.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: external-dns-viewer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: external-dns
subjects:
- kind: ServiceAccount
  name: external-dns
  namespace: external-dns
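
A quick way to compare the namespace in the binding with the one the pod actually runs in (the binding name comes from the snippet above; the label selector is an assumption based on common chart defaults):

$ kubectl get clusterrolebinding external-dns-viewer -o jsonpath='{.subjects[0].namespace}{"\n"}'
$ kubectl get pods -A -l app.kubernetes.io/name=external-dns -o jsonpath='{.items[0].metadata.namespace}{"\n"}'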

I hit the issue after the cluster upgrade to 1.22. However, after checking the logs it says:

  1. failed to sync *v1.Endpoints: context deadline exceeded
  2. failed to sync *v1.Ingress: context deadline exceeded

After that, I updated the cluster role with the new permissions (added listing of Endpoints and Ingresses) and the issue got resolved. Please check the logs and you will see the possible reasons.

  - apiGroups: ["networking","networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["get","watch","list"]
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get","watch","list"]

I also got an error message which was caused by faulty RBAC:

time="2022-03-14T15:42:13Z" level=info msg="Created Kubernetes client https://10.238.0.1:443"
time="2022-03-14T15:43:13Z" level=fatal msg="failed to sync *v1.Endpoints: context deadline exceeded"

It would be nice if the error message could state that the client’s request was denied with something like “access denied”, instead of this timeout message after 60 seconds.

I get the same errors with Kubernetes cluster v1.17.9, External-DNS v0.10.1 and the RFC2136 provider. All ingresses in my cluster use the beta API:

kind: Ingress
apiVersion: networking.k8s.io/v1beta1

Is this an issue? Btw the previous version 0.9.0 of External-DNS works well.
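
If I understand the changelog correctly, v0.10.x switched the ingress source to the networking.k8s.io/v1 API, which is only served on Kubernetes 1.19+, so on 1.17 the Ingress informer can never sync. A quick way to see which ingress APIs the cluster actually serves:

$ kubectl api-versions | grep networking.k8s.io
$ kubectl api-resources --api-group=networking.k8s.io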

Same issue with 0.11.0 on Digital Ocean Kubernetes 1.22.8

This doesn’t seem to be #2168 for me; it was the RBAC issue described above. I’m installing using the Helm chart. I had sources set to [istio-gateway,istio-virtualservice]. This causes https://github.com/kubernetes-sigs/external-dns/blob/master/charts/external-dns/templates/clusterrole.yaml to only add permissions for those two types. Adding service to sources causes the Chart to also grant access to Nodes, Pods, Services, Endpoints, and then this error goes away.
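
A minimal values.yaml sketch of that change, assuming the kubernetes-sigs external-dns chart where the relevant key is sources:

sources:
  - istio-gateway
  - istio-virtualservice
  - service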

I’m no external-dns expert, but a few thoughts:

  • Weird this manifests as a timeout not a 401 - is some abstraction layer in the libraries retrying on 401?
  • I don’t use the headless service source, and I’m not sure why external-dns would need to read Services etc otherwise, but I’m sure (much) older versions of external-dns did need all these extra permissions so maybe they legitimately still do? Or maybe there’s a logic error trying to read Services even when it’s not necessary?