external-dns: external-dns fails with "failed to sync cache: timed out waiting for the condition"

We are facing a situation where external-dns is not working at all. We are running it as a pod in our OpenShift 3.11 cluster. The pod starts up, but fails after 60 seconds with

time="2019-04-03T13:20:28Z" level=fatal msg="failed to sync cache: timed out waiting for the condition"

Version: Kubernetes 1.11.0, external-dns 0.5.12. Configuration:

- --source=service
- --provider=pdns
- --pdns-server=http://192.168.128.15:8081/api
- --pdns-api-key=xxx
- --txt-owner-id=external-dns
- --log-level=debug
- --interval=30s

It doesn't matter which DNS provider is configured; external-dns dies before it even starts working on zones.

The complete log looks like this:

time="2019-04-03T13:19:28Z" level=info msg="config: {Master: KubeConfig: RequestTimeout:30s IstioIngressGatewayServices:[istio-system/istio-ingressgateway] Sources:[service] Namespace: AnnotationFilter: FQDNTemplate: CombineFQDNAndAnnotation:false IgnoreHostnameAnnotation:false Compatibility: PublishInternal:false PublishHostIP:false ConnectorSourceServer:localhost:8080 Provider:pdns GoogleProject: DomainFilter:[] ZoneIDFilter:[] AlibabaCloudConfigFile:/etc/kubernetes/alibaba-cloud.json AlibabaCloudZoneType: AWSZoneType: AWSZoneTagFilter:[] AWSAssumeRole: AWSBatchChangeSize:1000 AWSBatchChangeInterval:1s AWSEvaluateTargetHealth:true AWSAPIRetries:3 AzureConfigFile:/etc/kubernetes/azure.json AzureResourceGroup: CloudflareProxied:false CloudflareZonesPerPage:50 RcodezeroTXTEncrypt:false InfobloxGridHost: InfobloxWapiPort:443 InfobloxWapiUsername:admin InfobloxWapiPassword: InfobloxWapiVersion:2.3.1 InfobloxSSLVerify:true InfobloxView: DynCustomerName: DynUsername: DynPassword: DynMinTTLSeconds:0 OCIConfigFile:/etc/kubernetes/oci.yaml InMemoryZones:[] PDNSServer:http://192.168.128.15:8081/api PDNSAPIKey: PDNSTLSEnabled:false TLSCA: TLSClientCert: TLSClientCertKey: Policy:sync Registry:txt TXTOwnerID:okddev01 TXTPrefix: Interval:30s Once:false DryRun:false LogFormat:text MetricsAddress::7979 LogLevel:debug TXTCacheInterval:0s ExoscaleEndpoint:https://api.exoscale.ch/dns ExoscaleAPIKey: ExoscaleAPISecret: CRDSourceAPIVersion:externaldns.k8s.io/v1alpha1 CRDSourceKind:DNSEndpoint ServiceTypeFilter:[] RFC2136Host: RFC2136Port:0 RFC2136Zone: RFC2136Insecure:false RFC2136TSIGKeyName: RFC2136TSIGSecret: RFC2136TSIGSecretAlg: RFC2136TAXFR:false}"
time="2019-04-03T13:19:28Z" level=info msg="Created Kubernetes client https://10.127.0.1:443"
time="2019-04-03T13:20:28Z" level=fatal msg="failed to sync cache: timed out waiting for the condition"

We configured a service account, added the required role and role binding, and the pod is running as the configured service account.

When running the pod with the default service account we get the same error message.

I tried out some other controller pods that use k8s informers; those work without problems.

Any help would be appreciated.

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 21
  • Comments: 50 (7 by maintainers)

Most upvoted comments

After I recreated some nodes, external-dns failed to start up again, printing the error message “failed to sync cache: timed out waiting for the condition”. It seems that endpoints support was added and external-dns now requires extra permissions.

Make sure you have added

- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["get","watch","list"]

to your external-dns ClusterRole. Adding this solved the problem for me.

Just spent a few hours until I saw this. The RBAC ClusterRoleBinding from the incubator documentation explicitly binds to a service account in the default namespace. Be wary if you deploy external-dns to a namespace other than default.
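
As a concrete illustration, here is a minimal sketch of that binding with the subject pointed at the namespace external-dns actually runs in (the namespace name external-dns below is only an example):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: external-dns-viewer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: external-dns
subjects:
- kind: ServiceAccount
  name: external-dns
  namespace: external-dns   # change from "default" to wherever the pod runs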

OK, problem solved. It turned out that I messed up the ClusterRoleBinding. As soon as I got it right, everything worked as expected. So maybe it would help to hint at a possible RBAC problem in the error message?

I had the same issue because I didn't enable RBAC for external-dns. After I did, it worked.

I am using the helm chart: https://github.com/helm/charts/tree/master/stable/external-dns

The option is: rbac.create = true
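
For example, a minimal values sketch (the chart and release names in the comment are illustrative):

# values.yaml -- e.g. helm install external-dns stable/external-dns -f values.yaml
rbac:
  create: true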

The second reply saved me many hours of head scratching, many thanks 👍

I had a similar problem trying to create the RBAC resources in a namespace other than “default”. Is this by design, or is something incorrect in my configuration?

Getting this since upgrading to k8s v1.22

I have verified and validated the ClusterRole, ClusterRoleBinding, and ServiceAccount, and that the Pod is using the correct ServiceAccount. I assume this has to do with Ingress and other APIs moving out of beta: external-dns will need its Go client updated to 0.22, and will need to check the k8s version to decide which client method to use (v1beta1 vs v1 for Ingress and the related types).

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: external-dns
rules:
- apiGroups:
  - ""
  resources:
  - services
  - endpoints
  - pods
  - nodes
  verbs:
  - get
  - watch
  - list
- apiGroups:
  - extensions
  - networking.k8s.io
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
  name: external-dns-viewer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: external-dns
subjects:
- kind: ServiceAccount
  name: external-dns
  namespace: cluster-components
---
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
  name: external-dns
  namespace: cluster-components
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "",
          "ips": [
              "10.36.0.17"
          ],
          "default": true,
          "dns": {}
      }]
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "",
          "ips": [
              "10.36.0.17"
          ],
          "default": true,
          "dns": {}
      }]
  creationTimestamp: "2021-08-10T02:50:00Z"
  generateName: external-dns-5f99cdfd7d-
  labels:
    app: external-dns
    pod-template-hash: 5f99cdfd7d
  name: external-dns-5f99cdfd7d-krsxc
  namespace: cluster-components
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: external-dns-5f99cdfd7d
    uid: 962fbdbe-a41e-4cce-ad01-662acb0a053a
  resourceVersion: "273220692"
  uid: dc4144b4-ecdd-405f-a7ec-942fd409deca
spec:
  containers:
  - args:
    - --provider=rfc2136
    - --rfc2136-host=10.0.0.2
    - --rfc2136-port=53
    - --rfc2136-zone=k8s.example.org
    - --rfc2136-tsig-secret=96Ah/a2g0/nLeFGK+d/0tzQcccf9hCEIy34PoXX2Qg8=
    - --rfc2136-tsig-secret-alg=hmac-sha256
    - --rfc2136-tsig-keyname=externaldns-key
    - --rfc2136-tsig-axfr
    - --source=service
    - --source=ingress
    - --domain-filter=k8s.example.org
    image: registry.opensource.zalan.do/teapot/external-dns:v0.7.6
    imagePullPolicy: IfNotPresent
    name: external-dns
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-qjps5
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: k8s-worker01
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: external-dns
  serviceAccountName: external-dns
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: kube-api-access-qjps5
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-08-10T02:50:00Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-08-10T03:39:59Z"
    message: 'containers with unready status: [external-dns]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-08-10T03:39:59Z"
    message: 'containers with unready status: [external-dns]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-08-10T02:50:00Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://b4a8b4204e2654efcdaf2da21a76a766c46467c3f64120560bd10cf5c73061a1
    image: registry.opensource.zalan.do/teapot/external-dns:v0.7.6
    imageID: docker-pullable://registry.opensource.zalan.do/teapot/external-dns@sha256:30b83b9469ed6047c34666b0184991b88e5a83b122cc0899841abe014fad3a19
    lastState:
      terminated:
        containerID: docker://b4a8b4204e2654efcdaf2da21a76a766c46467c3f64120560bd10cf5c73061a1
        exitCode: 1
        finishedAt: "2021-08-10T03:39:58Z"
        reason: Error
        startedAt: "2021-08-10T03:38:57Z"
    name: external-dns
    ready: false
    restartCount: 12
    started: false
    state:
      waiting:
        message: back-off 5m0s restarting failed container=external-dns pod=external-dns-5f99cdfd7d-krsxc_kube-system(dc4144b4-ecdd-405f-a7ec-942fd409deca)
        reason: CrashLoopBackOff
  hostIP: 10.3.0.41
  phase: Running
  podIP: 10.36.0.17
  podIPs:
  - ip: 10.36.0.17
  qosClass: BestEffort
  startTime: "2021-08-10T02:50:00Z"
time="2021-08-10T03:45:03Z" level=info msg="Instantiating new Kubernetes client"
time="2021-08-10T03:45:03Z" level=info msg="Using inCluster-config based on serviceaccount-token"
time="2021-08-10T03:45:03Z" level=info msg="Created Kubernetes client https://10.96.0.1:443"
time="2021-08-10T03:46:04Z" level=fatal msg="failed to sync cache: timed out waiting for the condition"

I had a similar problem trying to create the RBAC resources in a namespace other than “default”. Is this by design, or is something incorrect in my configuration?

@apigeeks-lee You’re probably referencing the wrong service account in your role binding. Double-check the subject’s name and namespace.

I had the same issue and resolved it by upgrading the Bitnami chart version.

I’m using the lablabs Terraform module: https://registry.terraform.io/modules/lablabs/eks-external-dns/aws/latest and had the same issue after upgrading to 1.22. Everything appeared to be OK config-wise; I can only assume it was somehow related to the old beta API versions being removed, as jslay88 suggested. It works now, so I’m not going to spend more time digging.

My fix was to update helm_chart_version to the latest version, "6.2.4" (the default was "5.4.4"): https://github.com/lablabs/terraform-aws-eks-external-dns/blob/master/variables.tf

Final main.tf

module "eks-external-dns" {
  source  = "lablabs/eks-external-dns/aws"
  version = "0.9.0"

  helm_chart_version = "6.2.4"

  cluster_identity_oidc_issuer     = var.k8s_eks_cluster_oidc_issuer_url
  cluster_identity_oidc_issuer_arn = var.k8s_eks_oidc_provider_arn

  policy_allowed_zone_ids = ["${var.hosted_zone_id}"]

  tags = var.tags # v0.9.0 only

  values = yamlencode({
    # ensure any deleted entries are also synced
    "policy" : "sync" 
    
    # required for AWS EKS: 
    # https://github.com/kubernetes-sigs/external-dns/blob/master/docs/tutorials/aws.md
    "podSecurityContext": {
      "fsGroup" : 65534
    }
  })
}

After I recreated some nodes, external-dns failed to start up again, printing the error message “failed to sync cache: timed out waiting for the condition”. It seems that endpoints support was added and external-dns now requires extra permissions.

Make sure you have added

- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["get","watch","list"]

to your external-dns ClusterRole. Adding this solved the problem for me.

Works like a charm! thanks @GeertJohan

In my case, I had copied the example pdns provider configuration from the documentation, resolved the namespace issues, and applied it to a v1.22 Kubernetes cluster. The pdns example hardcodes the container version to 0.7.6, which per the External DNS documentation is not compatible with Kubernetes 1.22. I updated the container version to 0.10.0 and it started working.
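
In Deployment terms this is just a matter of bumping the image tag, roughly like this (the registry path is an assumption; keep whatever registry your manifests already pull from):

# container snippet from the external-dns Deployment (sketch)
containers:
- name: external-dns
  image: k8s.gcr.io/external-dns/external-dns:v0.10.0  # was v0.7.6, which does not work on Kubernetes 1.22
  args:
  - --source=service
  - --provider=pdns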

Same here, external-dns broke on upgrade to 1.22.

We followed the rfc2136 docs and found this is missing from its RBAC section:

- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["list","watch"]

Once I added that, external-dns > 0.7.1 started working again without the “failed to sync cache” error.

In my case, I had copied the example pdns provider configuration from the documentation, resolved the namespace issues, and applied it to a v1.22 Kubernetes cluster. The pdns example hardcodes the container version to 0.7.6, which per the External DNS documentation is not compatible with Kubernetes 1.22. I updated the container version to 0.10.0 and it started working.

Similarly, I landed here after updating my cluster and suddenly my existing external-dns installations don’t work anymore 😃

The solution for me was to update to 0.10.0 as well.

Solved the issue by removing a --source flag for a source that is not currently deployed. In our automation I had assumed Istio would be used, but the current Kubernetes installation doesn’t have it, so external-dns failed… I will have to update the Helm chart with an if statement that only adds the extra sources when Istio is already installed…
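
A hedged sketch of what that could look like in the chart values (assuming the chart exposes a sources list, as the Bitnami/stable external-dns charts do):

# values.yaml sketch -- only list sources whose APIs actually exist in the cluster
sources:
  - service
  - ingress
  # - istio-gateway         # add only when the Istio CRDs are installed
  # - istio-virtualservice  # add only when the Istio CRDs are installed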

In my case, the issue was related to the namespace. Everything is installed in a ray-prod namespace, but in the YAML I had defined, the ClusterRoleBinding’s subject used the default namespace. I changed default to ray-prod.

When deploying ExternalDNS via the Helm chart by Bitnami, I always get the “timed out waiting for the condition” error message despite all the solutions posted above, even when I make the service account a cluster-admin.

Deploying it as described in the tutorial solved the problem, so there must be a delta in the Helm chart that causes it. I’m investigating.

This just bit me as well. One thing to check: the ClusterRoleBinding in the documentation binds to a service account in the default namespace, so if you want to run external-dns in a different namespace, make sure you change the namespace from default to your new namespace before creating the ClusterRoleBinding.

I can confirm that it’s not necessarily an RBAC issue. I upgraded my Kubernetes (AKS) from 1.21.9 to 1.22.6 and it suddenly started failing with this error. I had external-dns version 0.8, which I upgraded to 0.11, and it worked perfectly.

I also had another external-dns deployment (for external registration) which was at 0.10.2, and it wasn’t affected by the cluster upgrade.

Turns out in my case I had the following:

# https://artifacthub.io/packages/helm/bitnami/external-dns
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: external-dns
  namespace: argocd
spec:
  destination:
    namespace: kube-system
    server: https://kubernetes.default.svc
  project: default
  source:
    chart: external-dns
    helm:
      parameters:
      - name: aws.region
        value: us-east-2
      - name: domainFilters[0]
        value: mydomain.com
      - name: serviceAccount.create
        value: 'true'
      - name: rbac.create
        value: 'true'
      - name: resources.limits.cpu
        value: 100m
      - name: resources.limits.memory
        value: 300Mi
      - name: resources.requests.cpu
        value: 100m
      - name: resources.requests.memory
        value: 300Mi
      - name: sources[0]
        value: service
      - name: sources[1]
        value: ingress
      - name: sources[2]
        value: istio-gateway
      - name: sources[3]
        value: istio-virtualservice
      - name: serviceAccount.annotations.eks\.amazonaws\.com/role-arn
        value: arn:aws:sts::122803911111:assumed-role/eks-cluster-kubeflow-test-irsa
    repoURL: https://charts.bitnami.com/bitnami
    targetRevision: 5.1.1
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

in particular these two parameters:

      - name: sources[2]
        value: istio-gateway
      - name: sources[3]
        value: istio-virtualservice

But since the istio-operator installation was failing silently (I had to look at the logs), the CRDs for istio-gateway and istio-virtualservice were not deployed, which in turn resulted in the error "failed to sync cache: timed out waiting for the condition". It would have helped to get a more explicit error…

In case you use Sources:[istio-virtualservice istio-gateway] and you don’t have the Istio CRDs on the cluster yet, it will error with:

failed to sync cache: timed out waiting for the condition

@SamMousa Is there any way we can do this without creating a cluster role? In our case we don’t have permission to create cluster roles. I saw something here, but it doesn’t work with AWS.

I resolved the issue by adding a ClusterRole + binding for the service account; note the hardcoded names:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-node-watcher
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: node-watcher-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-node-watcher
subjects:
  - kind: ServiceAccount
    name: external-dns
    namespace: staging

I compared the cluster role and the normal role and noted that the only difference is the nodes resource. This resource is only available at the cluster level.
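
For illustration only, a sketch of the namespaced counterpart (reusing the names and the staging namespace from the snippet above): everything except nodes can be granted with a Role and RoleBinding, but nodes cannot, since they are cluster-scoped, and external-dns would then also have to be limited to watching that one namespace (its --namespace flag):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: external-dns
  namespace: staging
rules:
- apiGroups: [""]
  resources: ["services", "endpoints", "pods"]
  verbs: ["get", "watch", "list"]
- apiGroups: ["extensions", "networking.k8s.io"]
  resources: ["ingresses"]
  verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: external-dns
  namespace: staging
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: external-dns
subjects:
- kind: ServiceAccount
  name: external-dns
  namespace: staging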

@GeertJohan You are the real MVP.

@GeertJohan awesome man! Thank you!

@GeertJohan awesome, probably just saved me a couple of minutes / hours! xD

@GeertJohan, thanks! It saved me a lot of time.