external-dns: external-dns fails with "failed to sync cache: timed out waiting for the condition"

We are facing a situation where external-dns is not working at all. We are running it as a pod in our OpenShift 3.11 cluster. The pod starts up, but fails after 60 seconds with

time="2019-04-03T13:20:28Z" level=fatal msg="failed to sync cache: timed out waiting for the condition"

Version: Kubernetes 1.11.0, external-dns 0.5.12. Configuration:

- --source=service
- --provider=pdns
- --pdns-server=http://192.168.128.15:8081/api
- --pdns-api-key=xxx
- --txt-owner-id=external-dns
- --log-level=debug
- --interval=30s

It doesn't matter which DNS provider is configured; external-dns dies before it even starts working on zones.

The complete log looks like this:

time="2019-04-03T13:19:28Z" level=info msg="config: {Master: KubeConfig: RequestTimeout:30s IstioIngressGatewayServices:[istio-system/istio-ingressgateway] Sources:[service] Namespace: AnnotationFilter: FQDNTemplate: CombineFQDNAndAnnotation:false IgnoreHostnameAnnotation:false Compatibility: PublishInternal:false PublishHostIP:false ConnectorSourceServer:localhost:8080 Provider:pdns GoogleProject: DomainFilter:[] ZoneIDFilter:[] AlibabaCloudConfigFile:/etc/kubernetes/alibaba-cloud.json AlibabaCloudZoneType: AWSZoneType: AWSZoneTagFilter:[] AWSAssumeRole: AWSBatchChangeSize:1000 AWSBatchChangeInterval:1s AWSEvaluateTargetHealth:true AWSAPIRetries:3 AzureConfigFile:/etc/kubernetes/azure.json AzureResourceGroup: CloudflareProxied:false CloudflareZonesPerPage:50 RcodezeroTXTEncrypt:false InfobloxGridHost: InfobloxWapiPort:443 InfobloxWapiUsername:admin InfobloxWapiPassword: InfobloxWapiVersion:2.3.1 InfobloxSSLVerify:true InfobloxView: DynCustomerName: DynUsername: DynPassword: DynMinTTLSeconds:0 OCIConfigFile:/etc/kubernetes/oci.yaml InMemoryZones:[] PDNSServer:http://192.168.128.15:8081/api PDNSAPIKey: PDNSTLSEnabled:false TLSCA: TLSClientCert: TLSClientCertKey: Policy:sync Registry:txt TXTOwnerID:okddev01 TXTPrefix: Interval:30s Once:false DryRun:false LogFormat:text MetricsAddress::7979 LogLevel:debug TXTCacheInterval:0s ExoscaleEndpoint:https://api.exoscale.ch/dns ExoscaleAPIKey: ExoscaleAPISecret: CRDSourceAPIVersion:externaldns.k8s.io/v1alpha1 CRDSourceKind:DNSEndpoint ServiceTypeFilter:[] RFC2136Host: RFC2136Port:0 RFC2136Zone: RFC2136Insecure:false RFC2136TSIGKeyName: RFC2136TSIGSecret: RFC2136TSIGSecretAlg: RFC2136TAXFR:false}"
time="2019-04-03T13:19:28Z" level=info msg="Created Kubernetes client https://10.127.0.1:443"
time="2019-04-03T13:20:28Z" level=fatal msg="failed to sync cache: timed out waiting for the condition"

We configured a service account, added the required role and role binding, and the pod is running as the configured service account.

When running the pod with the default service account we get the same error message.

I tried out some other controller pods that use k8s informers; those work without problems.

Any help would be appreciated.

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 21
  • Comments: 50 (7 by maintainers)

Most upvoted comments

After I recreated some nodes, external-dns failed to start up again, printing the error message “failed to sync cache: timed out waiting for the condition”. It seems that endpoints support was added and external-dns now requires extra permissions.

Make sure you have added

- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["get","watch","list"]

to your external-dns ClusterRole. Adding this solved the problem for me.

Just spent a few hours until I saw this. The RBAC ClusterRoleBinding from the incubator documentation explicitly binds to a service account in the default namespace. Be wary if you deploy external-dns to a namespace other than default.
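
As a concrete illustration, here is a minimal sketch of that binding with the subject pointed at the namespace external-dns actually runs in (the namespace name external-dns below is only an example):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: external-dns-viewer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: external-dns
subjects:
- kind: ServiceAccount
  name: external-dns
  namespace: external-dns   # change from "default" to wherever the pod runs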

OK, problem solved. It turned out that I messed up the ClusterRoleBinding. As soon as I got it right, everything worked as expected. So maybe it would help to hint at a possible RBAC problem in the error message?

I had the same issue because I didn't enable RBAC for external-dns. After I did, it worked.

I am using the helm chart: https://github.com/helm/charts/tree/master/stable/external-dns

The option is: rbac.create = true
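
For example, a minimal values sketch (the chart and release names in the comment are illustrative):

# values.yaml -- e.g. helm install external-dns stable/external-dns -f values.yaml
rbac:
  create: true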

The second reply saved me many hours of head scratching, many thanks 👍

I had a similar problem trying to create the RBAC resources in a namespace other than “default”. Is this by design, or is something incorrect in my configuration?

Getting this since upgrading to k8s v1.22

I have verified and validated the ClusterRole, ClusterRoleBinding, and ServiceAccount, and that the Pod is using the correct ServiceAccount. I assume this has to do with Ingress and other APIs moving out of beta: external-dns will need its Go client updated to 0.22, and will need to check the k8s version to decide which client method to use (v1beta1 vs v1 for Ingress and the related types).

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: external-dns
rules:
- apiGroups:
  - ""
  resources:
  - services
  - endpoints
  - pods
  - nodes
  verbs:
  - get
  - watch
  - list
- apiGroups:
  - extensions
  - networking.k8s.io
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
  name: external-dns-viewer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: external-dns
subjects:
- kind: ServiceAccount
  name: external-dns
  namespace: cluster-components
---
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
  name: external-dns
  namespace: cluster-components
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "",
          "ips": [
              "10.36.0.17"
          ],
          "default": true,
          "dns": {}
      }]
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "",
          "ips": [
              "10.36.0.17"
          ],
          "default": true,
          "dns": {}
      }]
  creationTimestamp: "2021-08-10T02:50:00Z"
  generateName: external-dns-5f99cdfd7d-
  labels:
    app: external-dns
    pod-template-hash: 5f99cdfd7d
  name: external-dns-5f99cdfd7d-krsxc
  namespace: cluster-components
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: external-dns-5f99cdfd7d
    uid: 962fbdbe-a41e-4cce-ad01-662acb0a053a
  resourceVersion: "273220692"
  uid: dc4144b4-ecdd-405f-a7ec-942fd409deca
spec:
  containers:
  - args:
    - --provider=rfc2136
    - --rfc2136-host=10.0.0.2
    - --rfc2136-port=53
    - --rfc2136-zone=k8s.example.org
    - --rfc2136-tsig-secret=96Ah/a2g0/nLeFGK+d/0tzQcccf9hCEIy34PoXX2Qg8=
    - --rfc2136-tsig-secret-alg=hmac-sha256
    - --rfc2136-tsig-keyname=externaldns-key
    - --rfc2136-tsig-axfr
    - --source=service
    - --source=ingress
    - --domain-filter=k8s.example.org
    image: registry.opensource.zalan.do/teapot/external-dns:v0.7.6
    imagePullPolicy: IfNotPresent
    name: external-dns
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-qjps5
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: k8s-worker01
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: external-dns
  serviceAccountName: external-dns
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: kube-api-access-qjps5
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-08-10T02:50:00Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-08-10T03:39:59Z"
    message: 'containers with unready status: [external-dns]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-08-10T03:39:59Z"
    message: 'containers with unready status: [external-dns]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-08-10T02:50:00Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://b4a8b4204e2654efcdaf2da21a76a766c46467c3f64120560bd10cf5c73061a1
    image: registry.opensource.zalan.do/teapot/external-dns:v0.7.6
    imageID: docker-pullable://registry.opensource.zalan.do/teapot/external-dns@sha256:30b83b9469ed6047c34666b0184991b88e5a83b122cc0899841abe014fad3a19
    lastState:
      terminated:
        containerID: docker://b4a8b4204e2654efcdaf2da21a76a766c46467c3f64120560bd10cf5c73061a1
        exitCode: 1
        finishedAt: "2021-08-10T03:39:58Z"
        reason: Error
        startedAt: "2021-08-10T03:38:57Z"
    name: external-dns
    ready: false
    restartCount: 12
    started: false
    state:
      waiting:
        message: back-off 5m0s restarting failed container=external-dns pod=external-dns-5f99cdfd7d-krsxc_kube-system(dc4144b4-ecdd-405f-a7ec-942fd409deca)
        reason: CrashLoopBackOff
  hostIP: 10.3.0.41
  phase: Running
  podIP: 10.36.0.17
  podIPs:
  - ip: 10.36.0.17
  qosClass: BestEffort
  startTime: "2021-08-10T02:50:00Z"
time="2021-08-10T03:45:03Z" level=info msg="Instantiating new Kubernetes client"
time="2021-08-10T03:45:03Z" level=info msg="Using inCluster-config based on serviceaccount-token"
time="2021-08-10T03:45:03Z" level=info msg="Created Kubernetes client https://10.96.0.1:443"
time="2021-08-10T03:46:04Z" level=fatal msg="failed to sync cache: timed out waiting for the condition"

I had a similar problem trying to create the RBAC resources in a namespace other than “default”. Is this by design, or is something incorrect in my configuration?

@apigeeks-lee You’re probably referencing the wrong service account in your role binding. Double-check the subject’s name and namespace.

I had the same issue and resolved it by upgrading the Bitnami chart version.

I’m using the lablabs Terraform module: https://registry.terraform.io/modules/lablabs/eks-external-dns/aws/latest and had the same issue after upgrading to 1.22. Everything appeared to be OK config-wise; I can only assume it was somehow related to the old beta API versions being removed, as jslay88 suggested. It works now, so I’m not going to spend more time digging.

My fix was to update helm_chart_version to the latest version, "6.2.4" (the default was "5.4.4"): https://github.com/lablabs/terraform-aws-eks-external-dns/blob/master/variables.tf

Final main.tf

module "eks-external-dns" {
  source  = "lablabs/eks-external-dns/aws"
  version = "0.9.0"

  helm_chart_version = "6.2.4"

  cluster_identity_oidc_issuer     = var.k8s_eks_cluster_oidc_issuer_url
  cluster_identity_oidc_issuer_arn = var.k8s_eks_oidc_provider_arn

  policy_allowed_zone_ids = ["${var.hosted_zone_id}"]

  tags = var.tags # v0.9.0 only

  values = yamlencode({
    # ensure any deleted entries are also synced
    "policy" : "sync" 
    
    # required for AWS EKS: 
    # https://github.com/kubernetes-sigs/external-dns/blob/master/docs/tutorials/aws.md
    "podSecurityContext": {
      "fsGroup" : 65534
    }
  })
}

After I recreated some nodes, external-dns failed to start up again, printing the error message “failed to sync cache: timed out waiting for the condition”. It seems that endpoints support was added and external-dns now requires extra permissions.

Make sure you have added

- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["get","watch","list"]

to your external-dns ClusterRole. Adding this solved the problem for me.

Works like a charm! thanks @GeertJohan

In my case, I had copied the example pdns provider configuration from the documentation, resolved the namespace issues, and applied it to a v1.22 Kubernetes cluster. The pdns example hardcodes the container version to 0.7.6, which per the External DNS documentation is not compatible with Kubernetes 1.22. I updated the container version to 0.10.0 and it started working.
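
In Deployment terms this is just a matter of bumping the image tag, roughly like this (the registry path is an assumption; keep whatever registry your manifests already pull from):

# container snippet from the external-dns Deployment (sketch)
containers:
- name: external-dns
  image: k8s.gcr.io/external-dns/external-dns:v0.10.0  # was v0.7.6, which does not work on Kubernetes 1.22
  args:
  - --source=service
  - --provider=pdns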

Same here, external-dns broke on upgrade to 1.22.

We followed the rfc2136 docs and found this is missing from its RBAC section:

- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["list","watch"]

Once I added that, external-dns > 0.7.1 started working again without the “failed to sync cache” error.

In my case, I had copied the example pdns provider configuration from the documentation, resolved the namespace issues, and applied it to a v1.22 Kubernetes cluster. The pdns example hardcodes the container version to 0.7.6, which per the External DNS documentation is not compatible with Kubernetes 1.22. I updated the container version to 0.10.0 and it started working.

Similarly, I landed here after updating my cluster and suddenly my existing external-dns installations don’t work anymore 😃

The solution for me was to update to 0.10.0 as well.

Solved the issue by removing a --source flag for a source that is not currently deployed. In our automation I had assumed Istio would be used, but the current Kubernetes installation doesn’t have it, so external-dns failed… I will have to update the Helm chart with an if statement that only adds the extra sources when Istio is already installed…
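
A hedged sketch of what that could look like in the chart values (assuming the chart exposes a sources list, as the Bitnami/stable external-dns charts do):

# values.yaml sketch -- only list sources whose APIs actually exist in the cluster
sources:
  - service
  - ingress
  # - istio-gateway         # add only when the Istio CRDs are installed
  # - istio-virtualservice  # add only when the Istio CRDs are installed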

In my case, the issue was related to the namespace. Everything is installed in a ray-prod namespace, but in the YAML I had defined, the ClusterRoleBinding’s subject used the default namespace. I changed default to ray-prod.

When deploying ExternalDNS via the Helm chart by Bitnami, I always get the “timed out waiting for the condition” error message despite all the solutions posted above, even when I make the service account a cluster-admin.

Deploying it as described in the tutorial solved the problem, so there must be a delta in the Helm chart that causes it. I’m investigating.

This just bit me as well. One thing to check: the ClusterRoleBinding in the documentation binds to a service account in the default namespace, so if you want to run external-dns in a different namespace, make sure you change the namespace from default to your new namespace before creating the ClusterRoleBinding.

I can confirm that it’s not necessarily an RBAC issue. I upgraded my Kubernetes (AKS) from 1.21.9 to 1.22.6 and it suddenly started failing with this error. I had external-dns version 0.8, which I upgraded to 0.11, and it worked perfectly.

I also had another external-dns deployment (for external registration) which was at 0.10.2, and it wasn’t affected by the cluster upgrade.

Turns out in my case I had the following:

# https://artifacthub.io/packages/helm/bitnami/external-dns
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: external-dns
  namespace: argocd
spec:
  destination:
    namespace: kube-system
    server: https://kubernetes.default.svc
  project: default
  source:
    chart: external-dns
    helm:
      parameters:
      - name: aws.region
        value: us-east-2
      - name: domainFilters[0]
        value: mydomain.com
      - name: serviceAccount.create
        value: 'true'
      - name: rbac.create
        value: 'true'
      - name: resources.limits.cpu
        value: 100m
      - name: resources.limits.memory
        value: 300Mi
      - name: resources.requests.cpu
        value: 100m
      - name: resources.requests.memory
        value: 300Mi
      - name: sources[0]
        value: service
      - name: sources[1]
        value: ingress
      - name: sources[2]
        value: istio-gateway
      - name: sources[3]
        value: istio-virtualservice
      - name: serviceAccount.annotations.eks\.amazonaws\.com/role-arn
        value: arn:aws:sts::122803911111:assumed-role/eks-cluster-kubeflow-test-irsa
    repoURL: https://charts.bitnami.com/bitnami
    targetRevision: 5.1.1
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

in particular these two parameters:

      - name: sources[2]
        value: istio-gateway
      - name: sources[3]
        value: istio-virtualservice

But since the istio-operator installation was failing silently (I had to look at the logs), the CRDs for istio-gateway and istio-virtualservice were not deployed, which in turn resulted in the error "failed to sync cache: timed out waiting for the condition". It would have helped to get a more explicit error…

In case you use Sources:[istio-virtualservice istio-gateway] and you don’t have the Istio CRDs on the cluster yet, it will error with:

failed to sync cache: timed out waiting for the condition

@SamMousa Is there any way we can do this without creating a cluster role? In our case we don’t have permission to create cluster roles. I saw something here, but it doesn’t work with AWS.

I resolved the issue by adding a ClusterRole + binding for the service account; note the hardcoded names:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-node-watcher
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: node-watcher-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-node-watcher
subjects:
  - kind: ServiceAccount
    name: external-dns
    namespace: staging

I compared the cluster role and the normal role and noted that the only difference is the nodes resource. This resource is only available at the cluster level.
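
For illustration only, a sketch of the namespaced counterpart (reusing the names and the staging namespace from the snippet above): everything except nodes can be granted with a Role and RoleBinding, but nodes cannot, since they are cluster-scoped, and external-dns would then also have to be limited to watching that one namespace (its --namespace flag):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: external-dns
  namespace: staging
rules:
- apiGroups: [""]
  resources: ["services", "endpoints", "pods"]
  verbs: ["get", "watch", "list"]
- apiGroups: ["extensions", "networking.k8s.io"]
  resources: ["ingresses"]
  verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: external-dns
  namespace: staging
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: external-dns
subjects:
- kind: ServiceAccount
  name: external-dns
  namespace: staging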

@GeertJohan You are the real MVP.

@GeertJohan awesome man! Thank you!

@GeertJohan awesome, probably just saved me a couple of minutes / hours! xD

@GeertJohan, thanks! It saved me a lot of time.