external-dns: external-dns fails with "failed to sync cache: timed out waiting for the condition"
We are facing the situation that external-dns is not working at all. We are running it as a pod in our OpenShift 3.11 cluster. The pod starts up, but fails after 60 seconds with
time="2019-04-03T13:20:28Z" level=fatal msg="failed to sync cache: timed out waiting for the condition"
Version: Kubernetes 1.11.0, external-dns 0.5.12
Configuration:
- --source=service
- --provider=pdns
- --pdns-server=http://192.168.128.15:8081/api
- --pdns-api-key=xxx
- --txt-owner-id=external-dns
- --log-level=debug
- --interval=30s
It doesn’t matter which DNS provider is configured; external-dns dies before it starts working on zones.
The complete log looks like this:
time="2019-04-03T13:19:28Z" level=info msg="config: {Master: KubeConfig: RequestTimeout:30s IstioIngressGatewayServices:[istio-system/istio-ingressgateway] Sources:[service] Namespace: AnnotationFilter: FQDNTemplate: CombineFQDNAndAnnotation:false IgnoreHostnameAnnotation:false Compatibility: PublishInternal:false PublishHostIP:false ConnectorSourceServer:localhost:8080 Provider:pdns GoogleProject: DomainFilter:[] ZoneIDFilter:[] AlibabaCloudConfigFile:/etc/kubernetes/alibaba-cloud.json AlibabaCloudZoneType: AWSZoneType: AWSZoneTagFilter:[] AWSAssumeRole: AWSBatchChangeSize:1000 AWSBatchChangeInterval:1s AWSEvaluateTargetHealth:true AWSAPIRetries:3 AzureConfigFile:/etc/kubernetes/azure.json AzureResourceGroup: CloudflareProxied:false CloudflareZonesPerPage:50 RcodezeroTXTEncrypt:false InfobloxGridHost: InfobloxWapiPort:443 InfobloxWapiUsername:admin InfobloxWapiPassword: InfobloxWapiVersion:2.3.1 InfobloxSSLVerify:true InfobloxView: DynCustomerName: DynUsername: DynPassword: DynMinTTLSeconds:0 OCIConfigFile:/etc/kubernetes/oci.yaml InMemoryZones:[] PDNSServer:http://192.168.128.15:8081/api PDNSAPIKey: PDNSTLSEnabled:false TLSCA: TLSClientCert: TLSClientCertKey: Policy:sync Registry:txt TXTOwnerID:okddev01 TXTPrefix: Interval:30s Once:false DryRun:false LogFormat:text MetricsAddress::7979 LogLevel:debug TXTCacheInterval:0s ExoscaleEndpoint:https://api.exoscale.ch/dns ExoscaleAPIKey: ExoscaleAPISecret: CRDSourceAPIVersion:externaldns.k8s.io/v1alpha1 CRDSourceKind:DNSEndpoint ServiceTypeFilter:[] RFC2136Host: RFC2136Port:0 RFC2136Zone: RFC2136Insecure:false RFC2136TSIGKeyName: RFC2136TSIGSecret: RFC2136TSIGSecretAlg: RFC2136TAXFR:false}"
time="2019-04-03T13:19:28Z" level=info msg="Created Kubernetes client https://10.127.0.1:443"
time="2019-04-03T13:20:28Z" level=fatal msg="failed to sync cache: timed out waiting for the condition"
We configured a serviceaccount and added the required role and rolebinding, and the pod is running as the configured serviceaccount.
When running the pod with the default serviceaccount we get the same error message.
I tried out some other controller pods that use k8s informers; those are working without problems.
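For reference, the setup described above boils down to something like this (names and namespace are placeholders, not the exact manifests from our cluster):

```yaml
# ServiceAccount that the external-dns pod runs as (placeholder names)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: external-dns
  namespace: external-dns
---
# Pod template fragment of the external-dns Deployment
spec:
  template:
    spec:
      serviceAccountName: external-dns
```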
Any help would be appreciated
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 21
- Comments: 50 (7 by maintainers)
Commits related to this issue
- Fix OVH tutorial to match new permissions External DNS now require permissions on endpoints resource. Adding it in the OVH tutorial manifest following this comment (https://github.com/kubernetes-sigs... — committed to alistarle/external-dns by alistarle 4 years ago
- Merge from github external-dns into release (#4) * Allow multiple services to share same dns record * NS record support * Fix NS related provider test * update comment to explain edge case b... — committed to F5Networks/external-dns by swapmat-f5 4 years ago
- Adding `endpoints` resource permission According to https://github.com/kubernetes-sigs/external-dns/issues/961#issuecomment-664849509, an `endpoints` resource permission is needed I'm using EKS 1.1... — committed to tsahiduek/aws-load-balancer-controller by tsahiduek 4 years ago
- Adding `endpoints` resource permission (#1580) According to https://github.com/kubernetes-sigs/external-dns/issues/961#issuecomment-664849509, an `endpoints` resource permission is needed I'm using... — committed to kubernetes-sigs/aws-load-balancer-controller by tsahiduek 4 years ago
- Update Readme with F5 DNS Load Balancer service * chore: fix k8s-ci-robot license check on github * fix: More linter fixes * fix: linter issues * Merge from github external-dns into release ... — committed to F5Networks/external-dns by swapmat-f5 4 years ago
- fix external dns rbac https://github.com/kubernetes-sigs/external-dns/issues/961#issuecomment-664849509 — committed to nlopez/k8s_home by deleted user 4 years ago
- Fix OVH tutorial to match new permissions External DNS now require permissions on endpoints resource. Adding it in the OVH tutorial manifest following this comment (https://github.com/kubernetes-sigs... — committed to cgroschupp/external-dns by alistarle 4 years ago
- Adding `endpoints` resource permission (#1580) According to https://github.com/kubernetes-sigs/external-dns/issues/961#issuecomment-664849509, an `endpoints` resource permission is needed I'm using... — committed to adammw/aws-load-balancer-controller by tsahiduek 4 years ago
After I recreated some nodes, external-dns failed to start up again. It failed after printing the error message “failed to sync cache: timed out waiting for the condition”. It seems that `endpoints` were added as a watched resource and external-dns now requires extra permissions. Make sure you have added the `endpoints` resource to your `external-dns` ClusterRole. Adding this solved the problem for me.
Just spent a few hours until I saw this. The RBAC ClusterRoleBinding from the incubator documentation explicitly binds to the default namespace. Be wary if you try to deploy external-dns to a namespace other than default.
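A minimal sketch of the extra rule being described, following the upstream external-dns RBAC examples (your ClusterRole will already have other rules for services, ingresses, etc.):

```yaml
# Rule to add to the external-dns ClusterRole (sketch)
- apiGroups: [""]
  resources: ["services", "endpoints", "pods"]  # "endpoints" is the newly required resource
  verbs: ["get", "watch", "list"]
```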
OK, problem solved. It turned out that I messed up the ClusterRoleBinding. As soon as I got it right, everything worked as expected. So maybe it would help to hint at a possible RBAC problem in the error message?
I had the same issue; I hadn’t enabled RBAC for external-dns. After I did, it worked.
I am using the helm chart: https://github.com/helm/charts/tree/master/stable/external-dns
The option is `rbac.create=true`.
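For reference, the equivalent values file entry would look roughly like this (a sketch; all other chart values omitted):

```yaml
# values.yaml fragment for the stable/external-dns chart (sketch)
rbac:
  create: true
```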
The second reply saved me many hours of head scratching, many thanks 👍
I had a similar problem trying to create the RBAC resources in a namespace other than “default”. Is this by design, or is something incorrect in my configuration?
Getting this since upgrading to k8s v1.22
I have verified and validated the ClusterRole, ClusterRoleBinding, and ServiceAccount, and that the Pod is using the correct ServiceAccount. I assume this has to do with Ingress and others moving out of “beta”: the Go client will need to be updated to 0.22, and the k8s version will need to be checked to determine which client method to use (v1beta1 vs. v1, probably for `v1beta1.Ingress` as well as the rest of the Ingress types).
@apigeeks-lee You’re probably referencing the wrong service account in your role binding. Double-check the `subject`’s name and namespace.
I had the same issue and resolved it by upgrading the Bitnami chart version.
I’m using the lablabs Terraform module: https://registry.terraform.io/modules/lablabs/eks-external-dns/aws/latest and had the same issue upgrading to 1.22. Everything appeared to be OK config-wise; I can only assume it may have been somehow related to the old beta versions being removed, as jslay88 suggested. It works now, so I’m not going to spend more time digging.
My fix was to update the `helm_chart_version` to the latest version `"6.2.4"` (the default was `"5.4.4"`): https://github.com/lablabs/terraform-aws-eks-external-dns/blob/master/variables.tf
Final main.tf:
Works like a charm! thanks @GeertJohan
In my case, I had copied the example pdns provider configuration from the documentation, resolved the namespace issues, and applied it to a v1.22 Kubernetes cluster. The pdns example hardcodes the container version to 0.7.6, which per the External DNS documentation is not compatible with Kubernetes 1.22. I updated the container version to 0.10.0 and it started working.
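The change amounts to bumping the image tag in the Deployment, roughly like this (a sketch; the registry path is an assumption and may differ from the one in the pdns example):

```yaml
# external-dns Deployment container fragment (sketch)
containers:
  - name: external-dns
    # the copied pdns example pinned v0.7.6, which does not support Kubernetes 1.22
    image: registry.k8s.io/external-dns/external-dns:v0.10.0
    args:
      - --source=service
      - --provider=pdns
```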
Same here, external-dns broke on the upgrade to 1.22.
We followed the rfc2136 docs and found that this is missing from its RBAC section:
Once I added that, external-dns > 0.7.1 started working again without the “failed to sync cache” error.
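Presumably this is the same `endpoints` rule discussed earlier in the thread; a guess at the rule that needs to be present in the rfc2136 tutorial’s ClusterRole (an assumption, so verify against the current docs):

```yaml
# Presumed missing rule in the rfc2136 tutorial's ClusterRole (assumption)
- apiGroups: [""]
  resources: ["services", "endpoints", "pods"]
  verbs: ["get", "watch", "list"]
```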
Similarly, I landed here after updating my cluster and suddenly my existing external-dns installations don’t work anymore 😃
The solution for me was to update to 0.10.0 as well.
Solved this issue by removing a `--source` that wasn’t actually deployed. In my automation I had assumed Istio would be used, but the current Kubernetes installation doesn’t have it, so external-dns failed… I will have to update the Helm chart so it only adds the extra Istio sources when Istio is already installed (see the sketch below).
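A sketch of the container args in question; the Istio sources have to be dropped (or made conditional) when the Istio CRDs are not installed:

```yaml
# external-dns container args (sketch)
args:
  - --source=service
  - --source=ingress
  # Only enable these when the Istio CRDs actually exist on the cluster:
  # - --source=istio-gateway
  # - --source=istio-virtualservice
```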
In my case, the issue was related to the namespace. I have a ray-prod namespace in which everything is installed, but in the provided YAML the `ClusterRoleBinding` has the `default` namespace. I changed `default` to `ray-prod`.
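A sketch of the corrected binding; the resource names follow the tutorial manifest and are assumptions, the point is only the subject’s namespace:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: external-dns-viewer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: external-dns
subjects:
  - kind: ServiceAccount
    name: external-dns
    namespace: ray-prod   # was "default" in the copied YAML
```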
When deploying ExternalDNS via the Bitnami Helm chart, I always get the `timed out waiting for the condition` error message despite all the solutions posted above, even when I make the service account a cluster-admin.
Deploying it as described in the tutorial solved this issue, so there must be a delta in the Helm chart that causes it. I’m investigating.
This just bit me as well. One thing to check: the ClusterRoleBinding in the documentation binds to a service account in the default namespace, so if you want to run external-dns in a different namespace, make sure you change the namespace from default to your new namespace before creating the ClusterRoleBinding.
I confirm that it’s not necessarily an RBAC issue. I upgraded my Kubernetes (AKS) from 1.21.9 to 1.22.6 and it suddenly started failing with this error. I had an external-dns at version 0.8, which I upgraded to 0.11, and it worked perfectly.
I also had another external-dns deployment (for external registration) which was at 0.10.2, and it wasn’t affected by the cluster upgrade.
Turns out in my case I had the following:
in particular those two lines:
But since the istio-operator installation was failing silently (I had to look at the logs), the CRDs for `istio-gateway` and `istio-virtualservice` were not deployed, which in turn resulted in the error “failed to sync cache: timed out waiting for the condition”. It would have helped to get a more explicit error…
In case you use `Sources:[istio-virtualservice istio-gateway]` and you don’t have the Istio CRDs on the cluster yet, it will error with:
@SamMousa Is there any way we can do this without creating a cluster role? In our case we don’t have permission to create cluster roles. I see something here but this doesn’t work with AWS.
I have resolved the issue by adding a ClusterRole + binding to the service account; note the hardcoded names.
I compared the ClusterRole and the normal Role and noted that the only difference is the `nodes` resource. This resource is only available at the cluster level.
@GeertJohan You are the real MVP.
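A minimal sketch of that kind of ClusterRole + ClusterRoleBinding (the rules follow the upstream external-dns examples; the hardcoded names and namespace are assumptions and must match the ServiceAccount the pod actually uses):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: external-dns
rules:
  - apiGroups: [""]
    resources: ["services", "endpoints", "pods"]
    verbs: ["get", "watch", "list"]
  - apiGroups: ["extensions", "networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["get", "watch", "list"]
  - apiGroups: [""]
    resources: ["nodes"]   # nodes are cluster-scoped, hence a ClusterRole rather than a Role
    verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: external-dns-viewer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: external-dns
subjects:
  - kind: ServiceAccount
    name: external-dns       # hardcoded: must match the ServiceAccount the pod runs as
    namespace: external-dns  # hardcoded: must match the namespace it is deployed in
```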
@GeertJohan awesome man! Thank you!
@GeertJohan awesome, probably just saved me a couple of minutes / hours! xD
@GeertJohan, thanks. It saved me time.