external-dns: Unable to use IAM Service Account on GKE

I’m seeing the following in the logs for my external-dns pod on GKE:

time="2018-03-29T00:57:30Z" level=info msg="config: &{Master: KubeConfig: Sources:[service ingress] Namespace: AnnotationFilter: FQDNTemplate: Compatibility: PublishInternal:false Provider:google GoogleProject:MY-PROJECT DomainFilter:[MY.MANAGED.ZONE] AWSZoneType: AzureConfigFile:/etc/kubernetes/azure.json AzureResourceGroup: CloudflareProxied:false InfobloxGridHost: InfobloxWapiPort:443 InfobloxWapiUsername:admin InfobloxWapiPassword: InfobloxWapiVersion:2.3.1 InfobloxSSLVerify:true InMemoryZones:[] Policy:upsert-only Registry:txt TXTOwnerID:default TXTPrefix:external-dns Interval:1m0s Once:false DryRun:false LogFormat:text MetricsAddress::7979 LogLevel:debug}" 
time="2018-03-29T00:57:30Z" level=info msg="Connected to cluster at https://10.55.240.1:443" 
time="2018-03-29T00:57:30Z" level=error msg="googleapi: Error 403: Forbidden, forbidden" 
time="2018-03-29T00:58:30Z" level=error msg="googleapi: Error 403: Forbidden, forbidden" 
time="2018-03-29T00:59:30Z" level=error msg="googleapi: Error 403: Forbidden, forbidden" 
time="2018-03-29T01:00:30Z" level=error msg="googleapi: Error 403: Forbidden, forbidden" 

I created an issue for this in the charts repo, but after doing so, I thought it might be better to raise the issue here. Feel free to close this one if it’s the wrong place, or let me know if it’s the right place and I’ll close the other ticket.

Helm version: "v2.8.2", GitCommit:"a80231648a1473929271764b920a8e346f6de844"

Kubernetes version: 1.9.4-gke.1

Installed via: Helm chart stable/external-dns

Args: (as passed through the helm chart to the container, obtained via kubectl describe pod foo)

      --log-level=debug
      --domain-filter=MY.MANAGED.ZONE
      --policy=upsert-only
      --provider=google
      --txt-prefix=external-dns
      --source=service
      --source=ingress
      --registry=txt
      --google-project=MY-PROJECT

Credentials: set via the env var GOOGLE_APPLICATION_CREDENTIALS=/etc/secrets/service-account/credentials.json. The credentials come from the JSON key file downloaded when creating a service account in the Cloud Console web UI.
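
For reference, the secret referenced by google.serviceAccountSecret would be created roughly like this (a sketch; the key file path is a placeholder, and the secret must be named to match the chart value and live in the release namespace):

# create the secret the chart mounts at /etc/secrets/service-account/
kubectl create secret generic external-dns \
  --namespace kube-system \
  --from-file=credentials.json=/path/to/downloaded-key.json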

Docker image: registry.opensource.zalan.do/teapot/external-dns:v0.4.8

Helm command:

helm upgrade \
  --install \
  --recreate-pods \
  --namespace=kube-system \
  --set domainFilters[0]="MY.MANAGED.ZONE" \
  --set extraArgs.registry=txt \
  --set logLevel=debug \
  --set provider=google \
  --set google.serviceAccountSecret=external-dns \
  --set google.project=MY-PROJECT \
  --set txtPrefix="external-dns" \
  external-dns stable/external-dns

I am able to kubectl exec my way onto the pod and verify that the file /etc/secrets/service-account/credentials.json is in place. While troubleshooting, I granted the service account full owner permissions across the entire project, and it doesn’t seem to have had any effect.
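
For anyone checking the same thing, a quick way to verify both the env var and the mounted key from outside the pod (the pod name is a placeholder):

# print the env var and list the mounted secret inside the pod
kubectl exec -n kube-system POD_NAME -- \
  sh -c 'echo "$GOOGLE_APPLICATION_CREDENTIALS" && ls -l /etc/secrets/service-account/'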

Steps to repro, as best as I can figure:

  • Create a GCP project
  • Create a managed zone in Cloud DNS (not sure if this part is strictly necessary to trigger the behavior)
  • Create a GKE cluster (K8S version: 1.9.4-gke.1)
  • Create a GKE node pool (K8S version: 1.9.4-gke.1)
  • Login to the cluster
  • Create an ingress which matches your managed zone (not sure if this part is strictly necessary to trigger the behavior)
  • Create an IAM service account (example commands for the IAM steps are sketched after this list).
  • Grant the IAM service account full project owner permissions.
  • Create a K8S secret with a data key of credentials.json and its value as the JSON object downloaded from the IAM service account creation dialog.
  • Run helm init to install tiller in the cluster.
  • Run helm repo update to get the latest version of the chart (I installed 0.5.2)
  • Run the helm upgrade command I included further up in the issue.
  • Run kubectl logs -f THE_POD_NAME to see the error above.
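
For the IAM steps above, the commands look roughly like this (names are placeholders; the owner grant mirrors the troubleshooting step and is far broader than external-dns actually needs):

# create the IAM service account (placeholder name)
gcloud iam service-accounts create external-dns --display-name "external-dns"

# grant it project owner, as in the repro steps (overly broad; roles/dns.admin should suffice)
gcloud projects add-iam-policy-binding MY-PROJECT \
  --member="serviceAccount:external-dns@MY-PROJECT.iam.gserviceaccount.com" \
  --role="roles/owner"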

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 4
  • Comments: 27 (7 by maintainers)

Most upvoted comments

Hi all,

I’m using the linked helm chart with GCE and a service account without any issue.

I’m supplying the helm chart with the following values:

provider: google

google:
  project: "projectName"
  serviceAccountSecret: "domain.tld"

rbac:
  create: true

Where domain.tld is a secret whose key is credentials.json and whose value is my downloaded service account credentials.
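
For reference, values like the above would be applied with something along these lines (the release name, namespace, and values file name are assumptions):

helm upgrade --install external-dns stable/external-dns \
  --namespace kube-system \
  -f values.yaml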

The container boots up and launches the external-dns binary; it has the following environment variables:

/ # xargs -0 -n 1 < /proc/1/environ 
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=external-dns-56bcd78f87-cxcnz
GOOGLE_APPLICATION_CREDENTIALS=/etc/secrets/service-account/credentials.json
KUBERNETES_SERVICE_PORT_HTTPS=443
KUBE_DNS_PORT=udp://10.51.240.10:53
TILLER_DEPLOY_PORT_44134_TCP=tcp://10.51.240.227:44134
HEAPSTER_PORT_80_TCP_PROTO=tcp
HEAPSTER_PORT_80_TCP_PORT=80
KUBE_DNS_SERVICE_PORT_DNS=53
EXTERNAL_DNS_PORT_7979_TCP_PROTO=tcp
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_ADDR=10.51.240.1
TILLER_DEPLOY_PORT_44134_TCP_ADDR=10.51.240.227
EXTERNAL_DNS_SERVICE_HOST=10.51.244.129
TILLER_DEPLOY_SERVICE_PORT_TILLER=44134
TILLER_DEPLOY_PORT_44134_TCP_PROTO=tcp
KUBE_DNS_PORT_53_TCP_ADDR=10.51.240.10
TILLER_DEPLOY_SERVICE_PORT=44134
TILLER_DEPLOY_PORT=tcp://10.51.240.227:44134
EXTERNAL_DNS_PORT_7979_TCP_ADDR=10.51.244.129
KUBERNETES_PORT_443_TCP=tcp://10.51.240.1:443
HEAPSTER_SERVICE_PORT=80
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBE_DNS_PORT_53_UDP_PROTO=udp
KUBE_DNS_PORT_53_TCP=tcp://10.51.240.10:53
TILLER_DEPLOY_PORT_44134_TCP_PORT=44134
EXTERNAL_DNS_SERVICE_PORT=7979
EXTERNAL_DNS_PORT_7979_TCP=tcp://10.51.244.129:7979
KUBERNETES_PORT=tcp://10.51.240.1:443
KUBE_DNS_PORT_53_TCP_PORT=53
TILLER_DEPLOY_SERVICE_HOST=10.51.240.227
EXTERNAL_DNS_PORT_7979_TCP_PORT=7979
KUBE_DNS_SERVICE_HOST=10.51.240.10
KUBE_DNS_PORT_53_UDP=udp://10.51.240.10:53
KUBE_DNS_SERVICE_PORT=53
KUBE_DNS_PORT_53_UDP_PORT=53
KUBE_DNS_PORT_53_UDP_ADDR=10.51.240.10
KUBE_DNS_PORT_53_TCP_PROTO=tcp
HEAPSTER_SERVICE_HOST=10.51.252.134
HEAPSTER_PORT_80_TCP=tcp://10.51.252.134:80
HEAPSTER_PORT_80_TCP_ADDR=10.51.252.134
HEAPSTER_PORT=tcp://10.51.252.134:80
KUBE_DNS_SERVICE_PORT_DNS_TCP=53
EXTERNAL_DNS_PORT=tcp://10.51.244.129:7979
KUBERNETES_SERVICE_HOST=10.51.240.1
KUBERNETES_SERVICE_PORT=443
HOME=/root

If I cat out /etc/secrets/service-account/credentials.json I get back the credentials I submitted via the secret.

My logs state the following:

time="2018-05-14T18:56:30Z" level=info msg="config: {Master: KubeConfig: Sources:[service ingress] Namespace: AnnotationFilter: FQDNTemplate: tldbineFQDNAndAnnotation:false tldpatibility: PublishInternal:false Provider:google GoogleProject:domain DomainFilter:[] ZoneIDFilter:[] AWSZoneType: AWSAssumeRole: AzureConfigFile:/etc/kubernetes/azure.json AzureResourceGroup: CloudflareProxied:false InfobloxGridHost: InfobloxWapiPort:443 InfobloxWapiUsername:admin InfobloxWapiPassword: InfobloxWapiVersion:2.3.1 InfobloxSSLVerify:true DynCustomerName: DynUsername: DynPassword: DynMinTTLSeconds:0 InMemoryZones:[] PDNSServer:http://localhost:8081 PDNSAPIKey: Policy:upsert-only Registry:txt TXTOwnerID:default TXTPrefix: Interval:1m0s Once:false DryRun:false LogFormat:text MetricsAddress::7979 LogLevel:info}"
time="2018-05-14T18:56:30Z" level=info msg="Connected to cluster at https://10.51.240.1:443"
time="2018-05-14T18:56:31Z" level=info msg="Change zone: domain-tld"
time="2018-05-14T18:56:31Z" level=info msg="Add records: docker.domain.tld. A [23.251.129.2] 300"
time="2018-05-14T18:56:31Z" level=info msg="Add records: nexus.domain.tld. A [23.251.129.2] 300"
time="2018-05-14T18:56:31Z" level=info msg="Add records: docker.domain.tld. TXT [\"heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/infra/nexus-sonatype-nexus\"] 300"
time="2018-05-14T18:56:31Z" level=info msg="Add records: nexus.domain.tld. TXT [\"heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/infra/nexus-sonatype-nexus\"] 300"
time="2018-05-14T18:57:31Z" level=info msg="All records are already up to date"
time="2018-05-14T18:58:32Z" level=info msg="All records are already up to date"
time="2018-05-14T18:59:32Z" level=info msg="Change zone: domain-tld"
time="2018-05-14T18:59:32Z" level=info msg="Del records: docker.domain.tld. A [23.251.129.2] 300"
time="2018-05-14T18:59:32Z" level=info msg="Del records: nexus.domain.tld. A [23.251.129.2] 300"
time="2018-05-14T18:59:32Z" level=info msg="Del records: docker.domain.tld. TXT [\"heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/infra/nexus-sonatype-nexus\"] 300"
time="2018-05-14T18:59:32Z" level=info msg="Del records: nexus.domain.tld. TXT [\"heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/infra/nexus-sonatype-nexus\"] 300"
time="2018-05-14T18:59:32Z" level=info msg="Add records: docker.domain.tld. A [35.195.152.45] 300"
time="2018-05-14T18:59:32Z" level=info msg="Add records: nexus.domain.tld. A [35.195.152.45] 300"
time="2018-05-14T18:59:32Z" level=info msg="Add records: docker.domain.tld. TXT [\"heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/infra/nexus-sonatype-nexus\"] 300"
time="2018-05-14T18:59:32Z" level=info msg="Add records: nexus.domain.tld. TXT [\"heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/infra/nexus-sonatype-nexus\"] 300"
time="2018-05-14T19:00:33Z" level=info msg="All records are already up to date"
time="2018-05-14T19:01:34Z" level=info msg="All records are already up to date"
time="2018-05-14T19:02:34Z" level=info msg="All records are already up to date"
time="2018-05-14T19:03:34Z" level=info msg="All records are already up to date"

After wasting a number of hours on this issue, I managed to get it working without any overly generous permissions.

  1. The GKE cluster should be created with workload identity enabled. When using Terraform, use terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster with workload identity:
module "gke" {
  source     = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster"
  project_id = module.variables.project_id
  ....
  identity_namespace = "${module.variables.project_id}.svc.id.goog"
}

module "my-workload-identity" {
  source              = "terraform-google-modules/kubernetes-engine/google//modules/workload-identity"
  name                = "${module.variables.name_prefix}-app-${terraform.workspace}"
  namespace           = "default"
  project_id          = module.variables.project_id
  use_existing_k8s_sa = false
}

  2. OAuth scopes should include the following:
"https://www.googleapis.com/auth/cloud-platform",
"https://www.googleapis.com/auth/ndev.clouddns.readwrite",
"https://www.googleapis.com/auth/servicecontrol",
"https://www.googleapis.com/auth/service.management.readonly",
  3. Install external-dns based on the RBAC manifest here

  4. Add the policy binding and service account annotation

gcloud iam service-accounts add-iam-policy-binding \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:project_id.svc.id.goog[default/external-dns]" \
  gsa_id@project_id.iam.gserviceaccount.com

kubectl annotate serviceaccount \
  --namespace default \
  external-dns \
  iam.gke.io/gcp-service-account=gsa_id@project_id.iam.gserviceaccount.com

# Test
kubectl run --rm -it \
  --generator=run-pod/v1 \
  --image google/cloud-sdk:slim \
  --serviceaccount external-dns \
  --namespace default \
  workload-identity-test

# gcloud auth list run from inside the container should print the service account

  5. Wait for the service account token to refresh. external-dns will continue to report authentication errors for a few minutes before becoming fully functional (you can tail the logs as shown below).
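
To watch for that transition, tailing the logs works; the deployment name and namespace here are assumptions:

kubectl logs -f deployment/external-dns --namespace default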

Note:

If your DNS zone belongs to a different project, manually create the service account in that project and assign it DNS Administrator access.
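
A sketch of what that grant could look like (the DNS project ID and service account email are placeholders):

gcloud projects add-iam-policy-binding DNS_PROJECT_ID \
  --member="serviceAccount:gsa_id@project_id.iam.gserviceaccount.com" \
  --role="roles/dns.admin"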

I may have more information pertaining to this issue.

I am able to use a Cloud DNS secret key linked to a service account with the roles/dns.admin role, configured on external-dns and cert-manager in us-east1. 👍

However, I am unable to do this in europe-west1, for example: there the node scope "https://www.googleapis.com/auth/ndev.clouddns.readwrite" is necessary to make it work. Otherwise, with the credentials.json secret set exactly as in the us-east1 region, I get googleapi: Error 403: Forbidden, forbidden.
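
For reference, a sketch of creating a node pool with that scope (the cluster, pool, and zone names are placeholders):

gcloud container node-pools create dns-pool \
  --cluster=my-cluster \
  --zone=europe-west1-b \
  --scopes=https://www.googleapis.com/auth/ndev.clouddns.readwrite,https://www.googleapis.com/auth/cloud-platform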

Just wanted to chime in because this thread helped me. If someone else sees this: don’t worry, you are not going crazy. Either the Google Cloud engineers have added this for extra security, or the Cloud DNS API authorization is currently broken in certain regions.

This seems to be a problem with IAM roles and permissions in GKE/Google Cloud for certain regions. 👈 💔

Thanks for the clear steps @prabhu. I had trouble running this with workload identity in a namespace other than default. I redid the steps you mentioned from scratch and it worked as expected. Not sure if there is something hard-coded regarding the token in this image: registry.opensource.zalan.do/teapot/external-dns:latest
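
For what it’s worth, when running in a namespace other than default, the workload-identity member string and the annotation both have to reference that namespace. A sketch adapting the commands above (project, namespace, and service account names are placeholders):

gcloud iam service-accounts add-iam-policy-binding \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:MY_PROJECT.svc.id.goog[my-namespace/external-dns]" \
  gsa_id@MY_PROJECT.iam.gserviceaccount.com

kubectl annotate serviceaccount external-dns \
  --namespace my-namespace \
  iam.gke.io/gcp-service-account=gsa_id@MY_PROJECT.iam.gserviceaccount.com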

Without using helm, I encountered this Error 403: Forbidden, forbidden error on GKE 1.10.5-gke.0 if my IAM service account wasn’t set up correctly with the project role.

To reproduce:

# create a new IAM service account
$ gcloud iam service-accounts create gke-external-dns --display-name "Service account for ExternalDNS on GKE"

# create a new node pool to use the gke-external-dns service account
$ gcloud container node-pools create external-dns-pool --cluster=main --num-nodes=1 --service-account='gke-external-dns@<project_id>.iam.gserviceaccount.com'

# create the service account's key as a secret. credentials.json is downloaded from the GCP console
$ kubectl create secret generic external-dns-key --from-file=credentials.json

If I deploy external-dns now, I see these errors in the log:

time="2018-07-26T03:09:08Z" level=info msg="Connected to cluster at https://10.100.0.1:443"
time="2018-07-26T03:09:08Z" level=error msg="Get https://www.googleapis.com/dns/v1/projects/isim-default/managedZones?alt=json: oauth2: cannot fetch token: Post https://accounts.google.com/o/oauth2/token: dial tcp 74.125.195.84:443: connect: connection refused"
time="2018-07-26T03:10:08Z" level=error msg="googleapi: Error 403: Forbidden, forbidden"
time="2018-07-26T03:11:08Z" level=error msg="googleapi: Error 403: Forbidden, forbidden"
time="2018-07-26T03:12:08Z" level=error msg="googleapi: Error 403: Forbidden, forbidden"

Assign the project owner role:

$ gcloud projects add-iam-policy-binding <project_id> --member='serviceAccount:gke-external-dns@<project_id>.iam.gserviceaccount.com' --role='roles/owner'

Now it works:

time="2018-07-26T03:09:08Z" level=info msg="Connected to cluster at https://10.100.0.1:443"
time="2018-07-26T03:09:08Z" level=error msg="Get https://www.googleapis.com/dns/v1/projects/isim-default/managedZones?alt=json: oauth2: cannot fetch token: Post https://accounts.google.com/o/oauth2/token: dial tcp 74.125.195.84:443: connect: connection refused"
time="2018-07-26T03:10:08Z" level=error msg="googleapi: Error 403: Forbidden, forbidden"
time="2018-07-26T03:11:08Z" level=error msg="googleapi: Error 403: Forbidden, forbidden"
time="2018-07-26T03:12:08Z" level=error msg="googleapi: Error 403: Forbidden, forbidden"
time="2018-07-26T03:13:08Z" level=error msg="googleapi: Error 403: Forbidden, forbidden"
time="2018-07-26T03:14:08Z" level=info msg="All records are already up to date"
time="2018-07-26T03:15:09Z" level=info msg="All records are already up to date"
time="2018-07-26T03:16:08Z" level=info msg="All records are already up to date"

P.S. I also made the mistake of using gcloud iam service-accounts add-iam-policy-binding, which treats the service account as a resource, not an identity.
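
For context, a minimal illustration of that difference (all names are placeholders):

# Grants MEMBER a role *on* the service account (SA treated as a resource):
gcloud iam service-accounts add-iam-policy-binding SA_EMAIL \
  --member="user:someone@example.com" --role="roles/iam.serviceAccountUser"

# Grants the service account a role on the project (SA treated as an identity):
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:SA_EMAIL" --role="roles/dns.admin"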

Any updates on this issue? We’re also seeing the same 403 error when deploying ExternalDNS to a GKE cluster (following the guide above; we’re not using Helm).