external-dns: AWS API InvalidClientTokenId: The security token included in the request is invalid

I am trying to bring up External-DNS with the stable Helm chart. I have tried both creating and not creating the RBAC service account. In either case, it seems that the pod is not able to communicate with the Kubernetes API.
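(For reference, the install was roughly along these lines; the exact chart value names are from memory and may differ:)

    helm install stable/external-dns --name external-dns \
      --namespace kube-system \
      --set provider=aws \
      --set rbac.create=true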

I entered the pod and the service account token looks fine. In fact, if I extract the token from the running pod and add it to my kubeconfig, I am able to execute kubectl commands to see pods and services.
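(Roughly how I verified it; the pod name is just a placeholder:)

    # Pull the service account token out of the running pod
    kubectl -n kube-system exec <external-dns-pod> -- \
      cat /var/run/secrets/kubernetes.io/serviceaccount/token
    # Add it to kubeconfig as a credential and confirm API access with it
    kubectl config set-credentials external-dns-test --token="<token from above>"
    kubectl --user=external-dns-test get pods --all-namespaces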

time="2018-05-22T20:55:09Z" level=info msg="config: {Master: KubeConfig: Sources:[service ingress] Namespace: AnnotationFilter: FQDNTemplate: CombineFQDNAndAnnotation:false Compatibility: PublishInternal:false Provider:aws GoogleProject: DomainFilter:[] ZoneIDFilter:[] AWSZoneType:public AWSAssumeRole: AzureConfigFile:/etc/kubernetes/azure.json AzureResourceGroup: CloudflareProxied:false InfobloxGridHost: InfobloxWapiPort:443 InfobloxWapiUsername:admin InfobloxWapiPassword: InfobloxWapiVersion:2.3.1 InfobloxSSLVerify:true DynCustomerName: DynUsername: DynPassword: DynMinTTLSeconds:0 InMemoryZones:[] PDNSServer:http://localhost:8081 PDNSAPIKey: Policy:upsert-only Registry:txt TXTOwnerID:default TXTPrefix: Interval:1m0s Once:false DryRun:false LogFormat:text MetricsAddress::7979 LogLevel:debug}"
time="2018-05-22T20:55:09Z" level=info msg="Connected to cluster at https://172.20.0.1:443"
time="2018-05-22T20:55:15Z" level=error msg="InvalidClientTokenId: The security token included in the request is invalid.\n\tstatus code: 403, request id: 6d439029-5e02-11e8-b984-c1a86fb37b37"
time="2018-05-22T20:56:25Z" level=error msg="InvalidClientTokenId: The security token included in the request is invalid.\n\tstatus code: 403, request id: 9749c5b3-5e02-11e8-9a05-e706f97ff7c1"
time="2018-05-22T20:57:26Z" level=error msg="InvalidClientTokenId: The security token included in the request is invalid.\n\tstatus code: 403, request id: bb533cb9-5e02-11e8-a805-33dea8fc8b7d"
time="2018-05-22T20:58:26Z" level=error msg="InvalidClientTokenId: The security token included in the request is invalid.\n\tstatus code: 403, request id: df5adf50-5e02-11e8-ae9c-9dffa2727d2c"
time="2018-05-22T20:59:31Z" level=error msg="InvalidClientTokenId: The security token included in the request is invalid.\n\tstatus code: 403, request id: 064ea3f5-5e03-11e8-9a05-e706f97ff7c1"

Looking at it from the API server side, I am seeing errors about the certificate. Yet the certificate in the pod is the same certificate that is used elsewhere and in other pods.

I0522 21:33:24.241320       1 logs.go:49] http: TLS handshake error from 10.20.0.109:50626: remote error: tls: bad certificate
I0522 21:33:24.269081       1 logs.go:49] http: TLS handshake error from 10.20.0.109:50627: remote error: tls: bad certificate
I0522 21:33:26.414586       1 logs.go:49] http: TLS handshake error from 10.20.0.109:50628: remote error: tls: bad certificate
I0522 21:33:26.441808       1 logs.go:49] http: TLS handshake error from 10.20.0.109:50629: remote error: tls: bad certificate
I0522 21:33:26.469308       1 logs.go:49] http: TLS handshake error from 10.20.0.109:50630: remote error: tls: bad certificate
I0522 21:33:26.497100       1 logs.go:49] http: TLS handshake error from 10.20.0.109:50631: remote error: tls: bad certificate
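(For completeness, one way to compare the CA bundle mounted into the pod against the one used elsewhere is by fingerprint; the pod name and the on-node path are just examples:)

    # Fingerprint of the CA cert mounted into the pod
    kubectl -n kube-system exec <external-dns-pod> -- \
      cat /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
      | openssl x509 -noout -sha256 -fingerprint
    # Fingerprint of the CA cert used by other components on the node
    openssl x509 -noout -sha256 -fingerprint -in /etc/kubernetes/pki/ca.crt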

Most upvoted comments

Hi @hickey, I faced the same error. I later figured out that I had made a mistake when entering the AWS credentials, which is the reason for the InvalidClientTokenId error.

The fields in the Helm values are in reverse order: secretKey appears first (and I mistakenly put the ACCESS KEY ID there instead).

You may want to check whether you are hitting the same issue.
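For illustration (the exact nesting in the chart's values.yaml is from memory, and the values below are AWS's documented example credentials), the point is the field order:

    aws:
      secretKey: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"   # the secret access key, NOT the access key ID
      accessKey: "AKIAIOSFODNN7EXAMPLE"                       # the access key ID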

Just for fun, I started up External-DNS in another cluster (v. 1.9.6) using the inmemory provider and got results much more consistent with what I would expect:

0:0 ᐅ kubectl -n kube-system logs intent-mule-external-dns-6c9fccd6bc-s2hdp -f
time="2018-05-25T01:33:55Z" level=info msg="config: {Master: KubeConfig: Sources:[service ingress] Namespace: AnnotationFilter: FQDNTemplate: CombineFQDNAndAnnotation:false Compatibility: PublishInternal:false Provider:inmemory GoogleProject: DomainFilter:[pipeline.smartsheet.com] ZoneIDFilter:[] AWSZoneType:public AWSAssumeRole: AzureConfigFile:/etc/kubernetes/azure.json AzureResourceGroup: CloudflareProxied:false InfobloxGridHost: InfobloxWapiPort:443 InfobloxWapiUsername:admin InfobloxWapiPassword: InfobloxWapiVersion:2.3.1 InfobloxSSLVerify:true DynCustomerName: DynUsername: DynPassword: DynMinTTLSeconds:0 InMemoryZones:[] PDNSServer:http://localhost:8081 PDNSAPIKey: Policy:upsert-only Registry:txt TXTOwnerID:prod-pipeline TXTPrefix: Interval:1m0s Once:false DryRun:false LogFormat:text MetricsAddress::7979 LogLevel:debug}"
time="2018-05-25T01:33:55Z" level=info msg="Connected to cluster at https://172.20.0.1:443"
time="2018-05-25T01:33:55Z" level=debug msg="No endpoints could be generated from service default/conftest"
time="2018-05-25T01:33:55Z" level=debug msg="No endpoints could be generated from service default/kubernetes"
time="2018-05-25T01:33:55Z" level=debug msg="No endpoints could be generated from service kube-system/calico-etcd"
time="2018-05-25T01:33:55Z" level=debug msg="No endpoints could be generated from service kube-system/intent-mule-external-dns"
time="2018-05-25T01:33:55Z" level=debug msg="No endpoints could be generated from service kube-system/kibana-logging-cluster"
time="2018-05-25T01:33:55Z" level=debug msg="No endpoints could be generated from service kube-system/kube-dns"
time="2018-05-25T01:33:55Z" level=debug msg="No endpoints could be generated from service kube-system/kubelet"
time="2018-05-25T01:33:55Z" level=debug msg="No endpoints could be generated from service kube-system/tiller-deploy"
time="2018-05-25T01:33:55Z" level=debug msg="No endpoints could be generated from service kube-system/wintering-blackbird-kubernetes-dashboard"
time="2018-05-25T01:33:55Z" level=debug msg="No endpoints could be generated from service rook/rook-ceph-mgr"
time="2018-05-25T01:33:55Z" level=debug msg="No endpoints could be generated from service rook/rook-ceph-mon0"
time="2018-05-25T01:33:55Z" level=debug msg="No endpoints could be generated from service rook/rook-ceph-mon2"
time="2018-05-25T01:33:55Z" level=debug msg="No endpoints could be generated from service rook/rook-ceph-mon3"
time="2018-05-25T01:33:55Z" level=debug msg="No endpoints could be generated from ingress kube-system/kibana-logging-cluster"

I then re-deployed External-DNS with the AWS provider (even though it is an on-prem cluster) and got similar results: an invalid token. Given this latest data, I retract my suspicion that this is a problem with communicating with the Kubernetes API server.
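In other words, the two runs differ only in the provider setting. Roughly, reconstructing the flags from the config dump above (not the literal chart invocation):

    # inmemory run: connects to the cluster and enumerates sources without error
    external-dns --provider=inmemory --source=service --source=ingress \
      --domain-filter=pipeline.smartsheet.com --registry=txt \
      --txt-owner-id=prod-pipeline --policy=upsert-only --log-level=debug
    # aws run: identical apart from the provider, fails with InvalidClientTokenId
    external-dns --provider=aws --source=service --source=ingress \
      --domain-filter=pipeline.smartsheet.com --registry=txt \
      --txt-owner-id=prod-pipeline --policy=upsert-only --log-level=debug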

I also expanded my tcpdump filter to look at all packets going to port 443 (feasible since the cluster in question has virtually no TLS traffic running across it) and found packets going to IPs close to, but not the same as, the IP registered in DNS for route53.amazonaws.com. That would also tend to confirm that the Route53 API is being contacted, but how that IP is being determined is beyond me at the moment.
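(The capture was along these lines; the interface name is just an example:)

    # Watch all outbound TLS traffic from the node and note the destination IPs
    tcpdump -ni eth0 'tcp and dst port 443'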

I will try to spend a bit of time tomorrow digging into the AWS provider code to see what can be done to provide better logging (and debugging) into the code. Maybe from that it will become more apparent why the failures are being seen.
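As a starting point, something like this is what I have in mind (a minimal sketch, not the actual external-dns code, assuming the aws-sdk-go default credential chain and logrus, which the log format above suggests is already in use). It just reports which credential provider the SDK resolved, without printing the secret itself:

    package main

    import (
        "github.com/aws/aws-sdk-go/aws/session"
        log "github.com/sirupsen/logrus"
    )

    func main() {
        // Build a session the way a provider typically would when no static
        // credentials are passed in: the SDK walks its default chain
        // (environment variables, shared credentials file, instance role).
        sess := session.Must(session.NewSession())

        // Ask the resolved credentials object where the keys actually came from.
        val, err := sess.Config.Credentials.Get()
        if err != nil {
            log.Infof("unable to resolve AWS credentials: %v", err)
            return
        }
        log.Infof("AWS credentials resolved from provider %q", val.ProviderName)
    }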

Thanks @chrisduong. You are right, and I think this is quite confusing and easy to get wrong. It would be better to add comments for those two fields, or simply use aws_access_key_id and aws_secret_access_key instead of accessKey and secretKey.

I have added some initial logging to the AWS provider code and cannot find how or where the settings from the chart's values.yaml file get applied to the provider, specifically the AWS access key and secret key. From what I can tell, these values are never used by the code.

I have not been able to confirm this yet, but if I start external-dns without specifying the keys in the values.yaml file, everything seems to work as expected. I have an environment that I need to rebuild, so I will run an experiment: specify the keys again and see whether external-dns once again fails to create Route53 entries.

If the experiment succeeds and I can show that specifying the keys does prevent external-dns from operating (which I don't understand, if they really are not being used by the code), then I would recommend removing those entries from the Helm chart until code is written to handle them properly.

Hopefully more results soon to confirm or deny my initial findings.