pulumi-aws: New AWS regions like ap-east-1 cause aws.provider errors
What happened?
We have run into what seems like a showstopper, at least for us. New regions such as ap-east-1 are provisioned with version 2 session tokens, and this breaks our aws.Provider() with the following errors:
Error: failed to refresh cached credentials, operation error STS: AssumeRole, failed to sign request: failed to retrieve credentials:
raise invoke_error
Exception: invoke of aws:index/getCallerIdentity:getCallerIdentity failed: invocation of aws:index/getCallerIdentity:getCallerIdentity returned an error: 1 error occurred:
* error configuring Terraform AWS Provider: no valid credential sources for Terraform AWS Provider found.
I know there is a newer aws_native library that may fix this, but we're not using it, and resources don't seem to be cross-compatible between the two providers.
Steps to reproduce
Perform any call against a regional endpoint that uses v2 tokens, such as ap-east-1, through a provider whose profile uses sts:AssumeRole:
_provider = aws.Provider("Provider", region=region, profile="profile_name")
Expected Behavior
It works
Actual Behavior
Error: failed to refresh cached credentials, operation error STS: AssumeRole, failed to sign request: failed to retrieve credentials:
raise invoke_error
Exception: invoke of aws:index/getCallerIdentity:getCallerIdentity failed: invocation of aws:index/getCallerIdentity:getCallerIdentity returned an error: 1 error occurred:
* error configuring Terraform AWS Provider: no valid credential sources for Terraform AWS Provider found.
Output of pulumi about
No response
Additional context
Went through the botocore code; this is where the SDK decides whether to use the regional or the global STS endpoint: https://github.com/boto/botocore/blob/dbc23f090d1257095da8bade8cb3fd5eeaec31db/botocore/args.py#L385-L388
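As a rough illustration of that decision, here is a minimal sketch. It is a simplified paraphrase, not botocore's actual code: the function name `sts_endpoint_for` is hypothetical, the legacy region set is only an illustrative subset, and it assumes the commercial `amazonaws.com` partition.

```python
# Simplified sketch of the regional-vs-global STS endpoint choice.
# NOT botocore's real implementation; names here are hypothetical.

# Illustrative subset of regions that may still map to the global endpoint
# in "legacy" mode. Newer opt-in regions such as ap-east-1 are not included.
LEGACY_GLOBAL_STS_REGIONS = frozenset({
    "us-east-1",
    "us-east-2",
    "us-west-1",
    "us-west-2",
    "eu-west-1",
    "eu-central-1",
    "ap-southeast-1",
    "ap-northeast-1",
})


def sts_endpoint_for(region: str, sts_regional_endpoints: str = "legacy") -> str:
    """Return the STS endpoint URL a client would target for `region`.

    `sts_regional_endpoints` mirrors the AWS config setting of the same
    name and is expected to be "legacy" or "regional".
    """
    if sts_regional_endpoints == "legacy" and region in LEGACY_GLOBAL_STS_REGIONS:
        # Older regions fall back to the global endpoint in legacy mode.
        return "https://sts.amazonaws.com"
    # Everything else, including newer regions, uses the regional endpoint.
    return f"https://sts.{region}.amazonaws.com"
```

Under these assumptions, `sts_endpoint_for("ap-east-1")` resolves to the regional endpoint even in legacy mode, which matches the behaviour described in this issue.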
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 5
- Comments: 18 (9 by maintainers)
@thomas11: I’m wondering if this problem is only manifesting when using an assume-role profile, instead of direct pulumi configs?
In our case, with ap-east-1 enabled, we get the error reported by @rdanno when using an assume-role profile in ~/.aws/config and referencing the profile name in the Pulumi code.
As mentioned, the error happens in ap-east-1, which uses a V2 STS endpoint, but it doesn't happen in regions where V1 is used (global STS endpoint).
The same error happens in much older versions of awscli/botocore. Newer versions of the awscli handle V2 regional STS endpoints transparently. My guess is that the Terraform provider used by Pulumi relies on an older version of the AWS SDK that doesn't select regional endpoints transparently, as the newer versions of botocore do. But this is just a guess 🤷‍♂️
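For readers unsure what an assume-role profile looks like, here is a small hypothetical sketch. The profile names, account ID, and role ARN are made up for illustration and are not taken from this issue; `role_arn` and `source_profile` are the standard AWS config keys. The helper `assume_role_profiles` is likewise an assumption, shown only to make the profile shape concrete.

```python
import configparser

# Hypothetical ~/.aws/config contents; all names and the ARN are placeholders.
SAMPLE_CONFIG = """
[profile base_user]
region = us-east-1

[profile profile_name]
role_arn = arn:aws:iam::123456789012:role/ExampleRole
source_profile = base_user
region = ap-east-1
"""


def assume_role_profiles(config_text: str) -> dict:
    """Map each assume-role profile name to its source profile.

    A profile counts as "assume-role" here if it defines `role_arn`.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    found = {}
    for section in parser.sections():
        if parser.has_option(section, "role_arn"):
            # Sections are named "profile <name>"; keep only <name>.
            name = section.split(" ", 1)[-1]
            found[name] = parser.get(section, "source_profile", fallback=None)
    return found
```

A profile like `profile_name` above is the kind that triggers an STS AssumeRole call when it is passed to `aws.Provider(..., profile="profile_name")`.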
Hi @rdanno, thanks for the issue. I’ve raised this with the team to appropriately prioritize. You may be correct that it relates to #2188, so I’ll make sure to discuss that too. Thanks!
Update: So we have found that enabling the same region on the source account (the account with the user which is assuming the role) fixes this issue in aws-cli and pulumi and presumably everything else.
We noticed that AssumeRole calls are made to the regional STS endpoint in both accounts, so it makes sense that the process fails if one of them is unavailable. This is contrary to what the documentation says, so we are following up with AWS for clarification.
https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_enable-regions.html The documentation there seems to indicate the user in account A does not need the regional endpoint enabled in their account.
That’s a great find, @rdanno! Thanks for updating this issue.