aws-sdk-js: Intermittent "EC2 Metadata roleName request returned error" (EINVAL) on ECS Fargate
- I’ve gone through the Developer Guide and API reference
- I’ve checked AWS Forums and StackOverflow for answers
- I’ve searched for previous similar issues and didn’t find any solution
Describe the bug
I am running a Node 12.16 app on ECS Fargate. It performs operations on files in S3, streaming from a source bucket and uploading to a destination bucket. About 5 hours ago I started to see the following error when uploading to the destination bucket:
"originalError": {
  "message": "Could not load credentials from any providers",
  "errno": "EINVAL",
  "code": "CredentialsError",
  "syscall": "connect",
  "address": "169.254.169.254",
  "port": 80,
  "time": "2020-05-28T14:32:43.621Z",
  "originalError": {
    "message": "EC2 Metadata roleName request returned error",
    "errno": "EINVAL",
    "code": "EINVAL",
    "syscall": "connect",
    "address": "169.254.169.254",
    "port": 80,
    "time": "2020-05-28T14:32:43.620Z",
    "originalError": {
      "errno": "EINVAL",
      "code": "EINVAL",
      "syscall": "connect",
      "address": "169.254.169.254",
      "port": 80,
      "message": "connect EINVAL 169.254.169.254:80 - Local (0.0.0.0:0)"
    }
  }
}
It happened for several minutes and then stopped. Then it happened again for a couple of minutes about an hour ago and stopped, so it’s intermittent. This seems very similar to what was reported in https://github.com/aws/aws-sdk-js/issues/2534#issuecomment-465308420 and asked on the forum here, but has received no answer. I’m using a task role that has PUT permissions on the destination bucket. As I said, this is intermittent, so when it’s not happening everything works as it should. For some reason there seems to be an issue pulling credentials from the metadata service.
I’m going to update the SDK to the latest to see if that resolves it but I didn’t see anything in the changelog that would indicate it would. Any guidance would be greatly appreciated. Thanks!
Is the issue in the browser/Node.js? Node.js
If on Node.js, are you running this on AWS Lambda? No
SDK version number v2.647.0
About this issue
- Original URL
- State: open
- Created 4 years ago
- Reactions: 15
- Comments: 36 (4 by maintainers)
Same issue here. Node.js running on Fargate. SDK version 2.745.0
Missing credentials in config, if using AWS_CONFIG_FILE, set AWS_SDK_LOAD_CONFIG=1
I had the same problem. It cost me quite a headache because I had this running in AWS Fargate and debugging is not that easy there.
The error means the JavaScript SDK cannot find the AWS credentials. If nothing is configured, the SDK tries to load the credentials from different places. Here you can see in what order the SDK tries to load them: https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/setting-credentials-node.html
My error was quite embarrassing, I just had a typo in my environment variables. My variable was AWS_ACCESSS_KEY_ID instead of AWS_ACCESS_KEY_ID. (Quite hard to see the difference, right?) So probably double-check the names of your environment variables (or config files).
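Related to the provider-chain note above: on ECS/Fargate a task role is normally served from the container credentials endpoint (via AWS_CONTAINER_CREDENTIALS_RELATIVE_URI), not the EC2 metadata service at 169.254.169.254. A minimal sketch, assuming aws-sdk v2 and a task role attached to the task, of pinning the SDK to that provider so it never falls back to EC2 metadata:

```js
// Sketch: explicitly use the ECS container credentials provider instead of
// letting the chain fall back to the EC2 instance metadata service.
// Assumes aws-sdk v2 and that the task has a task role (so Fargate sets
// AWS_CONTAINER_CREDENTIALS_RELATIVE_URI for the container).
const AWS = require('aws-sdk');

AWS.config.credentials = new AWS.ECSCredentials({
  httpOptions: { timeout: 5000 }, // give the credentials endpoint more time
  maxRetries: 10,                 // retry transient lookup failures
});

const s3 = new AWS.S3({ region: process.env.AWS_REGION });
```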
I keep getting this issue even though my configuration is correct.
Still having this issue with 2.876.0. Is there a way to install aws-sdk v3 via npm?
UPDATE: I fixed it by setting task_role_arn in aws_ecs_task_definition.
I use environment variables to pass in the AWS keys, and following the naming convention from their docs solved the problem for me. The SDK will automatically detect and load these environment variables:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_SESSION_TOKEN
Reference: https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/loading-node-credentials-environment.html
Docker Image: node:14.15.4-buster aws-sdk: 2.789.0
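A small sketch for confirming which credential source the SDK actually resolved at startup, assuming aws-sdk v2; the log wording is just illustrative:

```js
// Sketch: resolve credentials up front and log the provider that won, so a
// misnamed env var or a missing task role shows up at startup instead of on
// the first S3/DynamoDB call. Log messages are illustrative only.
const AWS = require('aws-sdk');

AWS.config.getCredentials((err) => {
  if (err) {
    console.error('Could not resolve AWS credentials:', err.message);
  } else {
    // e.g. EnvironmentCredentials, ECSCredentials/RemoteCredentials,
    // EC2MetadataCredentials, SharedIniFileCredentials, ...
    console.log('Credentials resolved via', AWS.config.credentials.constructor.name);
  }
});
```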
FYI, it may not be a reasonable solution for all, but I confirmed that ECS Fargate works just fine using the v3 AWS Node SDK which came out in General Availability on 12/15: https://aws.amazon.com/blogs/developer/modular-aws-sdk-for-javascript-is-now-generally-available/ https://github.com/aws/aws-sdk-js-v3
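For anyone trying the v3 route, a minimal sketch of the modular client; the bucket and key names are placeholders:

```js
// Sketch: the equivalent upload with the modular v3 SDK, where the client
// resolves credentials once through its default provider chain.
const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');

const s3 = new S3Client({ region: process.env.AWS_REGION });

async function upload(body) {
  await s3.send(new PutObjectCommand({
    Bucket: 'my-destination-bucket', // placeholder
    Key: 'example-object-key',       // placeholder
    Body: body,
  }));
}
```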
Same issue here, and I definitely think it has something to do with ECS Fargate, although it does work on some of my S3 put object requests. I tried to disable this request with AWS_EC2_METADATA_DISABLED, but the error still happens, just with a different message. I don’t use AWS_* env vars for credentials, since the ECS Fargate task has access to S3 via my task’s IAM role. Using AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY env vars with an IAM User works, but I should be able to rely on the IAM role built into the ECS Task.
@ruchisharma189 In my experience/case, the intermittent issue was caused by very high throughput on the metadata API. The metadata API is used by the SDK to retrieve the execution role at every service initialization. A few hundred v2 SDK service initializations (e.g. new AWS.S3({...})) can cause this, mainly on Fargate, but it would happen less frequently on EC2-backed ECS. Optimizing the SDK service initialization (caching) and, later on, migrating to v3 (where credentials are loaded once and then consumed by the services) made the problem disappear from our systems entirely. Hope it helps 😃
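A minimal sketch of the caching idea described above, assuming aws-sdk v2; the helper and bucket names are only illustrative:

```js
// Sketch: construct the S3 client once at module load and reuse it, instead of
// calling `new AWS.S3()` inside every request handler. Each construction can
// trigger a fresh credential lookup against the metadata/credentials endpoint,
// so per-request construction multiplies that traffic under load.
const AWS = require('aws-sdk');

const s3 = new AWS.S3({ region: process.env.AWS_REGION }); // created once

// Hypothetical per-request handler reusing the shared client.
async function copyObject(srcBucket, destBucket, key) {
  const body = s3.getObject({ Bucket: srcBucket, Key: key }).createReadStream();
  await s3.upload({ Bucket: destBucket, Key: key, Body: body }).promise();
}
```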
We figured it’s some kind of timeout between instance creation and usage of SQS. As a workaround we call getQueueAttributes() after creating the SQS instance, and that seems to fix the problem in our case.
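A sketch of that warm-up workaround, assuming aws-sdk v2; the queue URL is a placeholder:

```js
// Sketch: "warm up" the SQS client right after constructing it with a cheap
// call, so credentials are resolved before real traffic arrives.
const AWS = require('aws-sdk');

const sqs = new AWS.SQS({ region: process.env.AWS_REGION });

async function warmUp() {
  await sqs.getQueueAttributes({
    QueueUrl: 'https://sqs.us-east-1.amazonaws.com/123456789012/example-queue', // placeholder
    AttributeNames: ['QueueArn'],
  }).promise();
}
```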
Hey, just wanted to say that your AWS creds are visible in your image. I recommend revoking them 😃
Also, I’m having the same issue as you. Only I’m running an EKS cluster on Fargate and am getting this issue with my pods. I don’t run into this issue on an EC2 Node Group though.
Update:
In my case, we were using Terraform to provision everything in AWS. We use Fargate and IRSA to give our containers permission. What ended up being the issue was that when you create an EKS cluster and an Identity Provider, Terraform will not populate the thumbprint list for the identity provider. We ended up having to populate it ourselves with a TLS certificate.
If you create everything through the AWS management console the thumbprint list is populated automatically for you.
So basically if you have the same error as me, check the thumbprint list of the identity provider.
Hope this helps.
We upgraded from NodeJS 12 to 14 and had a successful run after that. We cannot say whether this is just coincidental or whether it is due to the new NodeJS version.
UPDATE: The problem appeared again, so NodeJS 14 is not the solution. 😞
I’ll follow up with the fix for me: I needed to explicitly run aws configure in the Docker image. Even though my container had all the /.aws/ contents copied over, it wasn’t enough for the AWS SDK to pick them up ‘magically’. I suggest ensuring your environment has a profile configured explicitly, via the AWS CLI, in the place where you’re running your function. This solution resolved the issue for both HTTP and SNS/SQS.
Hey everyone, if there is a reproducible case can you please share it? An internal ticket was opened for this, but no reproducible case was provided. It seems to happen under high memory/CPU usage, so retrying the request should be considered.
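On the retry suggestion, a sketch of making a v2 client more tolerant of transient failures; the numbers are arbitrary examples, not recommendations:

```js
// Sketch: raise the retry count and lengthen timeouts on a v2 client so brief
// metadata/credentials hiccups are retried instead of surfacing immediately.
const AWS = require('aws-sdk');

const s3 = new AWS.S3({
  region: process.env.AWS_REGION,
  maxRetries: 10,                                        // retry transient failures
  retryDelayOptions: { base: 200 },                      // backoff base in ms
  httpOptions: { connectTimeout: 5000, timeout: 10000 }, // per-request timeouts
});
```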
Seeing this exact issue as well. IAM role needs to be fixed
Getting the same issue @ajredniwja
Hi, I am following this doc (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-identity-documents.html) to select the region dynamically in AWS, and when I test the code in AWS ECS Fargate it gives me the error below.
However, it runs perfectly as an ECS EC2 task. I use “aws-sdk”: “^2.701.0”. It’s JS code in a Docker container. Any solution appreciated.
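The instance identity document comes from the EC2 metadata service, which Fargate tasks don’t have, so a sketch of an alternative, assuming either an AWS_REGION variable set in the task definition or a platform version that exposes the ECS task metadata v4 endpoint (ECS_CONTAINER_METADATA_URI_V4):

```js
// Sketch: resolve the region without the EC2 instance identity document.
// Prefer an AWS_REGION env var (set it in the task definition), then fall back
// to parsing the region out of the task ARN returned by the ECS task metadata
// v4 endpoint. Both sources are assumptions about your task setup.
const http = require('http');

function getRegion() {
  return new Promise((resolve, reject) => {
    if (process.env.AWS_REGION) return resolve(process.env.AWS_REGION);

    const base = process.env.ECS_CONTAINER_METADATA_URI_V4;
    if (!base) return reject(new Error('No region source available'));

    http.get(`${base}/task`, (res) => {
      let body = '';
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => {
        // A task ARN looks like arn:aws:ecs:<region>:<account>:task/...
        resolve(JSON.parse(body).TaskARN.split(':')[3]);
      });
    }).on('error', reject);
  });
}
```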
Hi everyone, having exactly the same issue @summera reported, with almost the same setup. Very intermittent: we have 10-15 clusters receiving a few thousand requests, and the issue seems to come up about once a week, so it is very rare! I had to set CloudWatch alarms with a log filter to catch those occurrences, so I am monitoring very closely.
ECS task, Fargate managed, Node.js 13 image built from node:13.10-alpine, task deployed through CF, with a few ENVs set (nothing new). At the code level we use aws-sdk 2.701.0, and since my first access is usually to DynamoDB, the issue arises when querying Dynamo.
The weirdest thing is that the issue shows up in a task that has been running for quite a long time, in the middle of a bunch of successful requests. That said, I would rule out any configuration issue, but not the SDK; however, the clues (for me) point to the ECS metadata service being unavailable for some reason.
One detail is that we use New Relic on some apps, so the trace is of limited use for debugging purposes.
Any thoughts?
Makes sense, though I was only asking about plausibility. If any of those are not plausible, it makes it easier to focus efforts.
Which two cases are you referring to exactly?
Hi @ajredniwja. Thank you for the response. After the issue occurred, I updated the SDK to 2.685.0. I also realized that the issue happened during a spike in requests, so I scaled up the minimum tasks by one. Since then, I haven’t seen the issue occur again. The JSON I included in my first comment (https://github.com/aws/aws-sdk-js/issues/3284#issue-626634869) is coming straight from my logs. Is there something else you were looking to see from the logs?
As for reproducing, I haven’t seen this happen since upgrading and scaling up our minimum tasks. However, since this happened during high load, when a lot of requests came in and therefore many parallel uploads to S3, I’m wondering if one or more of the following may be possibilities?
Do any of the above sound plausible?