aws-sdk-js: Intermittent "EC2 Metadata roleName request returned error" (EINVAL) on ECS Fargate

Describe the bug
I am running a Node 12.16 app on ECS Fargate. It’s performing operations on files in S3 - streaming from a source bucket and uploading to a destination bucket. About 5 hours ago I started to see the following error when uploading to the destination bucket:

    "originalError": {
        "message": "Could not load credentials from any providers",
        "errno": "EINVAL",
        "code": "CredentialsError",
        "syscall": "connect",
        "address": "169.254.169.254",
        "port": 80,
        "time": "2020-05-28T14:32:43.621Z",
        "originalError": {
            "message": "EC2 Metadata roleName request returned error",
            "errno": "EINVAL",
            "code": "EINVAL",
            "syscall": "connect",
            "address": "169.254.169.254",
            "port": 80,
            "time": "2020-05-28T14:32:43.620Z",
            "originalError": {
                "errno": "EINVAL",
                "code": "EINVAL",
                "syscall": "connect",
                "address": "169.254.169.254",
                "port": 80,
                "message": "connect EINVAL 169.254.169.254:80 - Local (0.0.0.0:0)"
            }
        }
    }

It happened for several minutes and then stopped. Then it happened again for a couple of minutes about an hour ago and stopped, so it’s intermittent. This seems very similar to what was reported in https://github.com/aws/aws-sdk-js/issues/2534#issuecomment-465308420 and asked on the forum here, but that has received no answer. I’m using a task role that has PUT permissions on the destination bucket. As I said, this is intermittent, so when it’s not happening everything works as it should. For some reason, there seems to be an issue pulling credentials from the metadata service.

I’m going to update the SDK to the latest to see if that resolves it but I didn’t see anything in the changelog that would indicate it would. Any guidance would be greatly appreciated. Thanks!

Is the issue in the browser/Node.js? Node.js

If on Node.js, are you running this on AWS Lambda? No

SDK version number v2.647.0

About this issue

  • Original URL
  • State: open
  • Created 4 years ago
  • Reactions: 15
  • Comments: 36 (4 by maintainers)

Most upvoted comments

Same issue here. Node.js running on Fargate, SDK version 2.745.0.

Missing credentials in config, if using AWS_CONFIG_FILE, set AWS_SDK_LOAD_CONFIG=1

I had the same problem. It cost me quite a headache because I had this running in AWS Fargate, where debugging is not that easy.

The error means the JavaScript SDK cannot find the AWS credentials. If nothing is configured explicitly, the SDK tries to load the credentials from several places. Here you can see the order in which the SDK tries to load them: https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/setting-credentials-node.html
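
For reference, here is a small diagnostic sketch (not from the original comment) that resolves the v2 SDK's default provider chain up front, so you can see which provider, if any, actually supplies credentials:

    // Diagnostic sketch: resolve the default credential provider chain directly
    // instead of waiting for the first service request to fail.
    const AWS = require('aws-sdk');

    const chain = new AWS.CredentialProviderChain();
    chain.resolve((err, creds) => {
      if (err) {
        console.error('No provider supplied credentials:', err.message);
      } else {
        // e.g. EnvironmentCredentials, ECSCredentials, EC2MetadataCredentials, ...
        console.log('Credentials loaded by:', creds.constructor.name);
      }
    });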

My error was quite embarrassing: I just had a typo in my environment variables. My variable was AWS_ACCESSS_KEY_ID instead of AWS_ACCESS_KEY_ID. (Quite hard to see the difference, right?)

So probably double-check the names of your environment variables (or config files).

I keep getting this issue even though my configuration is correct.

[screenshot]

Still having this issue with 2.876.0. Is there a way to install aws-sdk v3 via npm?

UPDATE: I fixed it by setting task_role_arn in my aws_ecs_task_definition.

I use environment variables to pass in the AWS keys, and following the naming convention from the docs solved the problem for me. The SDK will automatically detect and load these environment variables (see the sketch at the end of this comment):

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • AWS_SESSION_TOKEN

Reference: https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/loading-node-credentials-environment.html

Docker Image: node:14.15.4-buster, aws-sdk: 2.789.0
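
As an illustration of the above (the bucket and key below are placeholders), with those variables exported in the container the v2 SDK picks them up without any credentials appearing in code:

    // Assumes AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and, for temporary
    // credentials, AWS_SESSION_TOKEN are set in the container environment.
    const AWS = require('aws-sdk');

    const s3 = new AWS.S3({ region: process.env.AWS_REGION || 'us-east-1' });

    // Placeholder bucket/key, only to show that no explicit credentials are passed.
    s3.putObject({ Bucket: 'my-destination-bucket', Key: 'hello.txt', Body: 'hi' })
      .promise()
      .then(() => console.log('Upload succeeded using environment credentials'))
      .catch((err) => console.error('Upload failed:', err.code));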

FYI, it may not be a reasonable solution for everyone, but I confirmed that ECS Fargate works just fine using the v3 AWS Node SDK, which became generally available on 12/15: https://aws.amazon.com/blogs/developer/modular-aws-sdk-for-javascript-is-now-generally-available/ https://github.com/aws/aws-sdk-js-v3
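
For anyone wondering what that looks like, here is a minimal sketch of the same kind of upload with the modular v3 package @aws-sdk/client-s3 (bucket and key are placeholders), installed via npm install @aws-sdk/client-s3:

    // v3 equivalent of a simple upload; each service lives in its own npm package.
    const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');

    const client = new S3Client({ region: process.env.AWS_REGION });

    client
      .send(new PutObjectCommand({ Bucket: 'my-destination-bucket', Key: 'hello.txt', Body: 'hi' }))
      .then(() => console.log('v3 upload succeeded'))
      .catch((err) => console.error('v3 upload failed:', err.name));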

Same issue here; I definitely think it has something to do with ECS Fargate, although it does work on some of my S3 put-object requests. I tried to disable this request with AWS_EC2_METADATA_DISABLED, but the error still happens; only now it is:

CredentialsError: Missing credentials in config, if using AWS_CONFIG_FILE, set AWS_SDK_LOAD_CONFIG=1

I don’t use AWS_* env vars for credentials, since the ECS Fargate task has access to S3 via my task’s IAM role.


Using AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY env vars with an IAM user works, but I should be able to rely on the IAM role built into the ECS task.

@ruchisharma189 In my experience, the intermittent issue was caused by very high throughput on the metadata API, which the SDK uses to retrieve the execution role at every service initialization. A few hundred v2 SDK service initializations (e.g. new AWS.S3({...})) can cause this, mainly on Fargate; it happens less frequently on EC2-backed ECS.

Optimizing the SDK service initialization (caching the clients) and, later on, migrating to v3 (where credentials are loaded once and then consumed by the services) made the problem disappear from our systems entirely.

Hope it helps 😃
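
A minimal sketch of the client-caching idea described above (the function and bucket names are illustrative, not taken from the commenter's code):

    // Create the S3 client once at module scope and reuse it for every request,
    // instead of calling `new AWS.S3()` per operation. Node's require() cache
    // makes this a process-wide singleton, so the credential lookup against the
    // metadata endpoint happens far less often.
    const AWS = require('aws-sdk');

    const s3 = new AWS.S3();

    async function copyObject(srcBucket, dstBucket, key) {
      const body = s3.getObject({ Bucket: srcBucket, Key: key }).createReadStream();
      return s3.upload({ Bucket: dstBucket, Key: key, Body: body }).promise();
    }

    module.exports = { copyObject };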

We figured it’s some kind of timeout between client creation and first use of SQS. As a workaround we call getQueueAttributes() after creating the SQS instance, and that seems to fix the problem in our case.
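
Roughly, that warm-up workaround might look like the following (a sketch only; the queue URL handling is assumed, not the poster's exact code):

    // Make one cheap SQS call right after constructing the client so credentials
    // are resolved before the first real message is sent.
    const AWS = require('aws-sdk');

    const sqs = new AWS.SQS();

    async function warmUp(queueUrl) {
      await sqs
        .getQueueAttributes({ QueueUrl: queueUrl, AttributeNames: ['QueueArn'] })
        .promise();
    }

    // Example: warmUp(process.env.QUEUE_URL).catch(console.error);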

[Screenshot 2021-07-09 8:28:37 AM]

I run my code fine on my computer, but get this error when using Docker on EC2. Node version: node:14.15.4-buster, aws-sdk: 2.940.0. It still doesn’t work for me…

UPDATE: I got it working! I’m using docker-compose, so I tried setting volumes in my docker-compose.yml file, and it works.

        volumes:
            - /home/ubuntu/.aws:/root/.aws

The left side is the path outside the container and the right side is the path inside the container, so inside the container this leads to ~/.aws/credentials. Hope it also works for you.
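
As a side note (not from the comment above): with the host’s ~/.aws mounted at /root/.aws like that, the v2 SDK’s default chain can read /root/.aws/credentials on its own, and a specific profile can also be pinned explicitly; 'default' below is just an assumed profile name:

    // Explicitly load a profile from the mounted shared credentials file.
    const AWS = require('aws-sdk');

    AWS.config.credentials = new AWS.SharedIniFileCredentials({ profile: 'default' });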

Hey, just wanted to say that your AWS creds are visible in your image. I recommend revoking them 😃

Also, I’m having the same issue as you. The only difference is that I’m running an EKS cluster on Fargate and am getting this issue with my pods. I don’t run into this issue on an EC2 node group, though.

UPDATE:

In my case, we were using Terraform to provision everything in AWS. We use Fargate and IRSA to give our containers permission. What ended up being the issue was that when you create an EKS cluster and an Identity Provider, Terraform will not populate the thumbprint list for the identity provider. We ended up having to populate it ourselves with a TLS certificate.

If you create everything through the AWS management console the thumbprint list is populated automatically for you.

So basically if you have the same error as me, check the thumbprint list of the identity provider.

Hope this helps.

We upgraded from NodeJS 12 to 14 and had a successful run after that. We cannot say whether this is just coincidental or whether it is due to the new NodeJS version.

UPDATE: The problem appeared again, so NodeJS 14 is not the solution. 😞

I’ll follow up with the fix that worked for me:

I needed to explicitly run aws configure in the Docker image. Even though my container had all the /.aws/ contents copied over, it wasn’t enough for the aws-sdk to pick them up ‘magically’. I suggest ensuring your environment has a profile configured explicitly, through the AWS CLI, in the place where you’re running your function. This resolved the issue for both HTTP and SNS/SQS.

Hey everyone, if there is a reproducible case, can you please share it? An internal ticket was opened for this, but no reproducible case was provided. It seems to happen under high memory/CPU usage; retrying the request should be considered.
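
In the meantime, an application-level retry along these lines is one option (a sketch only; the attempt count and delay are arbitrary):

    // Retry a call when it fails with a transient CredentialsError, with a
    // simple linear backoff between attempts.
    async function withRetry(fn, attempts = 3, delayMs = 500) {
      for (let i = 0; i < attempts; i += 1) {
        try {
          return await fn();
        } catch (err) {
          if (err.code !== 'CredentialsError' || i === attempts - 1) throw err;
          await new Promise((resolve) => setTimeout(resolve, delayMs * (i + 1)));
        }
      }
    }

    // Example: withRetry(() => s3.upload(params).promise());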

Seeing this exact issue as well. The IAM role handling needs to be fixed.

Getting the same issue, @ajredniwja.

Hi, I am following this doc (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-identity-documents.html) to select the region dynamically in AWS. When I tried to test the code in AWS ECS Fargate, it gave me the error below:

    { Error: connect EINVAL 169.254.169.254:80 - Local (0.0.0.0:0)
        at internalConnect (net.js:882:16)
        at defaultTriggerAsyncIdScope (internal/async_hooks.js:294:19)
        at defaultTriggerAsyncIdScope (net.js:972:9)
        at process._tickCallback (internal/process/next_tick.js:61:11)
      errno: 'EINVAL',
      code: 'EINVAL',
      syscall: 'connect',
      address: '169.254.169.254',
      port: 80 }

However, it runs perfectly as an ECS EC2 task. I use "aws-sdk": "^2.701.0". It’s JS code in a Docker container. Any solution appreciated.
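
One possible workaround for Fargate (an assumption on my part, not something the poster confirmed): the instance-metadata endpoint at 169.254.169.254 is not reachable from Fargate tasks, so the region can be read from an environment variable set on the task instead of from the instance identity document:

    // Assumes AWS_REGION (or AWS_DEFAULT_REGION) is provided to the container,
    // e.g. via the task definition's environment settings.
    const AWS = require('aws-sdk');

    const region = process.env.AWS_REGION || process.env.AWS_DEFAULT_REGION;
    if (!region) {
      throw new Error('AWS_REGION is not set; add it to the task definition');
    }
    AWS.config.update({ region });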

Hi everyone, I’m having exactly the same issue @summera reported, with almost the same setup. Very intermittent: we have 10-15 clusters receiving a few thousand requests, and the issue seems to come up about once a week, so it’s very rare! I had to set CloudWatch alarms with a log filter to catch those occurrences, so I’m monitoring very closely.

ECS task, Fargate-managed, Node.js 13 image built from node:13.10-alpine, task deployed through CF, with a few env vars set (nothing new). At the code level we use aws-sdk 2.701.0, and since my first access is usually to DynamoDB, the issue shows up when querying Dynamo.

The weirdest thing is that the issue occurs in a task that has been running for quite a long time, in the middle of a bunch of successful requests. That said, I would rule out any configuration issue, but not the SDK; however, the clues (for me) point to the ECS metadata service being unavailable for some reason.

One detail is that we use New Relic on some apps, so the stack trace is muddled for debugging purposes.

Any thoughts?

 ckcpb72fh02l401x5470g8ctn-ckcpb72fh02l501x53ikgcj1p [ERROR] [] CredentialsError: Missing credentials in config, if using AWS_CONFIG_FILE, set AWS_SDK_LOAD_CONFIG=1 - Error: connect EINVAL 169.254.169.254:80 - Local (0.0.0.0:0)
    at internalConnect (net.js:921:16)
    at defaultTriggerAsyncIdScope (internal/async_hooks.js:313:12)
    at net.js:1011:9
    at Shim.applySegment (/usr/src/httpd/node_modules/newrelic/lib/shim/shim.js:1430:20)
    at wrapper (/usr/src/httpd/node_modules/newrelic/lib/shim/shim.js:2092:17)
    at processTicksAndRejections (internal/process/task_queues.js:79:11)

I cannot point you towards any of those with complete certainty because we don’t have any concrete evidence.

Makes sense, though I was only asking about plausibility. If any of those are not plausible, it makes it easier to focus efforts.

Can you use the following and collect logs for both cases? That way we can compare and come to some conclusion.

Which two cases are you referring to exactly?

Hi @ajredniwja. Thank you for the response. After the issue occurred, I updated the SDK to 2.685.0. I also realized that the issue happened during a spike in requests so I scaled up the minimum tasks by one. Since then, I haven’t seen the issue occur again. The JSON I included in my first comment (https://github.com/aws/aws-sdk-js/issues/3284#issue-626634869) is coming straight from my logs. Is there something else you were looking to see from the logs?

As for reproducing, I haven’t seen this happen since upgrading and scaling up our minimum tasks. However, since this happened during high load, when a lot of requests came in and therefore many parallel uploads to S3 were in flight, I’m wondering whether one or more of the following may be possibilities:

  • The metadata service in Fargate failed to respond under high load for one reason or another.
  • The SDK is or was not caching credentials retrieved from the metadata service and was therefore hitting the metadata service more than necessary, bombarding it with requests (see the sketch after this list).
  • Some transient issue happened with the Fargate service and has since been resolved.

Do any of the above sound plausible?
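
Regarding the second bullet, one mitigation worth trying (a sketch, not a confirmed fix) is to resolve credentials once at process startup so the shared credential object is already cached before a request spike arrives:

    // Resolve the default provider chain once at startup; subsequent SDK calls
    // reuse AWS.config.credentials instead of racing to hit the metadata endpoint.
    const AWS = require('aws-sdk');

    AWS.config.getCredentials((err) => {
      if (err) {
        console.error('Startup credential resolution failed:', err.message);
      } else {
        console.log('Credentials resolved via', AWS.config.credentials.constructor.name);
      }
    });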