aws-sdk-js: `Missing credentials in config` happening intermittently

We’ve been having trouble with the SDK intermittently failing to fetch credentials, which leaves our application unauthorised. The EC2 instance where this occurs has an IAM role, so the SDK reaches out to the metadata endpoint (169.254…) to fetch its keys. When it does so, it occasionally throws a credentials error.

For example, our application logged this failed DynamoDB query with an SDK error:

{
    "error": {
        "message": "Missing credentials in config",
        "code": "CredentialsError",
        "errno": "ECONNREFUSED",
        "syscall": "connect",
        "time": "2015-07-15T21:55:06.083Z",
        "originalError": {
            "message": "Could not load credentials from EC2MetadataCredentials",
            "code": "CredentialsError",
            "errno": "ECONNREFUSED",
            "syscall": "connect",
            "time": "2015-07-15T21:55:06.083Z",
            "originalError": {
                "code": "ECONNREFUSED",
                "errno": "ECONNREFUSED",
                "syscall": "connect",
                "message": "connect ECONNREFUSED"
            }
        }
    },
    "level": "error",
    "message": "DynamoDB Query failed",
    "timestamp": "2015-07-15T21:55:06.087Z"
}

More recently, an S3 call failed with a similar error:

...
    "originalError": {
      "message": "Could not load credentials from any providers",
      "code": "CredentialsError",
      "errno": "ECONNREFUSED",
      "syscall": "connect",
      "address": "169.254.169.254",
      "port": 80,
      "time": "2015-08-26T06:08:18.008Z",
      "originalError": {
        "code": "ECONNREFUSED",
        "errno": "ECONNREFUSED",
        "syscall": "connect",
        "address": "169.254.169.254",
        "port": 80,
        "message": "connect ECONNREFUSED 169.254.169.254:80"
      }
    }
...

We’ve seen the problem intermittently across multiple applications, as often as half a dozen times per day on a single EC2 instance. The example above uses aws-sdk version 2.1.46 for Node.js on io.js 2.3.1; elsewhere we run Node.js 0.12.x. We’re in the ap-southeast-2 region.

While it appears that the connection is being refused, I’d be surprised if this endpoint actually went down. Is it possible we’re doing something stupid with Node to cause this, or could there be a genuine issue?

About this issue

  • State: closed
  • Created 9 years ago
  • Reactions: 6
  • Comments: 47 (6 by maintainers)

Most upvoted comments

So this is still happening in v2.6.9 on an EC2 instance (using Elastic Beanstalk).

{"message":"Missing credentials in config","name":"CredentialsError","stack":"Error: connect ECONNREFUSED 169.254.169.254:80\n at Object.exports._errnoException (util.js:874:11)\n at exports._exceptionWithHostPort (util.js:897:20)\n at TCPConnectWrap.afterConnect as oncomplete","code":"CredentialsError"}

Hi,

The PR for retrying EC2MetadataCredentials and ECSCredentials has been merged to master, so you can try it out now by cloning the repo, or you can wait for the next release of the SDK on npm. By default, each request times out after 1000ms and is retried up to 3 times with a base delay of 100ms. If you still get intermittent timeout errors even with this default retry behavior, you can try increasing the timeout, the max retries, and the retry delay:

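var AWS = require('aws-sdk');

// Give the metadata service more time and more attempts than the defaults.
AWS.config.credentials = new AWS.EC2MetadataCredentials({
    httpOptions: { timeout: 5000 },   // per-request timeout in ms
    maxRetries: 10,                   // retry up to 10 times
    retryDelayOptions: { base: 200 }  // base retry delay in ms
});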

If that still doesn’t work, please let me know!
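
To get a feel for what these settings mean, here is a minimal sketch of an exponential backoff schedule. It assumes the delay doubles on each attempt starting from the base value. The helper names backoffDelays and worstCaseMs are hypothetical, and the SDK’s own calculation may differ (for example by adding random jitter), so treat the numbers as rough upper bounds rather than exact wait times.

```javascript
// Hypothetical helpers: not part of aws-sdk.
// Returns the nominal delay (in ms) before each retry, assuming the
// delay doubles on every attempt: base, base*2, base*4, ...
function backoffDelays(base, maxRetries) {
  var delays = [];
  for (var attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(base * Math.pow(2, attempt));
  }
  return delays;
}

// Rough worst case: the initial request and every retry each run for the
// full `timeout`, plus all the waits in between.
function worstCaseMs(timeout, base, maxRetries) {
  var waits = backoffDelays(base, maxRetries).reduce(function (a, b) {
    return a + b;
  }, 0);
  return timeout * (maxRetries + 1) + waits;
}

console.log(backoffDelays(100, 3));      // [ 100, 200, 400 ]
console.log(worstCaseMs(1000, 100, 3));  // 4700
```

Under the same doubling assumption, a timeout of 5000ms with 10 retries and a base delay of 200ms could keep a single credentials lookup waiting for several minutes in the worst case, so it may be worth raising these values gradually.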

I tracked down a detailed error message for my case:

{
  "message": "Missing credentials in config",
  "code": "CredentialsError",
  "time": "Thu Sep 03 2015 17:17:33 GMT+0000 (UTC)",
  "originalError": {
    "message": "Could not load credentials from any providers",
    "code": "CredentialsError",
    "time": "Thu Sep 03 2015 17:17:33 GMT+0000 (UTC)",
    "originalError": {
      "message": "Connection timed out after 1000ms",
      "code": "TimeoutError",
      "time": "Thu Sep 03 2015 17:17:33 GMT+0000 (UTC)"
    }
  }
}

I see we’re getting a connection timeout when trying to load credentials instead of a connection refused. That might be a different issue, even though the top level error is the same.
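
Since the top-level message is identical in both cases, it can help to log the innermost error rather than the wrapper. A minimal sketch, assuming errors are nested through originalError as in the logs in this thread; rootCause is a hypothetical helper, not an SDK function:

```javascript
// Hypothetical helper: follow the originalError chain to the innermost error.
function rootCause(err) {
  var current = err;
  var seen = [];
  // Guard against accidental cycles so the loop always terminates.
  while (current && current.originalError && seen.indexOf(current) === -1) {
    seen.push(current);
    current = current.originalError;
  }
  return current;
}

// Shaped like the timeout error logged above.
var logged = {
  message: 'Missing credentials in config',
  code: 'CredentialsError',
  originalError: {
    message: 'Could not load credentials from any providers',
    code: 'CredentialsError',
    originalError: {
      message: 'Connection timed out after 1000ms',
      code: 'TimeoutError'
    }
  }
};

console.log(rootCause(logged).code); // TimeoutError
```

Logging rootCause(err).code alongside the top-level message would make it easier to tell a refused connection (ECONNREFUSED) from a timeout (TimeoutError) when comparing reports.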

Same here, single instance of SDK, still problems.

We are seeing this too. It can’t be a throttling issue: it is on a staging instance that is only hit a few times per hour. Additionally, it is happening at application startup, so the server never starts.

Yep, I understand why I see the issue, I built the scenario explicitly to expose it!

The simple facts: it is possible, in fact inevitable, using only AWS products (EC2 & the SDK), to bring an EC2 instance to its knees. Above are outlined the exact steps to reproduce the situation. What’s frustrating to me, as a customer, is the difficulty I’m having raising this as a bug report. I guess I assumed that there would be an internal process to route it to the appropriate place, but instead I keep getting redirected myself.

Ran into this issue locally - was due to some shenanigans with process.env.

Fix was to manually pass in accessKeyId and secretAccessKey to aws.config.update(...).
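
A minimal sketch of that workaround, assuming the keys live in the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. The helper name configureExplicitCredentials is hypothetical, and the SDK object is passed in as a parameter so the snippet does not depend on how aws-sdk is loaded in your project:

```javascript
// Hypothetical helper: read the keys once, fail loudly if either is missing,
// and hand them to the SDK explicitly instead of relying on provider lookup.
function configureExplicitCredentials(aws, env) {
  var accessKeyId = env.AWS_ACCESS_KEY_ID;
  var secretAccessKey = env.AWS_SECRET_ACCESS_KEY;
  if (!accessKeyId || !secretAccessKey) {
    throw new Error('AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY not set');
  }
  aws.config.update({
    accessKeyId: accessKeyId,
    secretAccessKey: secretAccessKey
  });
}

// Typical call (assumes aws-sdk v2 is installed):
//   var aws = require('aws-sdk');
//   configureExplicitCredentials(aws, process.env);
```

Failing early on a missing variable makes a broken process.env show up at startup rather than as a credentials error on the first request.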

@LiuJoyceC Thanks for the feedback. I was able to reliably reproduce the issue with the code I provided, but I admit I haven’t looked into it since, so it’s possible that things have changed. I notice, though, that you don’t mention running multiple processes, so perhaps that explains why you couldn’t reproduce it? To reproduce the issue, I needed to run the provided script up to ten times concurrently.

Thanks for looking into the issue. Hopefully you’ll have some success with backed off retries, and hopefully the suggestions above might help you test a fix if you can reproduce the problem.

Cheers!

Oddly enough, this is still happening to me even when specifying the credentials in environment variables.

Indeed, that’s essentially what I did. I have updated my comment above with the solution.

You could change this code to start your loop only after AWS.config.getCredentials(cb) finishes. Otherwise, you fire async S3 operations at the same time, and they all think (correctly) that they need to fetch credentials.
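
That suggestion could look roughly like the sketch below. runAfterCredentials is a hypothetical wrapper, and the SDK object is passed in as a parameter; the only SDK call it relies on is AWS.config.getCredentials(callback) from aws-sdk v2.

```javascript
// Hypothetical helper: resolve credentials once, then start the work.
// `aws` is the aws-sdk v2 module object; `startWork` is your own function.
function runAfterCredentials(aws, startWork, onError) {
  aws.config.getCredentials(function (err) {
    if (err) {
      // Credentials could not be loaded; do not start any requests.
      onError(err);
      return;
    }
    startWork();
  });
}

// Typical call (assumes aws-sdk v2 is installed and startLoop is your own
// function that fires the S3 operations):
//   var aws = require('aws-sdk');
//   runAfterCredentials(aws, startLoop, console.error);
```

With this shape, the loop only begins once credentials have been resolved, so the individual operations should not each need to start their own lookup.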

@davidporter-id-au It looks like the EC2 metadata service is throttling requests from your code. The SDK itself does cache credentials fetched from the metadata service, so multiple simultaneous requests don’t bombard the metadata service. See https://github.com/aws/aws-sdk-js/pull/448

Is your code part of a shell script that is invoked in a loop of some sort? Hitting the metadata service multiple times in succession can cause the requests to be throttled.
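
To illustrate the caching idea mentioned in this comment, here is a minimal sketch of request coalescing, where simultaneous callers share one in-flight lookup. It is not the SDK’s actual implementation: createCoalescedLoader and fetchFn are hypothetical names, and the sketch ignores credential expiry and refresh.

```javascript
// Hypothetical sketch of request coalescing: not the SDK's actual code.
// `fetchFn(cb)` stands in for whatever performs the slow lookup.
function createCoalescedLoader(fetchFn) {
  var cached = null;   // last successful result
  var waiting = null;  // callbacks queued while a fetch is in flight

  return function load(cb) {
    if (cached !== null) {
      cb(null, cached);
      return;
    }
    if (waiting) {
      // A fetch is already running: join the queue instead of starting another.
      waiting.push(cb);
      return;
    }
    waiting = [cb];
    fetchFn(function (err, result) {
      var callbacks = waiting;
      waiting = null;
      if (!err) cached = result;
      callbacks.forEach(function (fn) { fn(err, result); });
    });
  };
}
```

Note that a cache like this lives inside a single process. Several processes started at the same time would each perform their own lookup, which fits the multi-process reproduction described earlier in the thread.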