aws-sdk-js-v3: Clients read config from file resulting in EMFILE (too many open files) errors

Describe the bug

This applies to every auto-generated client, but I will use client-sqs as an example.

This is related to: #2271, #2027, #2993


Note that in the runtimeConfig snippet linked below, loadNodeConfig is called multiple times to get the defaults for the client configuration:

https://github.com/aws/aws-sdk-js-v3/blob/bda3b40dd773511b8d5c84a07e9e158a70073523/clients/client-sqs/src/runtimeConfig.ts#L31-L53

loadNodeConfig is the exported function loadConfig from the @aws-sdk/node-config-provider package:

https://github.com/aws/aws-sdk-js-v3/blob/73923416ef2beaefbe7feda0473bc39c734be4b8/packages/node-config-provider/src/configLoader.ts#L27-L37

loadConfig in turn uses fromSharedConfigFiles to load config values from disk.
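
To illustrate the mechanism (a simplified sketch of the provider-chain behaviour described above, not the actual SDK source): each configuration key is resolved by trying environment variables first, then the shared config files on disk, then a static default, and the resolved provider is memoized.

// Simplified illustration only (not the SDK source). The shared-config-file step is
// where getHomeDir()/readFile() happen, which is what shows up in the EMFILE stack traces.
const resolveConfigValue = async <T>(providers: Array<() => Promise<T>>): Promise<T> => {
  for (const provider of providers) {
    try {
      return await provider() // first provider that yields a value wins
    } catch {
      // this provider could not supply the value; fall through to the next one
    }
  }
  throw new Error('Could not resolve configuration value from any provider')
}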

This is a very undesirable “feature”, especially in serverless environments, because under heavy load it results in the following errors:

    error: {
      "type": "NodeError",
      "message": "A system error occurred: uv_os_homedir returned EMFILE (too many open files)",
      "stack":
          SystemError [ERR_SYSTEM_ERROR]: A system error occurred: uv_os_homedir returned EMFILE (too many open files)
              at Object.getHomeDir (/node_modules/@aws-sdk/shared-ini-file-loader/dist-cjs/index.js:82:17)
              at Object.loadSharedConfigFiles (/node_modules/@aws-sdk/shared-ini-file-loader/dist-cjs/index.js:11:89)
              at null.<anonymous> (/node_modules/@aws-sdk/node-config-provider/dist-cjs/fromSharedConfigFiles.js:9:53)
              at null.<anonymous> (/node_modules/@aws-sdk/property-provider/dist-cjs/chain.js:11:28)
              at runMicrotasks (<anonymous>)
              at processTicksAndRejections (internal/process/task_queues.js:95:5)
              at null.coalesceProvider (/node_modules/@aws-sdk/property-provider/dist-cjs/memoize.js:13:24)
              at Object.isConstant (/node_modules/@aws-sdk/property-provider/dist-cjs/memoize.js:24:28)
              at Object.getEndpointFromRegion (/node_modules/@aws-sdk/config-resolver/dist-cjs/endpointsConfig/utils/getEndpointFromRegion.js:12:34)
              at null.buildHttpRpcRequest (/node_modules/@aws-sdk/client-sqs/dist-cjs/protocols/Aws_query.js:2540:68)
      "code": "ERR_SYSTEM_ERROR",
      "info": {
        "errno": -24,
        "code": "EMFILE",
        "message": "too many open files",
        "syscall": "uv_os_homedir"
      },
      "errno": -24,
      "syscall": "uv_os_homedir"
    }

Yes, the call is memoized, but if your Lambda is invoked heavily and the client is instantiated inside the handler, this call happens many times.
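
For illustration, a minimal sketch of the difference (assumptions: a Lambda-style handler, and QUEUE_URL as a placeholder queue URL):

import { SQS } from '@aws-sdk/client-sqs'

// Anti-pattern: constructing a client inside the handler gives every invocation a fresh
// set of memoized config providers, so the shared-config-file reads can repeat under load.
export const handlerPerInvocation = async () => {
  const client = new SQS({})
  await client.sendMessage({ QueueUrl: process.env.QUEUE_URL!, MessageBody: 'hello' })
}

// Less bad: one client per container, so the memoized providers resolve once and are
// reused across invocations (the disk reads still happen, but far less often).
const sqs = new SQS({})
export const handlerPerContainer = async () => {
  await sqs.sendMessage({ QueueUrl: process.env.QUEUE_URL!, MessageBody: 'hello' })
}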

Your environment

SDK version number

@aws-sdk/client-sqs@3.40.0

Is the issue in the browser/Node.js/ReactNative?

Node.js

Details of the browser/Node.js/ReactNative version

node -v: v14.17.5

Steps to reproduce

import { SQS } from '@aws-sdk/client-sqs'

const sqs = new SQS({})

Observed behavior

The same EMFILE error and stack trace as shown above.

Expected behavior

  1. Do not error out
  2. Do not read config from disk by default, or allow overriding this behaviour.

Screenshots

N/A

Additional context

N/A

Most upvoted comments

If anyone stumbles on this, the only way I found to stop the file reads is the following hack.

import * as sharedIniFileLoader from '@aws-sdk/shared-ini-file-loader'
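// Patching the module's exports makes the SDK's config providers see empty shared config
// instead of reading ~/.aws/config and ~/.aws/credentials from disk. (Note: this relies on
// the CJS build looking loadSharedConfigFiles up on the exports object at call time; it
// likely will not work with ESM output or aggressive bundlers.)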

Object.assign(sharedIniFileLoader, {
  loadSharedConfigFiles: async (): Promise<sharedIniFileLoader.SharedConfigFiles> => ({
    configFile: {},
    credentialsFile: {},
  }),
})

fyi: found an interesting package: https://github.com/samswen/lambda-emfiles

I ran some tests. Some insights (same workload for all tests):

1) v3.46.0 without any workarounds: some leaks (around 230 emfiles)
2) v3.49.0 without any workarounds: some leaks and more emfiles (up to more than 600 emfiles)
3) v3.49.0 with the loadSharedConfigFiles workaround: far fewer leaks, probably also fewer emfiles (up to more than 400 emfiles)
4) v3.46.0 with the loadSharedConfigFiles workaround: far fewer leaks and far fewer emfiles (up to 180 emfiles)

So it seems option 4, v3.46.0 with the loadSharedConfigFiles workaround, is the best!

AWS SDK for JavaScript team discussed this issue in our scrum today. We’re evaluating adding some kind of lock/mutex to fromSharedConfigFiles function, and I’ll provide an update here.

Without the workaround suggested by @moltar, v3.49.0 seems to be getting even worse and opening more file descriptors, probably because of https://github.com/aws/aws-sdk-js-v3/pull/3192? //cc @AllanZhengYP @trivikr

I feel this is part of the same issue, no?

The client should not check the disk for credentials if there are env variables set, which is always the case in the Lambda env.

I can see it still trying to load ~/.aws/config even with env vars present though, which might be valid in a weird way, e.g. if someone writes the config file during Lambda execution, before SDK instantiation, for whatever reason.

I think one approach would be to keep using the provider pattern, as it is already applied, but make sure the SDK clients respect any explicitly supplied providers. E.g. there could be a ConfigProvider and a CredentialsProvider; when both are supplied with instances of (imaginary) ConfigFromMemoryProvider and CredentialsFromEnvironmentProvider, the client would avoid any disk IO and use only the given providers.

If you think this is a viable approach, I will open a separate issue.
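
In the meantime, something close to this can be approximated with what the SDK already exposes. A rough sketch (assumptions: Lambda-style environment variables are present; other defaults still go through loadConfig and may touch the disk):

import { SQS } from '@aws-sdk/client-sqs'
import { fromEnv } from '@aws-sdk/credential-provider-env'

const sqs = new SQS({
  // Explicit values/providers mean these two lookups never fall through to the shared config files.
  region: process.env.AWS_REGION,
  credentials: fromEnv(), // reads AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN
})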

Hi folks, I’ve posted a WIP PR to remove concurrent/duplicate calls in slurpFile at ~~https://github.com/aws/aws-sdk-js-v3/pull/3281~~ https://github.com/aws/aws-sdk-js-v3/pull/3282

Do post your comments if you take a look. cc folks who had commented on this issue: @adrai @ffxsam @moltar @petermorlion

Pardon any confusion on my part, but it seems like the file descriptor issue is happening for every single API call, not just upon instantiation. I’m only instantiating S3Client once, but then I call CopyObject 2000 times, and I get the EMFILE error. Does this match other people’s experiences?

That’s why I created #3279 😉

The leaking file descriptors from the readFile calls are just one part… the file descriptors of the HTTP requests are the other part of the issue, which can be mitigated with a lower socketTimeout value.

@adrai Can you create a new bug report? It would make it easier to track fixes.

here we go 😉 https://github.com/aws/aws-sdk-js-v3/issues/3279

An additional curiosity:

Defining a custom requestHandler with a very low socketTimeout drastically reduces the emfiles count:

  requestHandler: new NodeHttpHandler({
      socketTimeout: 10000 // <- this decreases the emfiles count, the Node.js default is 120000
  })

Great find! Any idea what the default value is?

Edit: I see the comment: 120000.
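
For reference, a minimal sketch of wiring that override into a client (assuming SQS and the 10000 ms value from the comment above):

import { SQS } from '@aws-sdk/client-sqs'
import { NodeHttpHandler } from '@aws-sdk/node-http-handler'

const sqs = new SQS({
  requestHandler: new NodeHttpHandler({
    socketTimeout: 10000, // release idle sockets (and their file descriptors) sooner than the 120000 ms default
  }),
})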

@AllanZhengYP @ajredniwja Could you please check into this when you get a chance? This is a serious issue that impacted our production system recently.

The aws-sdk should detect whether it is running in Lambda and omit the loadSharedConfigFiles call completely, or at least provide a “supported” way to disable it…
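
A sketch of that idea in user land, combining an environment check with the patch from earlier in the thread (assumption: AWS_LAMBDA_FUNCTION_NAME, which the Lambda runtime sets, is used as the signal):

import * as sharedIniFileLoader from '@aws-sdk/shared-ini-file-loader'

// Only suppress shared-config-file reads when running inside Lambda.
if (process.env.AWS_LAMBDA_FUNCTION_NAME) {
  Object.assign(sharedIniFileLoader, {
    loadSharedConfigFiles: async (): Promise<sharedIniFileLoader.SharedConfigFiles> => ({
      configFile: {},
      credentialsFile: {},
    }),
  })
}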