aws-sdk-js-v3: EMFILE (too many open files) still exists when making mass S3 calls in JS V3 @aws-sdk/client-s3

Checkboxes for prior research

Describe the bug

I am running a Lambda script that processes a large number of artifacts from a CodePipeline into S3. The copying process is asynchronous to improve speed, and I am hitting file descriptor limits within the Lambda.

The Lambda runtime has a hard limit of 1024 file descriptors: https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html

This issue was previously reported for versions 3.40.0 through 3.52.0 and was described in https://github.com/aws/aws-sdk-js-v3/issues/3019

I have tried implementing the solutions suggested in that issue, but none have worked.

Stack Trace:

2023-01-19T00:18:48.903Z	f9f83cda-da62-41dc-8e2a-240a3bc67b12	ERROR	Invoke Error 	{
    "errorType": "SystemError",
    "errorMessage": "A system error occurred: uv_os_homedir returned EMFILE (too many open files)",
    "code": "ERR_SYSTEM_ERROR",
    "info": {
        "errno": -24,
        "code": "EMFILE",
        "message": "too many open files",
        "syscall": "uv_os_homedir"
    },
    "errno": -24,
    "syscall": "uv_os_homedir",
    "stack": [
        "SystemError [ERR_SYSTEM_ERROR]: A system error occurred: uv_os_homedir returned EMFILE (too many open files)",
        "    at getHomeDir (/var/task/node_modules/@aws-sdk/shared-ini-file-loader/dist-cjs/getHomeDir.js:14:29)",
        "    at getCredentialsFilepath (/var/task/node_modules/@aws-sdk/shared-ini-file-loader/dist-cjs/getCredentialsFilepath.js:7:128)",
        "    at loadSharedConfigFiles (/var/task/node_modules/@aws-sdk/shared-ini-file-loader/dist-cjs/loadSharedConfigFiles.js:11:76)",
        "    at /var/task/node_modules/@aws-sdk/node-config-provider/dist-cjs/fromSharedConfigFiles.js:8:102",
        "    at /var/task/node_modules/@aws-sdk/property-provider/dist-cjs/chain.js:11:28",
        "    at runMicrotasks (<anonymous>)",
        "    at processTicksAndRejections (internal/process/task_queues.js:95:5)",
        "    at async coalesceProvider (/var/task/node_modules/@aws-sdk/property-provider/dist-cjs/memoize.js:14:24)",
        "    at async /var/task/node_modules/@aws-sdk/property-provider/dist-cjs/memoize.js:26:28",
        "    at async resolveParams (/var/task/node_modules/@aws-sdk/middleware-endpoint/dist-cjs/adaptors/getEndpointFromInstructions.js:29:40)"
    ]
}

SDK version number

@aws-sdk/client-s3@3.252.0

Which JavaScript Runtime is this issue in?

Node.js

Details of the browser/Node.js/ReactNative version

Node v14.x - Lambda provided version

Reproduction Steps

Unfortunately I can’t share my work’s code, but the snippets from https://github.com/aws/aws-sdk-js-v3/issues/3019 will reproduce the error. Essentially, I have a Lambda that downloads 5 or 6 artifacts from S3, unzips them in memory, and then uploads the results to S3 buckets located in every available AWS region. It recursively uploads every file within each zip, which in a couple of projects includes node_modules folders. This is heavily asynchronous to optimise performance.
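For illustration only, here is a rough sketch of how a mass-upload pattern like this could be bounded so that only a fixed number of PutObject calls are in flight at once. The bucket/key/body shapes and the limit of 32 are placeholder assumptions, not taken from the original code:

const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');

const client = new S3Client({});

// Drain a shared queue with a fixed number of workers instead of firing
// every upload at the same time, which keeps socket/descriptor usage bounded.
async function uploadAllLimited(files, limit = 32) {
    const queue = [...files]; // items shaped like { bucket, key, body }
    const worker = async () => {
        while (queue.length > 0) {
            const { bucket, key, body } = queue.shift();
            await client.send(new PutObjectCommand({ Bucket: bucket, Key: key, Body: body }));
        }
    };
    await Promise.all(Array.from({ length: limit }, worker));
}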

Observed Behavior

Partway through processing, the uploads fail and dump the stack trace shown above (EMFILE from uv_os_homedir).


I tried implementing a number of the solutions offered in the previously mentioned issue, but nothing has changed; I am still hitting this limit.

I have tried reducing my client socket timeouts and modified the config file reader so that it does not read, but I am still hitting this error.
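For reference, this is roughly what the socket-timeout / socket-cap mitigation looks like with the v3 SDK's NodeHttpHandler. The timeout values and maxSockets of 50 are arbitrary examples, and this on its own does not address the shared-config file reads that show up in the stack trace:

const https = require('https');
const { S3Client } = require('@aws-sdk/client-s3');
const { NodeHttpHandler } = require('@aws-sdk/node-http-handler'); // lives in @smithy/node-http-handler in newer releases

const client = new S3Client({
    requestHandler: new NodeHttpHandler({
        connectionTimeout: 5000,  // ms to wait for a socket to connect
        socketTimeout: 30000,     // ms of socket inactivity before aborting
        httpsAgent: new https.Agent({ keepAlive: true, maxSockets: 50 }),
    }),
});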

Expected Behavior

Ultimately, to not get this error at all, as I believe the SDK client is leaking descriptors. Failing that, some way to control or modify how the SDK handles its descriptors would be good.

Possible Solution

No response

Additional Information/Context

No response

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 5
  • Comments: 36 (14 by maintainers)

Most upvoted comments

any release plan mate?

We’ll aim to publish the fix with v3.405.0 on Friday September 1st.

But it could get delayed to v3.406.0 on Tuesday, September 5th because:

  • The fix was made without a repro specific to homedir; it was worked out just by going through the stack trace.
  • It’s a little tight to get the fix merged and published in the @smithy repo and then update the @aws-sdk clients.

It’s also a long weekend in the United States.

It works great mate, many thanks to your quick action!

Closing, as the specific issue mentioned in the main issue description, EMFILE errors thrown from uv_os_homedir, has been fixed. You need SDK >=3.378.0 with an updated lockfile, or >=3.405.0 (releasing on Fri, Sep 1).

If you need a workaround for other EMFILE issues, we’ve provided three recommendations in https://github.com/aws/aws-sdk-js-v3/issues/4345#issuecomment-1699733113

Please create a new issue if you’re still getting EMFILE errors with the provided fix.

@cjnoname Some good news for you!

The fix was published in @smithy/shared-ini-file-loader@2.0.6, and since we use ^2.0.0 you can test it right away!

Can you update your lockfile to include @smithy/shared-ini-file-loader@2.0.6 and check if the issue is fixed? If you don’t use a lockfile, the fix will be available to you with no updates needed.

From your comment in #4345 (comment), it looks like you may be using the Lambda-provided SDK (v3.188.0) or a customer-deployed SDK version prior to the migration to the @smithy scope, i.e. <=3.362.0.

For verification, you need to use ~>=3.363.0~ >=3.378.0 (which depends on ^2.0.0), and deploy your own SDK to Lambda for testing.
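As a quick way to confirm which version a deployment actually resolves, a small check along these lines can be used (hypothetical helper script, assuming npm is available and it is run from the project root):

const { execSync } = require('child_process');

// npm ls prints the version resolved from the lockfile/node_modules tree;
// execSync throws if npm reports problems with the dependency tree.
console.log(execSync('npm ls @smithy/shared-ini-file-loader', { encoding: 'utf8' }));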


Thanks for your quick action. I have deployed the new version to the testing environment, and my team will test it shortly. Will let you know how it goes 😃 Thanks again!

Maybe we can cache results of os.homedir() with process.geteuid()

We’ve posted a fix at https://github.com/awslabs/smithy-typescript/pull/903

The home directory is tied to the effective user ID, as per the documentation and source code.

Docs: https://nodejs.org/api/os.html#oshomedir

On POSIX, it uses the $HOME environment variable if defined. Otherwise it uses the effective UID to look up the user’s home directory.

Source code: https://github.com/nodejs/node/blob/a39b8a2a3e8414c8a54650bdfe4a46998282d88a/deps/uv/src/unix/core.c#L1193

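For context, a minimal sketch of the caching idea being discussed here, illustrative only and not the SDK's actual implementation from the linked PR:

const os = require('os');

// Cache the home directory per effective UID so that repeated credential/config
// resolution does not call uv_os_homedir (and open files) on every request.
const homeDirCache = new Map();

const getCachedHomeDir = () => {
    // process.geteuid() only exists on POSIX; fall back to a single slot elsewhere.
    const key = typeof process.geteuid === 'function' ? String(process.geteuid()) : 'DEFAULT';
    if (!homeDirCache.has(key)) {
        homeDirCache.set(key, os.homedir());
    }
    return homeDirCache.get(key);
};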

Adding in client.destroy() just before resolving my data seemed to fix the issue for me. Here is an example of the pattern the majority of my calls use; I am no longer getting this file error, or even the rate limit issues that I had seen before. Hope this may help someone else:

const { EC2Client, DescribeInstancesCommand } = require('@aws-sdk/client-ec2');

const DescribeInstances = (auth, params) => new Promise((resolve) => {
    /**
     * Added environment variable
     * AWS_NODEJS_CONNECTION_REUSE_ENABLED=1
     * Didn't make a difference for the file error, however it did speed up the function
     */
    let retry = 1;
    const collection = [];
    /**
     * Moved the client outside of the pagination loop function.
     * Did not make any difference, as I still got the file error.
     */
    const client = new EC2Client(auth);
    const Run = async (auth, params) => {
        const command = new DescribeInstancesCommand(params);
        try {
            const response = await client.send(command);
            if (response && response.Reservations && response.Reservations.length > 0) {
                response.Reservations.forEach(r => {
                    if (r.Instances && r.Instances.length > 0) {
                        r.Instances.forEach(x => {
                            x.Region = auth.region;
                            collection.push(x);
                        });
                    }
                });
            }
            if (response.NextToken) {
                // continue paginating; the final page resolves the promise
                params.NextToken = response.NextToken;
                Run(auth, params);
            } else {
                /**
                 * Added client.destroy() before returning the collection.
                 * This fixed the issue with the file error.
                 */
                client.destroy();
                resolve(collection);
            }
        } catch (error) {
            if (error.message.toLowerCase().includes('rate exceeded') && retry < 25) {
                // simple linear backoff before retrying the same page
                setTimeout(() => { retry += 1; return Run(auth, params); }, retry * 100);
            } else {
                // GenerateError is the author's own logging helper (not shown here)
                await GenerateError(client, error, 'DescribeInstances', 'DescribeInstancesCommand', params, auth.region, __filename);
                resolve(collection);
            }
        }
    };
    Run(auth, params);
});
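For context, client.destroy() tears down the client's underlying request handler, including the HTTP agent's open sockets, which is why it can release file descriptors. A hypothetical call site for the snippet above, where auth is simply the EC2Client configuration:

DescribeInstances({ region: 'us-east-1' }, {}).then((instances) => {
    console.log(`Collected ${instances.length} instances`);
});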