aws-sdk-js-v3: Leaking file descriptors when generating too many http requests

This issue is extracted from the original issue https://github.com/aws/aws-sdk-js-v3/issues/3019, as requested in https://github.com/aws/aws-sdk-js-v3/issues/3019#issuecomment-1030267030.

#3019 concentrates on the readFile (leaking file descriptor) “issue”, while this issue concentrates on the network (leaking file descriptor) “issue”.


It seems lambda is not “waiting” for the file descriptors to be closed.

This can be observed especially with warm lambda executions that make a lot of SDK calls, e.g. to DynamoDB or S3. Each http request opens a network socket, which results in an open file descriptor. Since the socket timeout in Node.js defaults to 120000ms (2 minutes), the lambda invocation may already be finished while the sockets are still open. When the lambda is “restarted” for the next invocation, those file descriptors may still be open. This leads to EMFILE errors like these:

Error: connect EMFILE 52.94.5.100:443 - Local (undefined:undefined)
Error: getaddrinfo EMFILE dynamodb.eu-west-1.amazonaws.com
Error: A system error occurred: uv_os_homedir returned EMFILE (too many open files)
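
One way to observe this in a warm lambda is to count the process’s open file descriptors between invocations. This is not from the original report, just a minimal sketch (Linux only, which is what Lambda runs on):

import { readdirSync } from 'fs'

// illustrative helper: /proc/self/fd lists the process's open file descriptors
const countOpenFds = () => readdirSync('/proc/self/fd').length

export const handler = async () => {
  console.log('open file descriptors:', countOpenFds())
  // ... SDK calls here ...
}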

These basic tests show the EMFILE counts (and leaks):

Tests originally done in this issue here: https://github.com/aws/aws-sdk-js-v3/issues/3019#issuecomment-1028840006

[Screenshots of the EMFILE counts from these tests omitted.]

compared to these tests: https://github.com/aws/aws-sdk-js-v3/issues/3019#issuecomment-1029287130

Defining a custom requestHandler with a very low socketTimeout drastically reduces the EMFILE count:

  requestHandler: new NodeHttpHandler({
      socketTimeout: 10000 // <- this decreases the emfiles count, the Node.js default is 120000
  })
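
For completeness, a full client construction with such a handler might look like this sketch (the S3 client and region are just examples; NodeHttpHandler comes from @aws-sdk/node-http-handler):

import { S3Client } from '@aws-sdk/client-s3'
import { NodeHttpHandler } from '@aws-sdk/node-http-handler'

const s3Client = new S3Client({
  region: 'eu-west-1',
  requestHandler: new NodeHttpHandler({
    socketTimeout: 10000 // close idle sockets after 10s instead of the Node.js default of 120s
  })
})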
[Screenshots of the reduced EMFILE counts with the custom requestHandler omitted.]

That’s why I suggest setting a low socketTimeout by default, as proposed here: https://github.com/aws/aws-sdk-js-v3/issues/3019#issuecomment-1029713958

proposal:

// https://github.com/aws/aws-sdk-js-v3/blob/main/packages/node-http-handler/src/node-http-handler.ts#L62
socketTimeout: socketTimeout || 10000

and probably also here?

// https://github.com/aws/aws-sdk-js-v3/blob/main/packages/node-http-handler/src/node-http2-handler.ts#L44
this.requestTimeout = requestTimeout || 10000;

PS: btw, it seems it got worse (more file descriptors) when updating from v3.46.0 to v3.49.0.

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 4
  • Comments: 34 (10 by maintainers)

Commits related to this issue

Most upvoted comments

@trivikr I think I found what is causing all this extra open EMFILEs…

It’s probably exactly what @AllanZhengYP commented here: https://github.com/aws/aws-sdk-js-v3/blame/main/packages/node-http-handler/src/node-http-handler.ts#L75

When doing all these hundreds of concurrent requests, the code is not waiting for this.config to be ready, and will initialize a lot of new http(s) clients here: https://github.com/aws/aws-sdk-js-v3/blame/main/packages/node-http-handler/src/node-http-handler.ts#L64

All this was introduced in v3.47.0 with this commit: https://github.com/aws/aws-sdk-js-v3/commit/9152e210c6ec29f34bb070eaf2874039022e6ab7

I tested with this little hack (shown as a screenshot in the original comment), and it seems to work much better. // cc: @mcollina you may know of a more performant way to sync up these concurrent calls?
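
The screenshot is not reproduced here, but the general idea can be sketched as follows: memoize the async config resolution so that concurrent handle() calls await the same promise (and therefore share one https agent) instead of each triggering its own initialization. The names below are illustrative, not the SDK’s actual internals:

import { Agent } from 'https'

class LazyConfigHandler {
  async resolveConfig () {
    if (!this.configPromise) {
      // created exactly once; all concurrent callers await the same promise
      this.configPromise = Promise.resolve({ httpsAgent: new Agent({ keepAlive: true, maxSockets: 50 }) })
    }
    return this.configPromise
  }

  async handle (request) {
    const { httpsAgent } = await this.resolveConfig()
    // ... issue `request` using the shared httpsAgent ...
  }
}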

btw: to generate some concurrent requests, it is enough to do something like this:

import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3'
const s3Client = new S3Client({ region: 'eu-west-1' })

// fire 1000 concurrent GetObject requests without awaiting them
for (let index = 0; index < 1000; index++) {
  s3Client.send(new GetObjectCommand({
    Bucket: 'some-bucket',
    Key: 'some-key'
  })).then(() => {}).catch(() => {})  
}

tl;dr: the amount of EMFILEs generated by readFile is nothing compared to the amount generated by the http requests… that’s why you will not notice a decrease of 10 EMFILEs when there are hundreds or thousands of other EMFILEs caused by the http requests.

@alexforsyth @trivikr any update on this? When will we be able to update from v3.46.0 to a newer fixed version?

@AllanZhengYP will take a look at this issue.

fyi: with 3.51.0 in my setup I still get EMFILE errors (too many open sockets) because of the socketTimeout… it seems to be a little bit better, but still not good.

I concur; I tried with version 3.52.0.

My load test results still haven’t improved: the test didn’t pass even for 1000 parallel DynamoDB updates (Promise.all), so not much improvement [I had 10000 updates working fine in 3.33.x, now my chunk size is 800].

> @Muthuveerappanv we’ve reduced readFile calls in https://github.com/aws/aws-sdk-js-v3/pull/3285 which was released in v3.51.0.
> Can you test with >3.51.0 and share your results?

fyi: the tests I did were without extra readFile calls, so these EMFILEs are all coming from http requests. The tests were run with:

import sharedIniFileLoader from '@aws-sdk/shared-ini-file-loader'
// stub out shared config/credentials file loading so readFile calls do not affect the EMFILE count
Object.assign(sharedIniFileLoader, { loadSharedConfigFiles: async () => ({ configFile: {}, credentialsFile: {} }) })

That’s why my proposal is to have a very low socketTimeout by default. In my tests, setting a low socketTimeout decreased the EMFILE count from over 600 to under 100, roughly a 6x reduction.

The fix in https://github.com/aws/aws-sdk-js-v3/pull/3285 is most likely going to fix this issue too, as the number of file reads is reduced from 16 to 2 per client + operation call.

ToDo: Reproduce this issue using repo https://github.com/samswen/lambda-emfiles when the fix from #3285 is out.

hi @adrai,

3.74.0 has been released, which includes a fix based on your screenshot.

In my test lambda, using the looped .send(new GetObjectCommand(...)) code snippet you gave, wrapped in https://github.com/samswen/lambda-emfiles, the comparison of 3.72.0 and 3.74.0 shows a greatly reduced emfile count: from about 1 per loop iteration down to a total of 50, which I believe is consistent with the default maxSockets value in the node http handler config.
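
For anyone who wants to tune that limit explicitly, the socket cap can also be set via a custom agent on the request handler; a sketch (the value 25 is just an example):

import { S3Client } from '@aws-sdk/client-s3'
import { NodeHttpHandler } from '@aws-sdk/node-http-handler'
import { Agent } from 'https'

// cap the number of concurrent sockets (and thus file descriptors) the client may open
const s3Client = new S3Client({
  requestHandler: new NodeHttpHandler({
    httpsAgent: new Agent({ keepAlive: true, maxSockets: 25 })
  })
})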

The question is: why was this workaround NOT necessary in <= v3.46.0?

@Muthuveerappanv we’ve reduced readFile calls in https://github.com/aws/aws-sdk-js-v3/pull/3285 which was released in v3.51.0.

Can you test with >3.51.0 and share your results?

The best advice from my side is really to stay on 3.46 and/or set a lower socketTimeout.

I’m facing the same issue as well. I was doing 10000 DynamoDB updates in 1 chunk and benchmarked the performance with 3.33.x; with 3.49 I ran into these EMFILE issues. The only options were to reduce the chunk size to 800 (even 1000 didn’t work) or to roll back to 3.46.
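
A rough sketch of that chunking workaround (`updates` is a placeholder for the UpdateItemCommand inputs):

import { DynamoDBClient, UpdateItemCommand } from '@aws-sdk/client-dynamodb'

const ddbClient = new DynamoDBClient({ region: 'eu-west-1' })

// send the updates in chunks of 800 instead of all 10000 at once,
// awaiting each chunk before starting the next one
const sendInChunks = async (updates, chunkSize = 800) => {
  for (let i = 0; i < updates.length; i += chunkSize) {
    const chunk = updates.slice(i, i + chunkSize)
    await Promise.all(chunk.map((input) => ddbClient.send(new UpdateItemCommand(input))))
  }
}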

For many in production (like me) this is a breaking change. Could we have some updates soon?

@adrai thank you for the detailed description