aws-sdk-js-v3: Leaking file descriptors when generating too many http requests
This issue is extracted from the original issue https://github.com/aws/aws-sdk-js-v3/issues/3019, as requested here: https://github.com/aws/aws-sdk-js-v3/issues/3019#issuecomment-1030267030
#3019 concentrates on the readFile (leaking file descriptor) “issue”, while this issue concentrates on the network (leaking file descriptor) “issue”.
It seems Lambda is not “waiting” for the file descriptors to be closed.
This can be observed especially with warm Lambda executions that make a lot of SDK calls, e.g. to DynamoDB or S3. Each HTTP request opens a network socket, which results in an open file descriptor. Since the socket timeout in Node.js defaults to 120000ms (2 minutes), the Lambda invocation may already be finished while the sockets are still open. When the Lambda is “restarted” for the next invocations, those file descriptors may still be open. This leads to this type of EMFILE errors:
Error: connect EMFILE 52.94.5.100:443 - Local (undefined:undefined)
Error: getaddrinfo EMFILE dynamodb.eu-west-1.amazonaws.com
Error: A system error occurred: uv_os_homedir returned EMFILE (too many open files)
These basic tests show the open file descriptor (EMFILE) counts and how they leak:
Tests originally done in this issue here: https://github.com/aws/aws-sdk-js-v3/issues/3019#issuecomment-1028840006
compared to these tests: https://github.com/aws/aws-sdk-js-v3/issues/3019#issuecomment-1029287130
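For reference, a minimal way to observe the open file descriptor count on Linux (this is not the exact harness from the linked comments; https://github.com/samswen/lambda-emfiles reports something similar):

```ts
import { readdirSync } from "fs";

// Count the process's currently open file descriptors by listing /proc/self/fd.
// Works on Linux (and therefore inside Lambda); every open socket shows up here.
export function openFdCount(): number {
  return readdirSync("/proc/self/fd").length;
}

// Log before and after a batch of SDK calls to see how many descriptors stay open.
console.log("open fds:", openFdCount());
```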
Defining a custom requestHandler with a very low socketTimeout drastically reduces the EMFILE count:
requestHandler: new NodeHttpHandler({
  socketTimeout: 10000 // <- this decreases the EMFILE count; the Node.js default is 120000
})
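For completeness, a minimal sketch of wiring this handler into a client (the client type and region are only examples, not taken from the original issue):

```ts
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { NodeHttpHandler } from "@aws-sdk/node-http-handler";

// Pass the custom handler when constructing the client so idle sockets are
// closed after 10s instead of the 2-minute Node.js default.
const client = new DynamoDBClient({
  region: "eu-west-1",
  requestHandler: new NodeHttpHandler({
    socketTimeout: 10000,
  }),
});
```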
That’s why I suggest setting a low socketTimeout by default, as proposed here: https://github.com/aws/aws-sdk-js-v3/issues/3019#issuecomment-1029713958
proposal:
// https://github.com/aws/aws-sdk-js-v3/blob/main/packages/node-http-handler/src/node-http-handler.ts#L62
socketTimeout: socketTimeout || 10000
and probably also here?
// https://github.com/aws/aws-sdk-js-v3/blob/main/packages/node-http-handler/src/node-http2-handler.ts#L44
this.requestTimeout = requestTimeout || 10000;
PS: btw, it seems it got worse (more file descriptors) when updating from v3.46.0 to v3.49.0.
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 4
- Comments: 34 (10 by maintainers)
@trivikr I think I found what is causing all these extra open EMFILEs…
It’s probably exactly what @AllanZhengYP commented here: https://github.com/aws/aws-sdk-js-v3/blame/main/packages/node-http-handler/src/node-http-handler.ts#L75
When doing all these hundreds of concurrent requests, the code is not waiting for this.config to be ready, and will initialize a lot of new http(s) clients here: https://github.com/aws/aws-sdk-js-v3/blame/main/packages/node-http-handler/src/node-http-handler.ts#L64
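To make the race easier to see, here is a simplified sketch of the pattern (not the SDK’s actual code): because the config resolution is asynchronous, every call that starts before the first resolution finishes also sees “no agent yet” and builds its own agent.

```ts
import * as https from "https";

// Simplified illustration only: each https.Agent created here holds its own
// sockets, i.e. its own file descriptors.
export class LeakyHandlerSketch {
  private agent?: https.Agent;

  private async resolveAgent(): Promise<https.Agent> {
    await new Promise((resolve) => setTimeout(resolve, 10)); // stand-in for async config resolution
    return new https.Agent({ keepAlive: true, maxSockets: 50 });
  }

  async handle(): Promise<void> {
    if (!this.agent) {
      // hundreds of concurrent calls all pass this check before the await below
      // settles, so each one ends up creating a separate agent
      this.agent = await this.resolveAgent();
    }
    // ... issue the request using this.agent
  }
}
```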
All this was introduced in v3.47.0 with this commit: https://github.com/aws/aws-sdk-js-v3/commit/9152e210c6ec29f34bb070eaf2874039022e6ab7
I tested with this little hack, and it seems to work much better like this:
cc @mcollina: you may know of a more performant way to sync up these concurrent calls?
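The hack itself was shared as a screenshot in the original issue; a minimal sketch of one common way to sync up such concurrent calls (not necessarily the exact hack) is to memoize the in-flight promise so every caller shares one agent:

```ts
import * as https from "https";

// Sketch only: resolveAgentOnce() starts the async resolution at most once and
// hands the same promise to all concurrent callers.
export class DedupedHandlerSketch {
  private agentPromise?: Promise<https.Agent>;

  private resolveAgentOnce(): Promise<https.Agent> {
    if (!this.agentPromise) {
      this.agentPromise = (async () => {
        await new Promise((resolve) => setTimeout(resolve, 10)); // stand-in for async config resolution
        return new https.Agent({ keepAlive: true, maxSockets: 50 });
      })();
    }
    return this.agentPromise; // all concurrent callers share this single promise
  }

  async handle(): Promise<void> {
    const agent = await this.resolveAgentOnce();
    // ... issue the request using the single shared agent
  }
}
```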
btw: to generate some concurrent requests, it is enough to do something like this:
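The exact snippet isn’t reproduced here; a minimal sketch along those lines (bucket and key names are placeholders) looks like this:

```ts
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";

// Lambda handler that fires many SDK calls at once; each in-flight request
// needs a socket, i.e. an open file descriptor.
const s3 = new S3Client({ region: "eu-west-1" });

export const handler = async (): Promise<void> => {
  const requests: Promise<unknown>[] = [];
  for (let i = 0; i < 500; i++) {
    requests.push(s3.send(new GetObjectCommand({ Bucket: "my-bucket", Key: `item-${i}.json` })));
  }
  await Promise.all(requests);
};
```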
tl;dr: the amount of EMFILEs generated by readFile is nothing compared to the amount generated by the HTTP requests… that’s why you will not notice a decrease of 10 EMFILEs when there are hundreds or thousands of EMFILEs caused by the HTTP requests.
@alexforsyth @trivikr any update on this? When will we be able to update from v3.46.0 to a newer fixed version?
@AllanZhengYP will take a look at this issue.
I concur, I tried with version 3.52.0. My load test results still haven’t improved; it didn’t pass even for 1000 parallel DynamoDB updates (Promise.all), so not much improvement [I had 10000 updates working fine in 3.33.x, now my chunk size is 800].

> @Muthuveerappanv we’ve reduced readFile calls in https://github.com/aws/aws-sdk-js-v3/pull/3285 which was released in v3.51.0.
fyi: the tests I did were without extra readFile calls, so these are all coming from HTTP requests. So the tests ran with:
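(Assumption about the test setup, since the snippet is not reproduced here: passing region and credentials explicitly keeps the SDK from reading the shared config/credentials files, leaving sockets as the only source of EMFILEs.)

```ts
import { S3Client } from "@aws-sdk/client-s3";

// Assumed test setup, not the original snippet: explicit region and static
// credentials from environment variables instead of files on disk.
const client = new S3Client({
  region: "eu-west-1",
  credentials: {
    accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
    sessionToken: process.env.AWS_SESSION_TOKEN,
  },
});
```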
That’s why my proposal is to have a very low socketTimeout by default. In my tests, setting a low socketTimeout decreased the EMFILE count from over 600 to under 100, roughly a 6x reduction.
The fix in https://github.com/aws/aws-sdk-js-v3/pull/3285 is most likely going to fix this issue too, as the number of file reads is reduced from 16 to 2 per client+operation call.

ToDo: Reproduce this issue using repo https://github.com/samswen/lambda-emfiles when the fix from #3285 is out.
hi @adrai, 3.74.0 has been released, which includes a fix based on your screenshot.

In my test lambda using the code snippet with the looped .send(new GetObjectCommand(...)) you gave, wrapped in https://github.com/samswen/lambda-emfiles, the comparison of 3.72.0 and 3.74.0 shows a greatly reduced emfile count, from about 1 per loop iteration to a total of 50, which I believe is consistent with the default maxSockets value in the node http handler config.

The question is: why was this workaround NOT necessary in <= v3.46.0?
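As a side note, that maxSockets cap can also be tuned by passing a custom agent to the handler; a minimal sketch (values are examples only, not from the issue):

```ts
import * as https from "https";
import { NodeHttpHandler } from "@aws-sdk/node-http-handler";
import { S3Client } from "@aws-sdk/client-s3";

// A custom agent caps concurrent sockets via maxSockets (extra requests queue
// on the agent), and a lower socketTimeout closes idle sockets well before the
// 2-minute Node.js default.
const client = new S3Client({
  requestHandler: new NodeHttpHandler({
    httpsAgent: new https.Agent({ keepAlive: true, maxSockets: 50 }),
    socketTimeout: 10000,
  }),
});
```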
@Muthuveerappanv we’ve reduced readFile calls in https://github.com/aws/aws-sdk-js-v3/pull/3285 which was released in v3.51.0.
Can you test with >3.51.0 and share your results?
best advice from my side is really to stay at 3.46 and/or set a lower socketTimeout
I’m facing the same issue as well. I was doing 10000 DynamoDB updates in 1 chunk and benchmarked the performance with 3.33.x; with 3.49 I ran into these EMFILE issues. The only option was to reduce the chunk size to 800 (even 1000 didn’t work), or roll back to 3.46.
For many in production (like me) this is a breaking change. Could we have some updates soon?
@adrai thank you for the detailed description