got: Too many outgoing connections cause SNAT port exhaustion

Question

  • Got version: 9.6.0
  • Node.js version: 10.16-alpine
  • OS & version: Azure App Service Linux

We are running a Node.js app on Azure. Under high load, response times rise because of SNAT port exhaustion: we seem to be making more than 1000 simultaneous outgoing requests. I tried to limit this by setting maxSockets on the HTTPS agent to 160 and enabling HTTP keep-alive, which I thought would cap the number of outbound connections at 160. See the code below.

Things I’m wondering about:

  1. Am I doing it wrong? Is there another way to reuse connections and limit the number of connections? Or do I need to set the agent per request?
  2. Does the number of threads affect this? I have set UV_THREADPOOL_SIZE to 128, and I don't think that should affect this behaviour. Or is the limit somehow per thread?
  3. Is there any way for me to log the number of outgoing connections?
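For question 3, one possible approach is to read the bookkeeping maps that every Node.js http.Agent (and https.Agent) exposes: agent.sockets (sockets in use), agent.freeSockets (idle keep-alive sockets) and agent.requests (requests queued because maxSockets was reached). Each map is keyed by origin and holds arrays. The sketch below sums them and logs the totals periodically. countEntries, agentStats and startSocketLogger are hypothetical helper names, and the agent options simply reuse the values from the issue.

```javascript
const http = require('http');
const https = require('https');

// Total number of entries across all per-origin arrays in one of the agent's
// bookkeeping maps (agent.sockets, agent.freeSockets, agent.requests).
function countEntries(map) {
  return Object.values(map).reduce((total, list) => total + list.length, 0);
}

// Snapshot of one agent's connection usage.
function agentStats(agent) {
  return {
    active: countEntries(agent.sockets),   // sockets currently serving a request
    idle: countEntries(agent.freeSockets), // keep-alive sockets waiting to be reused
    queued: countEntries(agent.requests),  // requests waiting because maxSockets was reached
  };
}

// Log the stats of each named agent every intervalMs milliseconds.
function startSocketLogger(agents, intervalMs = 5000) {
  const timer = setInterval(() => {
    for (const [name, agent] of Object.entries(agents)) {
      console.log(`${name} agent`, JSON.stringify(agentStats(agent)));
    }
  }, intervalMs);
  timer.unref(); // logging alone should not keep the process alive
  return timer;
}

// Agents configured like the ones in the issue, shared by the whole app.
const agentOptions = { keepAlive: true, maxSockets: 160, maxFreeSockets: 10 };
const httpAgent = new http.Agent(agentOptions);
const httpsAgent = new https.Agent(agentOptions);
startSocketLogger({ http: httpAgent, https: httpsAgent });
```

This only sees sockets that go through the agents you pass in. Sockets opened through any other agent (for example one created per request) are invisible to it, which is itself a hint when the logged numbers look suspiciously low.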

Code to reproduce

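  const http = require('http')
  const https = require('https')
  const got = require('got') // Got 9.x, as reported in the issue

  const agentOptions = {
    maxSockets: 160,    // enforced per agent and per host, not process-wide
    maxFreeSockets: 10, // idle keep-alive sockets kept open per host
    keepAlive: true,
    timeout: 30000
  }
  const httpAgent = new http.Agent(agentOptions)
  const httpsAgent = new https.Agent(agentOptions)

  // headers (the default request headers) is defined elsewhere in the app.
  // In Got 9, the agent option may be a map of protocol -> agent.
  const gotClient = got.extend({
    headers: headers,
    agent: {
      http: httpAgent,
      https: httpsAgent
    }
  })

  const client = {
    get: async (url = '') => {
      // json: true (Got 9) parses the response body as JSON
      const response = await gotClient.get(url, {
        headers,
        json: true
      })
      return response.body
    }
    // ... (other methods elided in the original)
  }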
  • I have read the documentation.
  • I have tried my code with the latest version of Node.js and Got.

I see that I’m not running the latest versions of Node and Got, but my questions are more general and should be valid for older versions as well.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 26

Most upvoted comments

Just made a donation via PayPal. Thanks again for your help!

I found the problem!!! It had nothing to do with the Got library or Azure; it was a bug in our code. The code snippet you wrote (I had to change socket.once('end'...) to socket.once('close'...) to make it work) helped me realise that we weren't reusing connections. We weren't even reusing the same instance of the Got client! In fact, we were creating a new instance of the Got library for every new connection 😿. This bug had been there since long before I joined the project, so I'm surprised we didn't run into problems until now.

I would probably have wasted (even more) days on this without your fantastic support @szmarczak. Is there anything I can do to show my appreciation? I am truly thankful for you taking the time to help a random software developer with his problems.
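The effect of that bug can be reproduced with only Node's built-in http module, assuming each new client instance also came with its own agent (in the issue's code, the agents are created right next to the client). The sketch below is an illustration, not the poster's code: it sends the same number of sequential requests either through one shared keep-alive agent or through a fresh agent per request, and counts how many TCP connections a local test server accepts. sharedAgent, get and countConnections are hypothetical names.

```javascript
const http = require('http');

// Created once at module load and reused by every request (the fix).
const sharedAgent = new http.Agent({ keepAlive: true, maxSockets: 160 });

// GET url through agent; resolves once the body has been fully read,
// which is what lets a keep-alive socket go back into the agent's pool.
function get(url, agent) {
  return new Promise((resolve, reject) => {
    http.get(url, { agent }, (res) => {
      res.resume();
      res.on('end', resolve);
    }).on('error', reject);
  });
}

// Send n sequential requests to a local server and return how many
// TCP connections it accepted.
async function countConnections(n, { shareAgent }) {
  let connections = 0;
  const server = http.createServer((req, res) => res.end('ok'));
  server.on('connection', () => { connections += 1; });
  await new Promise((resolve) => server.listen(0, '127.0.0.1', resolve));
  const url = `http://127.0.0.1:${server.address().port}/`;

  const usedAgents = new Set();
  for (let i = 0; i < n; i += 1) {
    // Bug pattern: a brand-new agent (and connection pool) for every request.
    const agent = shareAgent ? sharedAgent : new http.Agent({ keepAlive: true });
    usedAgents.add(agent);
    await get(url, agent);
  }

  // Cleanup for the demo only; a real app keeps sharedAgent alive.
  usedAgents.forEach((agent) => agent.destroy());
  server.closeAllConnections?.(); // available in Node 18.2+
  await new Promise((resolve) => server.close(resolve));
  return connections;
}
```

With five requests, the per-request variant opens five connections while the shared agent opens one. With Got, the equivalent fix is to call got.extend once (for example at module load) and reuse that instance everywhere, which is what the maintainer's reply about custom Got instances further down refers to.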

This issue was very helpful while debugging why each request was opening up a new connection for my Azure App Service. Thank you for your work here!

I was using https://github.com/node-modules/agentkeepalive, and that library's getCurrentStatus method showed me that the destination server didn't support keep-alive: after my load tests, the counts of opened and closed sockets matched, and no free sockets were left open. Swapping in Node's global http agent with the code above showed the same thing: connections couldn't stay open and were being closed. Other servers known to support keep-alive produced the expected results with connection pooling.

I then used curl -Iv to confirm, and sure enough, the connection was being closed each time.
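The curl check can also be done from Node.js. The sketch below is a hypothetical helper (probeConnectionHeader), not part of any library: it sends a HEAD request through a keep-alive agent, so the request itself asks for keep-alive, and returns the server's Connection response header in lower case. A value of close means the server will close the connection after this response instead of letting the client reuse it.

```javascript
const http = require('http');

// Rough Node.js counterpart of the curl -Iv check: send a HEAD request that
// asks for keep-alive and return the server's Connection header (or null).
function probeConnectionHeader(url) {
  // A keep-alive agent makes the request send "Connection: keep-alive".
  const agent = new http.Agent({ keepAlive: true });
  return new Promise((resolve, reject) => {
    const req = http.request(url, { method: 'HEAD', agent }, (res) => {
      res.resume();
      res.on('end', () => {
        // Destroy the probe's agent after Node has finished returning the socket.
        setImmediate(() => agent.destroy());
        const value = res.headers.connection;
        resolve(value ? value.toLowerCase() : null); // e.g. 'keep-alive' or 'close'
      });
    });
    req.on('error', (err) => {
      agent.destroy();
      reject(err);
    });
    req.end();
  });
}
```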

debugging this

https://github.com/GoogleChromeLabs/ndb is a great tool, I strongly recommend it 😃

Somehow it seems the maxSockets setting is being ignored?
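One way to check whether maxSockets is honoured is a small experiment against a local server. Per the Node.js documentation, maxSockets is the maximum number of sockets per host, and each agent enforces it separately; it is not a process-wide limit. The sketch fires 20 concurrent requests through a single agent with maxSockets set to 3 and records the peak number of simultaneous connections the server sees; the remaining requests wait in agent.requests. peakConnections is a hypothetical name and the numbers are arbitrary.

```javascript
const http = require('http');

// Fire `total` concurrent requests through ONE agent and return the peak
// number of simultaneous TCP connections seen by a local server.
async function peakConnections(total, maxSockets) {
  const agent = new http.Agent({ keepAlive: true, maxSockets });
  let open = 0;
  let peak = 0;
  const server = http.createServer((req, res) => {
    setTimeout(() => res.end('ok'), 20); // hold each response briefly so requests overlap
  });
  server.on('connection', (socket) => {
    open += 1;
    peak = Math.max(peak, open);
    socket.on('close', () => { open -= 1; });
  });
  await new Promise((resolve) => server.listen(0, '127.0.0.1', resolve));
  const url = `http://127.0.0.1:${server.address().port}/`;

  await Promise.all(Array.from({ length: total }, () => new Promise((resolve, reject) => {
    http.get(url, { agent }, (res) => {
      res.resume();
      res.on('end', resolve);
    }).on('error', reject);
  })));

  agent.destroy();
  server.closeAllConnections?.(); // available in Node 18.2+
  await new Promise((resolve) => server.close(resolve));
  return peak;
}
```

If a limit seems to be ignored in a real app, one possible explanation (and the one this thread eventually found) is that requests are spread over more than one agent, each with its own maxSockets budget.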

Try something like this: https://runkit.com/szmarczak/5e5e64d2c38f7e0013896198
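The runkit snippet linked above is not reproduced here. As an independent sketch of the same idea, the code below wraps the documented agent.createConnection method so that every socket an agent creates is counted, and it listens for 'close' rather than 'end', matching the change the original poster reported having to make. trackSockets is a hypothetical name, and the sketch assumes createConnection returns the socket synchronously, as Node's default http and https agents do.

```javascript
const http = require('http');

// Count the sockets an agent creates and how many are still open.
// 'close' is used rather than 'end': 'end' only fires when the remote side
// ends the connection, while 'close' also fires when the socket is
// destroyed locally (for example by agent.destroy() or a timeout).
function trackSockets(agent) {
  const stats = { created: 0, open: 0 };
  const createConnection = agent.createConnection;
  agent.createConnection = function trackedCreateConnection(...args) {
    const socket = createConnection.apply(this, args);
    if (socket) { // assumption: the socket is returned synchronously
      stats.created += 1;
      stats.open += 1;
      socket.once('close', () => { stats.open -= 1; });
    }
    return socket;
  };
  return stats;
}
```

Logging stats periodically for the app's shared agents gives a running count of outgoing connections; a created count that keeps climbing under steady load suggests sockets are not being reused.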

You can become a sponsor

Your issue made me wonder whether this was a Got bug, because your code example is perfectly valid, which shows you read the documentation carefully. Some people tick "I have read the documentation" even though they haven't. Issues can be high quality too, so it's always best to provide as much information as possible.

The first one is fine; you don't have to do the latter (set the agent per request). That's the point of custom Got instances 😃

Make sure your server replies with Connection: keep-alive and not with Connection: close.
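To see why the server's reply matters, the sketch below sends five sequential requests through one keep-alive agent to two local test servers: one that keeps the default keep-alive behaviour of Node's http server, and one that sets Connection: close on every response. The server-side connection count reproduces the symptom described earlier in the thread, where every socket that was opened was also closed. startServer and connectionsFor are hypothetical names.

```javascript
const http = require('http');

// Start a local test server. When closeEachResponse is true, every response
// carries "Connection: close", so the client may not reuse the socket.
async function startServer(closeEachResponse) {
  let connections = 0;
  const server = http.createServer((req, res) => {
    if (closeEachResponse) res.setHeader('Connection', 'close');
    res.end('ok');
  });
  server.on('connection', () => { connections += 1; });
  await new Promise((resolve) => server.listen(0, '127.0.0.1', resolve));
  return { server, getConnections: () => connections };
}

// Send n sequential requests through ONE keep-alive agent and return the
// number of TCP connections the server accepted.
async function connectionsFor(n, closeEachResponse) {
  const { server, getConnections } = await startServer(closeEachResponse);
  const agent = new http.Agent({ keepAlive: true });
  const url = `http://127.0.0.1:${server.address().port}/`;
  for (let i = 0; i < n; i += 1) {
    await new Promise((resolve, reject) => {
      http.get(url, { agent }, (res) => {
        res.resume();
        res.on('end', resolve);
      }).on('error', reject);
    });
  }
  agent.destroy();
  server.closeAllConnections?.(); // available in Node 18.2+
  await new Promise((resolve) => server.close(resolve));
  return getConnections();
}
```

Against the keep-alive server the five requests share one connection; against the closing server each request needs a new one, so even a correctly shared agent cannot pool connections.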