apm-agent-nodejs: Socket hang up with ExpressJS
I’ve set up my server with elastic-apm-node, and it runs fine. But every so often (30–60 minutes), it throws an error like the one below:
Error: socket hang up
at createHangUpError (_http_client.js:331:15)
at Socket.socketOnEnd (_http_client.js:423:23)
at emitNone (events.js:111:20)
at Socket.emit (events.js:208:7)
at endReadableNT (_stream_readable.js:1055:12)
at _combinedTickCallback (internal/process/next_tick.js:138:11)
at process._tickDomainCallback (internal/process/next_tick.js:218:9)
This error also brings my server down, so I have to restart it. After removing the agent code, everything is fine. This is my config code:
require('elastic-apm-node').start({
  serviceName: '...',
  secretToken: '...',
  instrument: true,
  captureBody: 'all',
  errorOnAbortedRequests: true,
  serverUrl: 'http://...:8200',
  active: process.env.NODE_ENV === 'production'
})
Version
Elasticsearch, Kibana, APM v6.2.1
NodeJS v8.9.4
ExpressJS v4.15.2
I searched for this error, but found no results related to apm-server. Is this a known issue, or is my configuration missing something? Please help me fix this. Thanks!
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 1
- Comments: 31 (13 by maintainers)
To chime in here: we were seeing the same Socket hang up error on version 1.1.1, running a hapi server. After updating to version 1.2.1, we have not seen the error again. Mind you, we have also cut down our sample rate to 0.5. For reference purposes, here is our config:
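For illustration only (this is not the commenter's actual config, which was not included above): reducing the sample rate in the Node agent is done with the transactionSampleRate option, roughly like this:

```javascript
// Hypothetical sketch: sample only 50% of transactions to reduce
// the volume of data the agent sends to the APM Server.
require('elastic-apm-node').start({
  serviceName: 'my-service',           // placeholder
  serverUrl: 'http://localhost:8200',  // placeholder
  transactionSampleRate: 0.5           // sample half of all transactions
})
```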
We will eventually try to bring the transactionSampleRate back up to the default of 100%, but we are not set up to make that change safely at this point. Will follow up once we have properly tested it.

I had the same problem. I configured APM Server with these values and that solved it.
apm-server.yml:
After a lot of research into this, we think this is related to an issue with the APM Server that occurs if it’s overloaded. In that case the TCP sockets can be kept open because the server hangs on processing the data that the agent is sending to it. Since the agent by default makes a new request to the server every 10 seconds, this might result in a lot of open sockets, which could eat up all the memory. At least that’s what we’ve been able to reproduce.
Because of this we’re introducing a new config option, serverTimeout. This will default to 30 seconds of inactivity before the socket is terminated: #238

We’re also changing the default value of maxQueueSize to 100: #270

We’re of course also working on fixing the issue in the APM Server that causes this, and in general on reducing the resources used by both the APM Server and the agent.
If this is indeed the issue you’re seeing, I think your best solution at the moment is to set a low maxQueueSize and to deploy more APM Servers behind a load balancer so they can handle the load.

Let me know if this would be an acceptable solution for you and whether it fixes the issue.
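Putting the two recommendations together in agent-config form, a hedged sketch (all values illustrative; the exact serverTimeout value format may vary by agent version):

```javascript
// Illustrative values only: cap the in-memory queue and time out idle sockets.
require('elastic-apm-node').start({
  serviceName: 'my-service',        // placeholder
  serverUrl: 'http://apm-lb:8200',  // hypothetical load-balancer endpoint
  maxQueueSize: 100,                // the new default; lower it further under heavy load
  serverTimeout: '30s'              // terminate sockets after 30s of inactivity
})
```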