apm-agent-nodejs: Socket hang up with ExpressJS

I’ve set up my server with elastic-apm-node, and it runs fine. But every 30-60 minutes it throws an error like the one below:

Error: socket hang up
    at createHangUpError (_http_client.js:331:15)
    at Socket.socketOnEnd (_http_client.js:423:23)
    at emitNone (events.js:111:20)
    at Socket.emit (events.js:208:7)
    at endReadableNT (_stream_readable.js:1055:12)
    at _combinedTickCallback (internal/process/next_tick.js:138:11)
    at process._tickDomainCallback (internal/process/next_tick.js:218:9)

This error also brought my server down, so I had to restart it. After removing the agent code, everything works again. This is my config code:

require('elastic-apm-node').start({
  serviceName: '...',
  secretToken: '...',
  instrument: true,
  captureBody: 'all',
  errorOnAbortedRequests: true,
  serverUrl: 'http://...:8200',
  active: process.env.NODE_ENV === 'production'
})

Version

Elasticsearch, Kibana, APM v6.2.1
NodeJS v8.9.4
ExpressJS v4.15.2

I searched for this error, but found no results related to apm-server. Is this a known issue, or is my configuration missing something? Please help me fix this. Thanks!

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 1
  • Comments: 31 (13 by maintainers)

Most upvoted comments

To chime in here: We were seeing the same Socket hang up error on version 1.1.1, running a hapi server. After updating to version 1.2.1, we have not seen the error again. Mind you, we have cut down our sample rate to 0.5.

For reference purposes, here is our config:

import Config from "./config/configuration";

import * as APM from "elastic-apm-node";
APM.start({
    active: Config.get("/elasticSearch/apm/active"),
    serviceName: Config.get("/elasticSearch/apm/serviceName"),
    serverUrl: Config.get("/elasticSearch/apm/url"),
    transactionSampleRate: 0.5
});

We will eventually try to raise transactionSampleRate back to the default of 1.0 (100%), but we are not set up to make that change safely at this point. Will follow up once we have properly tested it.

I had the same problem. I solved it by configuring APM Server with these values.

apm-server.yml:

apm-server:
  host: 0.0.0.0:8200
  
output.elasticsearch:
  hosts: ['elasticsearch:9200']

After a lot of research into this, we think this is related to an issue with the APM Server that occurs if it’s overloaded. In that case the TCP sockets can be kept open because the server hangs on processing the data that the agent is sending to it. Since the agent by default makes a new request to the server every 10 seconds, this might result in a lot of open sockets, which could eat up all the memory. At least that’s what we’ve been able to reproduce.

Because of this we’re introducing a new config option serverTimeout. This will default to 30 seconds of inactivity, before the socket is terminated: #238
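To make the effect of the timeout concrete, here is a back-of-the-envelope sketch. It is a simplified model, not agent code: it assumes every request to the APM Server hangs, that one request starts per flush interval (10 seconds, as mentioned above), and that a socket is closed exactly when the timeout elapses. The function name `openSockets` is made up for illustration.

```javascript
// Simplified model (assumption: every request hangs on the server side).
// Requests start at t = 0, interval, 2 * interval, ... seconds.
// Each socket stays open until `timeout` seconds after it was started,
// or forever when timeout is Infinity (no timeout configured).
function openSockets(elapsedSeconds, intervalSeconds, timeoutSeconds) {
  // Number of requests started up to and including the current moment.
  const started = Math.floor(elapsedSeconds / intervalSeconds) + 1;

  if (!Number.isFinite(timeoutSeconds)) {
    return started; // nothing is ever closed
  }

  // Number of requests whose timeout has already elapsed.
  const closed = elapsedSeconds >= timeoutSeconds
    ? Math.floor((elapsedSeconds - timeoutSeconds) / intervalSeconds) + 1
    : 0;

  return started - closed;
}

// After one hour with a 10 second interval:
console.log(openSockets(3600, 10, Infinity)); // no timeout: sockets pile up
console.log(openSockets(3600, 10, 30));       // 30 second timeout: bounded
```

Under these assumptions the number of open sockets grows without bound when there is no timeout, while a 30 second timeout keeps it at a small constant.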

We’re also changing the default value of maxQueueSize to 100: #270

We’re of course also working on fixing the issue in the APM Server that causes this. And in general working on reducing the resources used by both the APM Server and the agent.

If this is indeed the issue you’re seeing, I think your best solution at the moment is to set a low maxQueueSize and to deploy more APM Servers behind a load balancer to be able to handle the load.

Let me know if this would be an acceptable solution for you and if it fixes the issue.
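For anyone wanting to try the workaround, the agent options could look roughly like the sketch below. The helper name `buildApmOptions`, the environment variable names, and the value 20 for maxQueueSize are placeholders for illustration only; serverTimeout is given as a number of seconds, which matches the description above but should be checked against the docs for the agent version in use.

```javascript
// Hypothetical helper that builds the agent options for the workaround.
// Env var names and the concrete values are illustrative assumptions.
function buildApmOptions(env) {
  return {
    serviceName: env.APM_SERVICE_NAME,
    serverUrl: env.APM_SERVER_URL,
    // Keep the queue small so less data is buffered and sent per request
    // (the new default mentioned above is 100; 20 is just an example).
    maxQueueSize: 20,
    // Terminate the socket after this many seconds of inactivity
    // (assumed to be in seconds; 30 is the announced default).
    serverTimeout: 30,
    active: env.NODE_ENV === 'production'
  };
}

// In the application entry point, before any other module is loaded:
// require('elastic-apm-node').start(buildApmOptions(process.env));
```

Setting serverTimeout explicitly is only needed to pick a value other than the default; it is included here to show where the option goes.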