sentry-javascript: Memory Leak with `includeLocalVariables` v7.46.0

How do you use Sentry?

Sentry Saas (sentry.io)

Which SDK are you using? If you use the CDN bundles, please specify the exact bundle (e.g. bundle.tracing.min.js) in your SDK setup.

@sentry/node

SDK Version

7.46.0

SDK Setup

const Sentry = require("@sentry/node");

Sentry.init({
  dsn: "__YOUR_DSN__",
  includeLocalVariables: true,
});

Steps to Reproduce

  1. Start Sentry with includeLocalVariables set to true.

Expected Result

  1. Server does not run out of memory.

Actual Result

  1. Server runs out of memory.

The memory leak in the LocalVariables integration, first documented in #7230, persists after the most recent update that aimed to fix it.

[Screenshot 2023-03-31 at 5:12:00 PM]

Proof that we are using Sentry v7.46.0: This is the shell output from immediately after taking the screenshot above:

[image: shell output]

Notably, there was also an uptick in CPU utilization after re-enabling includeLocalVariables with v7.46.0 (at the marker below). This surprises me because no Sentry events were sent during that entire period, yet the process was constantly consuming CPU and memory. I thought the integration worked by using Chrome’s DevTools Protocol to capture local variables only when errors are thrown, not continuously. Perhaps this is simply the cost of listening for thrown errors.

[image: CPU utilization graph]
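For context on that expectation: the general technique is to open an inspector session (Chrome DevTools Protocol) from inside the process, pause on exceptions, and read the locals off the paused call frames. A rough sketch of that approach, not Sentry’s actual implementation, using only Node’s built-in node:inspector module:

const inspector = require("node:inspector");

// Sketch only: capture local variables of the top call frame when an
// uncaught exception pauses the debugger, then resume execution.
const session = new inspector.Session();
session.connect();

session.post("Debugger.enable");
session.post("Debugger.setPauseOnExceptions", { state: "uncaught" });

session.on("Debugger.paused", ({ params }) => {
  const topFrame = params.callFrames[0];
  const localScope = topFrame.scopeChain.find((s) => s.type === "local");
  if (!localScope) {
    session.post("Debugger.resume");
    return;
  }
  session.post(
    "Runtime.getProperties",
    { objectId: localScope.object.objectId, ownProperties: true },
    (err, props) => {
      if (!err) {
        console.log("local variables:", props.result.map((p) => p.name));
      }
      session.post("Debugger.resume");
    }
  );
});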

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 18 (3 by maintainers)

Most upvoted comments

@timfish @AbhiPrasad Hey, guys! I upgraded our backend to Node.js v19, and the memory leak and curious CPU consumption from Node.js v16 are both gone! This is with both includeLocalVariables: true and LocalVariables({ captureAllExceptions: true }). Thank you!

[Screenshot 2023-04-08 at 4:41:50 PM]

For reference, I did actually open an issue with Node.
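For anyone wanting to try the second configuration mentioned above, it looks roughly like this (integration name and option as quoted in the comment; the exact export path may vary across 7.x releases):

const Sentry = require("@sentry/node");

// Capture local variables for all exceptions, not just uncaught ones.
Sentry.init({
  dsn: "__YOUR_DSN__",
  integrations: [
    new Sentry.Integrations.LocalVariables({ captureAllExceptions: true }),
  ],
});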

@timfish Unfortunately, we cannot upgrade to Node v18 because our cloud provider is Render, whose Node image ships an older version of GLIBC that limits us to Node v16, and one of our dependencies requires GLIBC. Hence, upgrading to Node v18 would require us to change cloud providers or build a custom Docker image, neither of which is worth the hassle.

We’ve now updated the docs + integration to make this only work on Node 18+, since AsyncLocalStorage also seems to have memory leak issues with the inspector.

This means we can’t solve this for Node 16, since the issue is actually bugs in the Node standard libraries 😦 @DannyNemer
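Until an upgrade is possible, one defensive pattern (a sketch, not an official recommendation) is to gate the option on the Node major version, so the same config can ship to both Node 16 and Node 18+ environments:

const Sentry = require("@sentry/node");

// Enable local variable capture only on Node 18+, where the integration is
// supported; on Node 16 it stays off to avoid the leak described above.
const nodeMajor = Number(process.versions.node.split(".")[0]);

Sentry.init({
  dsn: "__YOUR_DSN__",
  includeLocalVariables: nodeMajor >= 18,
});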

I’ve managed to reproduce a leak with the following:

  • Node v16.20.0
  • Express v4.17.1
  • @sentry/node v7.46.0

const express = require("express");
const Sentry = require("@sentry/node");

const app = express();

Sentry.init({
  dsn: "__SOME_VALID_DSN__",
  includeLocalVariables: true,
});

setInterval(() => {
  // print the memory usage every second
  console.log(process.memoryUsage().rss / 1024 / 1024);
}, 1000);

app.use(Sentry.Handlers.requestHandler());

async function random() {
  return new Promise((resolve) => {
    setImmediate(() => {
      resolve(Math.random());
    });
  });
}

app.get("/async", async function (req, res) {
  res.send("Ok" + (await random()));
});

app.get("/sync", async function (req, res) {
  res.send("Ok" + Math.random());
});

app.listen(3001, function () {
  console.log("Web server listening on port 3001");
});

And then load testing via autocannon http://localhost:3001/async

The memory grows by roughly 1GB every time the load testing is run and doesn’t appear to be released.

The leaking can be fixed (i.e. memory usage stays well below 200MB) if I do any ONE of the following:

  • Upgrade to Node v18.15.0
  • Set includeLocalVariables: false
  • Comment out app.use(Sentry.Handlers.requestHandler())
  • Call /sync instead of /async

I captured a heap snapshot after briefly smashing the /async route, and it looks like the domains/promises are getting leaked:

[image: heap snapshot]

My first guess would be that there’s some kind of Node.js bug where domains are leaked when using the inspector API.
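For anyone who wants to check the same thing in their own reproduction, a heap snapshot can also be written programmatically with Node’s built-in v8 module and then opened in Chrome DevTools’ Memory tab (the file name here is arbitrary):

const v8 = require("v8");

// Dump a .heapsnapshot after the load test; load it in DevTools > Memory
// and look for retained Domain / Promise objects as shown above.
const file = v8.writeHeapSnapshot(`./after-load-${Date.now()}.heapsnapshot`);
console.log(`Heap snapshot written to ${file}`);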

@DannyNemer which version of Node are you using?

@timfish Node v16.20.0

@timfish Could we test what happens when we create multiple promises (add event loop backpressure)?
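That test could be a small variant of the /async route from the reproduction above, for example (route name and promise count are arbitrary):

// Hypothetical route for the suggested test: several awaited promises per
// request, to add event-loop backpressure on top of the original repro.
app.get("/async-many", async function (req, res) {
  const values = await Promise.all(Array.from({ length: 10 }, () => random()));
  res.send("Ok" + values.join(","));
});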

@timfish There was only one “real” request during the entire period shown in that graph (the spike at around 3:00 PM). Beyond that, there is roughly one ping and health check for other services every few seconds. Also, we use zero console APIs. So there was barely any activity on the server at all.