next.js: Next >12.0.1 causes memory leak after upgrading from Next 11

Verify canary release

  • I verified that the issue exists in Next.js canary release

Provide environment information

$ next info
(node:39290) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

What browser are you using? (if relevant)

Chrome 100

How are you deploying your application? (if relevant)

custom express server running on GCP distributed across multiple containers running 1GB memory
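
For reference, the custom server is essentially the standard Express + Next setup, roughly the sketch below (simplified and illustrative, not our exact server; the port is an assumption chosen to match the repro steps later in this issue):

// server.js: simplified sketch of a custom Express + Next server
const express = require('express');
const next = require('next');

const dev = process.env.NODE_ENV !== 'production';
const app = next({ dev });
const handle = app.getRequestHandler();

app.prepare().then(() => {
  const server = express();

  // every request is ultimately handed to Next's request handler
  server.all('*', (req, res) => handle(req, res));

  // port 8080 is assumed here to match the localhost:8080 repro steps below
  server.listen(8080, () => {
    console.log('> Ready on http://localhost:8080');
  });
});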

Describe the Bug

We’re in a bit of a catch-22: we tried to upgrade to 12.0.0, which works, but then our Lighthouse scores started failing due to a security vulnerability in 12.0.0. I don’t know if y’all ever fixed that, but based on the changelog it seems something was done in 12.0.5, so we tried that, which created another issue: memory leaks.

After upgrading to 12.0.5, a new issue appeared: the production-optimized build simply crashes with OOM errors:

<--- Last few GCs --->
12:22:59.957 dev-dev-uc1-gke next-4072-webserver
12:22:59.958 dev-dev-uc1-gke next-4072-webserver [27:0x7f24d13882c0]   755827 ms: Mark-sweep (reduce) 770.6 (773.1) -> 770.3 (774.9) MB, 998.9 / 0.0 ms  (average mu = 0.100, current mu = 0.001) allocation failure scavenge might not succeed
12:22:59.958 dev-dev-uc1-gke next-4072-webserver [27:0x7f24d13882c0]   756837 ms: Mark-sweep (reduce) 771.3 (776.9) -> 771.1 (777.6) MB, 1009.1 / 0.0 ms  (average mu = 0.052, current mu = 0.001) allocation failure scavenge might not succeed
12:22:59.958 dev-dev-uc1-gke next-4072-webserver
12:22:59.958 dev-dev-uc1-gke next-4072-webserver
12:22:59.958 dev-dev-uc1-gke next-4072-webserver <--- JS stacktrace --->
12:22:59.958 dev-dev-uc1-gke next-4072-webserver
12:22:59.958 dev-dev-uc1-gke next-4072-webserver FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
12:23:00.016 dev-dev-uc1-gke next-4072-webserver error Command failed with signal "SIGABRT".

I tried 12.1 and even canary releases to see if anything had worked itself out. Same problem. Building a prod build locally and running node --inspect, I tracked down a memory leak with growing Promises and Arrays within the webpack runtime, cjs-loader, and tslib (whatever that means):

[screenshots: heap snapshots showing growing Promise and Array counts]

In the end, I found 12.0.1 to be the only version after 12.0.0 with no leak; going to 12.0.2 starts the memory leak and crashes the app in the same way.

The prime suspects are the following deps we have installed based on their interaction with promises and arrays:

  • next >=12.0.2
  • node 14.17.6
  • express 4.17.1
  • babel?
  • axios?
  • styled-components 5.3.3
  • react 17.0.2
  • redux
  • typescript 4.3.2

Already tried upgrading Node to the latest LTS (16); still doesn’t work. Local dev runs “fine”, albeit slowly, and local builds don’t crash. Only production-optimized builds do.

Expected Behavior

The app is expected to NOT crash. I upgraded my personal website to 12.1, which is hosted on Vercel and has fewer dependencies than the enterprise e-commerce site I am working on. It is fine and running great, which makes me think the issue is more complex, possibly around the use of a custom server in the enterprise app and/or Next 12 not playing nice with some Node deps.

When I ran this on 12.0.0 and 12.0.1, the memory leak was non-existent:

[screenshot: heap usage on 12.0.0 / 12.0.1, no leak]

It is not until 12.0.2 that we experience the memory leak, and I’ve gone through your changelogs and nothing looks suspicious.

To Reproduce

I was unable to reproduce this with the custom-server Next example, so it’s likely because we’re using an enterprise app where something (i.e. other dependencies) isn’t playing nice with Next 12, and I would like to understand what could be causing that.

I can only assume something around SWC changed the build process so that when it runs in Node it gets wonky? Just speculating at this point, because we can’t upgrade to Next 12 at all.

But in our enterprise app, I simply do the following to reproduce consistently every time:

  • Next >12.0.1
  • yarn build (uses Babel, not SWC; see the config sketch after this list)
  • NODE_ENV=production node --inspect build/server/index.js
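
To clarify the “uses Babel, not SWC” point: Next falls back to Babel whenever a custom Babel config file exists in the project root. Ours is roughly along these lines (illustrative sketch; the exact preset/plugin list is an assumption, not our real config):

// babel.config.js: having any custom Babel config is what makes `next build` use Babel instead of SWC
module.exports = {
  presets: ['next/babel'],
  plugins: ['babel-plugin-styled-components'],
};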

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 6
  • Comments: 26 (17 by maintainers)

Most upvoted comments

Confirmed here: early reports are showing that 12.3.0 has fixed the leak on our end 🎉

@timneutkens awesome thanks so much for that explanation! I trust that y’all have the right tests in place and it’s hard to know what enterprise, custom apps are doing with your framework. I get it.

Considering that the esmExternals: false change seems to help most people out of this OOM issue, should we add it to the docs as a last-resort “backwards compat” config change? For example, in the Next 12 upgrade guide:

“if you experience [this OOM error] first try setting esmExternals: false since it was turned on by default in Next12. Doing this will…”?
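
For concreteness, the config change being discussed is (as I understand it) the experimental flag in next.config.js, something like:

// next.config.js: sketch of the workaround discussed in this thread
module.exports = {
  experimental: {
    // Next 12 enables ESM externals by default; this opts back out to the Next 11 behavior
    esmExternals: false,
  },
};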

What do you think?

I’d prefer to fix the OOM even if it’s not specifically in Next.js but rather in a combination of deps. That flag will definitely go away in the future. The main problem so far is that no reproduction has been provided that would allow us to spend time on it; I’d be happy to have someone on our team look at it if one is provided.

I’ve spent so much time already, so I may as well work on a solid reproduction outside a large enterprise app 😅 The problem is I don’t know what in our npm spaghetti is causing the conflict with Next 12, so it’s hard to reproduce with only a handful of deps in an isolated sandbox. Will try my best…

That’d be great! 🙏

given the amount of mem leaks people are running into

As far as we’ve seen on the other issues, the webpack config was customized through next-transpile-modules, which is not officially supported as it’s a community customization of the webpack config. Could you confirm you’re also using next-transpile-modules? If you have a reproduction that’d be great, given that the other issues don’t have one, which makes it impossible to investigate further. So far my suspicion is that next-transpile-modules has a memory leak, but we couldn’t reproduce it in a new app, so it might be related to a specific combination of imports.

We’re planning to build something like next-transpile-modules into Next.js to have an officially supported way to do something like that: #35150

Y’all said Next 12 is “backwards compatible”, but it seems that is not the case with esmExternals being enabled by default instead of opt-in:

It is backwards compatible. It passes all integration tests, and we test Next.js against the internal apps that Vercel has, like the vercel.com app. These are among the largest apps we’ve seen to date. What you can’t expect when customizing the webpack config is that your customization of the build pipeline is backwards compatible between versions, since you get full control over the webpack config.

What are the tradeoffs with disabling esmExternals?

Resolving of ESM packages in node_modules will be incorrect / not work. It won’t resolve packages with "type": "module". More here: https://nextjs.org/blog/next-11-1#es-modules-support

@timneutkens I can start to reproduce what may eventually lead to an OOM server kill in my sandbox env, but it is such a small app that I can’t get it to crash (yet). It makes sense that a large enterprise site would crash more quickly due to its size and need for more resources.

That being said, the retained memory size trends are the same in heap snapshots between our proprietary enterprise site and my personal sandbox site, although obviously the latter is far less resource intensive. But I have reproduced the same memory leak!

To reproduce, I installed deps based on what I assumed to be the main offenders from preliminary R&D on the enterprise site:

  • custom express server
  • styled components (oh btw, it returns a Promise which is sus)
  • typescript
  • babel (no swc)
  1. checkout https://github.com/jflayhart/website/tree/repro-oom
  2. yarn install
  3. yarn build
  4. yarn start
  5. Use Chrome and go to http://localhost:8080 and use chrome://inspect to open node DevTools
  6. Take heap snapshot
  7. refresh http://localhost:8080 a ton of times (e.g. for Mac users, hold down cmd+r for about 30s, or use the small request script after this list)
  8. Take heap snapshot
  9. repeat steps 7-8, waiting even longer while holding down the refresh hotkey to rapid-refresh the site
  10. You will see some constructors grow over time; Promises almost exponentially
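
Instead of holding down the refresh hotkey, a tiny request loop also makes the growth visible in heap snapshots. This is just a sketch (not part of the repro repo) and assumes the server from the steps above is listening on localhost:8080:

// hammer.js: keep requesting the page so heap snapshots show growth over time
const http = require('http');

let count = 0;
function hit() {
  http
    .get('http://localhost:8080/', (res) => {
      res.resume(); // drain the response so the socket is released
      count += 1;
      if (count % 100 === 0) console.log(`${count} requests sent`);
      setImmediate(hit);
    })
    .on('error', (err) => {
      console.error(err.message);
      setTimeout(hit, 1000); // back off briefly if the server hiccups
    });
}
hit();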

At its core, my sandboxed env is a barebones version of our enterprise app, where the memory leak is mainly around Promises being held in the global namespace:

[screenshots: heap snapshots from the sandbox reproduction showing retained Promises]

You can imagine this is orders of magnitude worse for an enterprise app where, assuming the culprit is styled-components, there are thousands more lines of css-in-js.

This is all a working theory, but it is a fact that Promises and other constructors are retaining memory over time (see Promises climbing toward the top in the last screenshot):

[screenshots: heap snapshots showing Promise growth over time]

Could this mean something is wrong with _document’s initialProps usage with styled-components? My working theory is that this is part of the issue.
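
For reference, the _document presumably follows something like the standard styled-components SSR pattern, roughly the simplified sketch below (adapted from the styled-components docs, not our exact file). Pure speculation, but if each request’s sheet were retained, e.g. if seal() never ran, I could see memory piling up the way the snapshots show:

// pages/_document.js: standard styled-components SSR setup (simplified sketch)
import Document from 'next/document';
import { ServerStyleSheet } from 'styled-components';

export default class MyDocument extends Document {
  static async getInitialProps(ctx) {
    const sheet = new ServerStyleSheet();
    const originalRenderPage = ctx.renderPage;

    try {
      // collect styles for the whole app render into this request's sheet
      ctx.renderPage = () =>
        originalRenderPage({
          enhanceApp: (App) => (props) => sheet.collectStyles(<App {...props} />),
        });

      const initialProps = await Document.getInitialProps(ctx);
      return {
        ...initialProps,
        styles: (
          <>
            {initialProps.styles}
            {sheet.getStyleElement()}
          </>
        ),
      };
    } finally {
      // the styled-components docs seal the sheet here; skipping this could keep it alive longer
      sheet.seal();
    }
  }
}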

UPDATE: Setting esmExternals: false on the sandbox project still shows growing retained memory size, but at a MUCH slower rate, and after a while most of it gets GC’d correctly: [screenshot: heap usage with esmExternals: false]

I was lurking on here and want to confirm that switching to canary seems to have gotten rid of the memory leak in our enterprise-grade app as well! 🎉

Not much additional info to provide, but we had to downgrade from 12.1.4 to 12.0.7 to fix a memory-leak crash in the Node process. I can say that it was unrelated to which compiler we used; we tried both. 12.0.7 does not have the issue, though.

I was lurking on here and want to confirm that switching to canary seems to have gotten rid of the memory leak in our enterprise-grade app as well! 🎉

Follow-up to my lurk: turns out the memory leak just changed 😞 We are using Next in 2 repos: one with a lot of files and one with LONG files. Prior to the canary, the long-files repo had the memory leak and the lots-of-files repo did not, whereas with the canary our long-files repo is now fine and our lots-of-files repo is getting OOMs. I know this is a vague report, but I wanted to pass this insight along before we file an issue through Vercel!

Thanks for checking. I would say the memory issue is unlikely to be middleware related, but happy to get a 100% confirmation. 👍

@jflayhart could you check canary as well?

@balazsorban44 the canary seems to have fixed the issue! I’ll check later for a 100% confirmation, because I had to disable middleware due to breaking changes in the latest canary. If the memory leak was caused by the middleware, it might not actually have been fixed.

Hi, we recently landed some changes (https://github.com/vercel/next.js/pull/37397) to canary that might help fix this issue, without setting esmExternals: false.

Please try it out by installing next@canary and let us know!

It seemed to help a lot of people in this thread: https://github.com/vercel/next.js/issues/32314