next.js: [NEXT-1314] High memory usage in deployed Next.js project

Verify canary release

  • I verified that the issue exists in the latest Next.js canary release

Provide environment information

Operating System:
      Platform: darwin
      Arch: arm64
      Version: Darwin Kernel Version 22.1.0: Sun Oct  9 20:14:30 PDT 2022; root:xnu-8792.41.9~2/RELEASE_ARM64_T8103
    Binaries:
      Node: 18.15.0
      npm: 9.5.0
      Yarn: 1.22.19
      pnpm: 8.5.0
    Relevant packages:
      next: 13.4.3-canary.1
      eslint-config-next: N/A
      react: 18.2.0
      react-dom: 18.2.0
      typescript: 5.0.4

Which area(s) of Next.js are affected? (leave empty if unsure)

No response

Link to the code that reproduces this issue

https://codesandbox.io/p/github/ProchaLu/next-js-ram-example/

To Reproduce

Describe the Bug

I have been working on a small project to reproduce an issue related to memory usage in Next.js. The project is built using the Next.js canary version 13.4.3-canary.1. It utilizes Next.js with App Router and Server Actions and does not use a database.

The problem arises when deploying the project on different platforms and observing the memory usage behavior. I have deployed the project on multiple platforms for testing purposes, including Vercel and Fly.io.

  • On Vercel: https://next-js-ram-example.vercel.app/ When interacting with the deployed version on Vercel, the project responds as expected. The memory usage remains stable and does not show any significant increase or latency

  • On Fly.io: https://memory-test.fly.dev/ However, when deploying the project on Fly.io, I noticed that the memory usage constantly remains around 220 MB, even during normal usage scenarios

Expected Behavior

I expect the small project to run smoothly without encountering any memory-related issues when deployed on various platforms, including Fly.io. Considering a previous successful deployment on Fly.io that involved more resource usage and also used Next.js 13 with App Router and Server Actions, my expectation is that the memory usage will remain stable and within acceptable limits.

Fly.io discussion: https://community.fly.io/t/high-memory-usage-in-deployed-next-js-project/12954?u=upleveled

Which browser are you using? (if relevant)

Chrome

How are you deploying your application? (if relevant)

Vercel, fly.io

NEXT-1314

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 95
  • Comments: 131 (36 by maintainers)

Commits related to this issue

Most upvoted comments

@thexpand This is not related to Server Actions. It is a severe memory leak starting from v13.3.5-canary.9. I was going to open a bug but found this one.

@shuding I suspect your PR https://github.com/vercel/next.js/pull/49116 as others in mentioned canary are not likely to cause this. Can you please take a look? This blocks us from upgrading to the latest Next.js.

Operating System:
      Platform: darwin
      Arch: arm64
      Version: Darwin Kernel Version 22.4.0: Mon Mar  6 20:59:58 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T6020
    Binaries:
      Node: 18.15.0
      npm: 9.5.0
      Yarn: 1.22.19
      pnpm: 7.9.0
    Relevant packages:
      next: 13.4.4-canary.0
      eslint-config-next: N/A
      react: 18.2.0
      react-dom: 18.2.0
      typescript: 4.6.2

Tech Stack:

  • Rush.js monorepo (on pnpm)
  • +several Next.js apps in SSG (no SSR, no app dir, but with next/image (not the legacy one), middleware.js, and a few redirects and rewrites in next.config.js)
  • +all of which use the latest (v7) common/storybook as 90% of their code base (so transpileModules is used for it)
  • +external configuration and MaterialUI styles come from Apollo Client used in getStaticProps pointing towards Strapi GraphQL (so on-demand revalidation is used)
  • and everything is deployed to our Kubernetes (build and run-time both use the latest Bullseye Debian so that sharp for next/image works correctly) and monitored in Grafana
  • BTW, Next.js is in standalone mode

Proofs:

13.3.5-canary.8 vs 13.3.5-canary.9

13.3.5-canary.8 vs 13.3.5-canary.9 with all images unoptimized

13.3.5-canary.8 vs 13.4.4-canary.0 (to test latest canary) with all images unoptimized + middleware removed

So, as you can see, the leak does not come from next/image or middleware, and in my view the only PR from canary.9 which could theoretically cause this is: https://github.com/vercel/next.js/pull/49116

P.S. I also checked 13.3.4 and found no leak there. But on that version we get an Internal Server Error from middleware, so we can't use it. I had to find the minimum canary version where that problem had been fixed - it is https://github.com/vercel/next.js/releases/tag/v13.3.5-canary.2, so we are locking on that version for now (this PR https://github.com/vercel/next.js/pull/48723 probably fixed the middleware problem).

Hey all, I have a small update: we've published v13.4.10, which includes some improvements to memory usage.

Besides that release, I spent all day Wednesday/Thursday/Friday and most of my weekend now too investigating the Resident Set increasing. The problem with investigating Resident Set increases outside of the heap is that there is no tooling built into Node.js/V8 to observe what gets assigned as part of the Resident Set. The heap, which is observable through the built-in tooling, is only a smaller portion, and it is not increasing in this case. If you're curious, this article explains it quite well.

Because of that limitation you're basically down to these instructions, but they're not exactly relevant when V8 does clean up all memory on exit. So I used the massif tool instead, which tracks all allocations. However, it's unclear when the memory gets cleaned up while profiling, so I could only get a list of all memory allocations, which is a bit too broad for investigating this particular issue. Unfortunately I haven't been able to find a definitive reason using these tools, so I went with a narrower debugging method… commenting out code and early returning in a bunch of places. However, using that method I haven't been able to pinpoint the exact place either, and as I said earlier, in some cases V8 will hold onto e.g. compiled code in memory as part of optimization. I've reached out to some folks I know that might have done this kind of memory profiling before, as the increase is so subtle.
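For reference, here is a minimal sketch (plain Node.js, nothing Next.js-specific, names are my own) that logs both values periodically via the built-in process.memoryUsage(), so you can watch the Resident Set move independently of the heap:

// Minimal sketch: log Resident Set vs. V8 heap every 5 seconds.
// rss covers the whole process, while heapTotal/heapUsed only cover the part
// that heap snapshots and the built-in tooling can see.
const toMB = (bytes) => `${(bytes / 1024 / 1024).toFixed(1)} MB`;

setInterval(() => {
  const { rss, heapTotal, heapUsed, external } = process.memoryUsage();
  console.log(`rss=${toMB(rss)} heapTotal=${toMB(heapTotal)} heapUsed=${toMB(heapUsed)} external=${toMB(external)}`);
}, 5000);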

In the process I’ve sent a PR to the vscode extension that adds a monitor to highlight Resident Set and Total Heap, which is useful for observing these values during a debug session (i.e. when using the JS debug terminal in vscode): https://github.com/microsoft/vscode-js-profile-visualizer/issues/62

While looking into the individual reports in this thread I noticed something that is worth digging into deeper and that we can clearly optimize, which is the spawning of multiple processes. First, some background on "why even have multiple processes": when introducing the App Router we had to solve the problem of multiple versions of React being used. When you use the App Router the application renders using React Canaries instead of the version of React you manually installed; the reason this is not "just the version I installed" is that React Server Components have a separate package that holds the renderer and infrastructure plumbing code in order to wire up React Server Components. The version of React and the version of the plumbing package have to match 1:1. Then there is the case of enabling Server Actions, in which case React Experimental is used. Server Actions are in Alpha as of right now and leverage new React features like <form action={(formData: FormData) => {}}> and Async Transitions.

In order to not break existing applications using Pages Router we set up two rendering workers: one for Pages Router and one for App Router. They run with different instrumentation to resolve the correct version of React. As for "why don't you use a require hook instead?": we started with that approach, but it wasn't feasible because of ESM, as require hooks don't apply to ESM. You can hook into ESM resolution using a "loader", but loaders need to be provided when booting the process; as you can see that doesn't work when there is a single process, so we boot separate processes instead.

This covers 2 processes, but there are actually 4 being booted currently. Let’s dig into the other two:

  • Your server.js (i.e. when using output: 'standalone') or next start
  • Routing

The server process is straightforward so it doesn't really need an explanation: you basically run node .next/standalone/server.js (or next start) and that boots a process. However, this process doesn't actually do anything when running the application in production. In development it does a bunch of work, like keeping track of .env changes, keeping track of memory usage, and rebooting when it gets high.

The routing process runs the HTTP server that handles incoming requests: it handles the routing to one of the rendering workers (i.e. depending on whether an App Router or a Pages Router route is requested), it serves static files, and it handles the ISR cache.

Notably, these processes each take around ~50MB of Resident Set Size (RSS) when booted, even when they don't do any work and the heap is ~6MB. The math is quite simple: with four processes the baseline is about 200MB. This clarifies why it gets killed by e.g. fly.io's OOM killer quite quickly, as that seems to have a limit of ~330MB for the total process. There doesn't actually have to be a memory leak to get to 330MB when the baseline is 200MB, because a single rendering worker can increase in peak memory usage quite a bit as it can handle many concurrent requests. And that is before counting all the application code that has to be loaded into memory.
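As a rough illustration (a sketch that assumes a Linux host with procps ps and that every worker shows up as a node process), you can list the RSS per process like this:

// List RSS per `node` process to see the per-process baseline described above.
// Note: `ps -C node` exits non-zero (and execSync throws) if no node process is running.
const { execSync } = require('node:child_process');

const out = execSync('ps -o pid,rss,args -C node', { encoding: 'utf8' });
for (const line of out.trim().split('\n').slice(1)) {
  const [pid, rssKb, ...args] = line.trim().split(/\s+/);
  console.log(`pid=${pid} rss=${(Number(rssKb) / 1024).toFixed(0)}MB cmd=${args.slice(0, 4).join(' ')}`);
}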

So with all of this context, you might have noticed there are some optimizations to be made to the process usage, especially for the case where you only use App Router or only use Pages Router. We can skip creating the render worker for Pages Router when there are no routes that need Pages Router rendering (and the same goes for App Router, of course).

The other optimization is that the "start" process is entirely idle and we don't really need to have it; we can combine the "start" process and the routing process.

With those changes we can get the baseline down to 2 processes:

  • One for routing
  • One for rendering (when you only have App Router, or only have Pages Router. If you have both you’d get down to 3 processes instead)

That should account for about a 50% reduction in baseline memory usage.

Hey, I've investigated the report from @broksonic21 this morning. I found that there is a memory leak in Node.js when an AbortController signal is passed to fetch() (which we do in Next.js internally). However, that memory leak is already fixed in the latest version of Node.js 18 (18.17.0). PR on undici (Node.js' fetch implementation): https://github.com/nodejs/undici/pull/2049. It specifically happens when you're on Next.js canary and on a Node.js version before 18.17.0; when you upgrade Node.js the memory usage goes back down and the Resident Set Size decreases correctly.
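For context, the leaking pattern boils down to repeatedly passing an AbortController signal to the global fetch; a stripped-down sketch of that shape (the URL and helper name are hypothetical, this is not the actual internal code):

// On Node.js < 18.17.0 the signal registration inside undici was not released,
// so repeating this in a loop grew the Resident Set; on >= 18.17.0 it is collected.
async function fetchWithAbort(url) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), 10000);
  try {
    const res = await fetch(url, { signal: controller.signal });
    return await res.text();
  } finally {
    clearTimeout(timer);
  }
}

// e.g.: for (let i = 0; i < 100000; i++) await fetchWithAbort('http://localhost:3000/');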

I’ve started a PR to remove the extra process in front of the routing worker as it essentially doubled memory usage in the case @broksonic21 reported. This brings the amount of workers from 4 to 3: https://github.com/vercel/next.js/pull/53523. @shuding is working on avoiding the extra Pages Router renderer when you only use App Router (and vice-versa).

TLDR of the above:

  • Make sure to use Node.js 18.17.0 or later (currently that’s the latest stable version)
  • The report from @broksonic21 only happened on canary because we introduced support for aborting requests while streaming (#52281) and there was a memory leak in Node.js core (undici).
  • I’m working on a PR to remove one of the processes

For every commenter here there are probably hundreds of others out there wondering why their Next.js app is running out of memory all the time. I cannot tell whether this issue is being taken seriously by the team, even though @leerob and @timneutkens reacted here. I can't find other popular threads about this topic, so this one is it. Any news?

We had to downgrade all apps to 13.3.4 (whether they used app router or pages router) until a fix is available. I'd be happy to share memory graphs, but there are many of them here already that look very familiar.

👋 I'm back with an update. Yesterday we landed #53523, which removes one of the four processes, specifically one that would increase in memory usage under load. It would be garbage collected, but this change just reduces memory usage overall. You can see the results in the description of #53523. Please give it a try using npm install next@canary.

Important: in order to make that change I had to simplify the code a lot, which included removing the checks for experimental.appDir, so setting it to false will no longer have an effect. I'm not expecting that to be a problem though, as the idle process (if you only have app, or only have pages) is only ~30-35MB. We're still working on removing that idle process too (running into some test failures).

Thanks @karlhorky, we looked at that application yesterday. We found that the heap is not leaking memory in that application; it's garbage collected correctly. However, we did observe the resident set size (RSS) increasing and decreasing to a certain level. That doesn't have to indicate a memory leak, because V8 will retain memory for the process after a spike, e.g. when you do a lot of requests more memory is assigned and it isn't given back immediately. We'll continue looking into that, but the normal tools (e.g. heap snapshots) don't allow inspecting this particular case.

@Toursslivers I don't appreciate your insulting comments questioning the competence of me or my colleagues, especially from an anonymous account. You're not contributing anything to this issue with your posts. As said earlier, any posts that do not include further information will be hidden.

Hey all, we’ve spent the past week looking into memory usage on and off for development mode besides the performance work @shuding and I have been doing on #48748. We’ve already made some improvements to memory usage in development that might help in production too, but haven’t been able to verify these against a production application.

Looking at this issue there has only been one report that included source code that we could run, and after running it many times in various combinations of requests (including running 90K requests against it) we weren’t able to see significant heap usage and it would always get garbage collected back to an acceptable level (20mb or so). The repo we used is this one: ProchaLu/next-js-ram-example. We’ve tried the create-next-app suggestion from earlier comments too and it’s the same there, memory usage stays the same across ~90K requests.

In order to investigate further, we'll need the source code for an application that reproduces a leak (a continuous increase of the heap, not the max). Can anyone provide that?

Few things to keep in mind:

  • By default, when the Next.js server boots not all code is loaded; route code is loaded as it is needed. This increases memory usage (as expected) when you open a route that hasn't been loaded on the server yet. On the other hand, memory usage is much lower at boot, and you don't have to load all possible code when there are no requests for certain routes to that particular instance (e.g. when load balancing).
  • When App Router is enabled Next.js uses a new setup with multiple processes: one for routing, one for rendering App, one for rendering Pages. This inherently means slightly more memory usage, but based on the heap we looked at it’s about ~7MB of heap when idle (no rendering work).
  • Please make sure to try next@latest (As of writing that is 13.4.9), this is the version we ran while investigating.

In order to keep this issue down to relevant details only I’m going to hide further comments that do not provide source code / otherwise useful data (i.e heap profiles), we did the same for #48748 which helped keep the issue on track 🙏

We’re going to keep this issue open to track production memory usage. If you’re running into issues with memory usage in development please follow this issue instead: #46756 (similar to this one, please provide the source code or a heap profile).

13.3.5-13.4.9, haven’t fixed the memory leak? Do you understand nextjs?

Not sure if this is next’s supported solution, so YMMV, but only thing that helped us on 13.4.4+ was to set:

  experimental: {
    appDir: false 
    // this also controls running workers https://github.com/vercel/next.js/issues/45508#issuecomment-1597087133, which is causing
    // memory issues in 13.4.4 so until that's fixed, we don't want this.
  },

That disables the new appdir support which became the default in 13.4, but also turns off the extra workers. It also fixed the leaked-socket issue causing crashes/timeouts (https://github.com/vercel/next.js/issues/51560), which appears related - the extra processes (see https://github.com/vercel/next.js/issues/45508 for build, but also next start) are leaking as far as I can tell, causing everyone's memory issues. It might not be the exact cause, but it's highly correlated for sure.

@nmengual we did a bunch more investigating into the Resident Set Size (RSS) increase. We used valgrind and lldb and found that the increase in memory usage is based on what max-old-space-size and the other memory options are set to on Node.js. In particular, about 50% of the assigned memory is empty regardless of whether it is used; we're assuming this is related to V8 pre-allocating a certain amount of memory to cover increases in heap usage.

This screenshot from @shuding shows the memory assigned:

CleanShot 2023-07-17 at 20 38 47@2x

I also asked @jridgewell to take a look, he created this document that shows the Resident Set and Heap increase and when garbage collection kicks in: https://docs.google.com/spreadsheets/d/1dSy4eludD8J9BdBWw3dTspvt3OUQNfYJW28gP9TYXoI/edit#gid=0

If you scroll down to the bottom of the document you’ll note that the garbage collector brings memory usage back once there is no additional memory pressure.

At this point we are fairly certain there is no memory leak for production in the latest version of Next.js while running the production server.

We’re still working on reducing the amount of processes to lower the baseline. That will be landed soon, this week or next week.
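One low-tech way to double-check this on your own app (a sketch that assumes you can run Node with --expose-gc) is to force a collection and compare the numbers before and after:

// Run with: node --expose-gc this-file.js (the file name is just an example).
// If heapUsed drops back after a forced GC while rss stays elevated, the growth is
// retained/pre-allocated memory in the Resident Set rather than an uncollected heap leak.
const toMB = (n) => `${(n / 1024 / 1024).toFixed(1)} MB`;

function report(label) {
  const { rss, heapUsed } = process.memoryUsage();
  console.log(`${label}: rss=${toMB(rss)} heapUsed=${toMB(heapUsed)}`);
}

report('before gc');
if (global.gc) global.gc(); // only available with --expose-gc
report('after gc');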

@timneutkens aside from the demo project that @ProchaLu created, two other, larger, more real-world projects of ours that have been consistently crashing with OOM (Out of Memory) errors or timing out are below:

  1. Next.js Spring 2023 Example
  2. Security Vulnerabilities Example

Example error message from Fly.io:

Hello! Your “next-js-example-spring-2023-vienna-austria” application hosted on Fly.io crashed because it ran out of memory. Specifically, the instance 9185512f11d283. Adding more RAM to your application might help!

You should know that adding RAM does cost money. Not much, but some. Figure about $5/mo per extra GB. Our docs have complete pricing details.

Here’s the raw log message:

Out of memory: Killed process 285 (node) total-vm:338292kB, anon-rss:48984kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:1172kB oom_score_adj:0

Reviewing the data after 2 hours, it seems that the memory now behaves better: Screenshot 2023-07-13 at 11 49 18 PM. But it also seems that the memory trend is always going up:

Screenshot 2023-07-13 at 11 52 31 PM

I’m going to leave it for more hours and update the collected data.

@leerob You can’t even distinguish spam, are you a programmer?

I’ve been able to repro this (or something very similar) using the default reproduction template (with default settings). Note: the issue goes away if I turn the experimental appDir setting to false (i.e. turn off appDir)

See #51560 (comment) for more, but copying the details over:

npx create-next-app -e reproduction-template
cd into your folder...
npm run build
npm start

Then in a separate tab, run:

ab -n 100000 -c 100 http://127.0.0.1:3000/

And the memory grows but is never relinquished. I still have:

After a crash:

Even after Ctrl-C'ing the server, the memory stays up and the process keeps running. I have to kill -9 to kill it.

This is very clear!

This issue is indeed about the App Router (appDir) feature. The feature is supposedly stable (but it’s clear that it isn’t), which is why it was adopted. Turning it off would require rewriting our codebases.

image

@tghpereira I keep going in circles on these issues as it seems no one reads my posts 😕 I’ve mentioned multiple times that before posting on these issues you should use the latest version of Next.js, we’ve made a ton of improvements in the versions after 13.4.5, which is a version from June…

In order to keep this issue down to relevant details only I’m going to hide further comments that do not provide source code / otherwise useful data (i.e heap profiles), we did the same for https://github.com/vercel/next.js/issues/48748 which helped keep the issue on track 🙏 We’re going to keep this issue open to track production memory usage. If you’re running into issues with memory usage in development please follow this issue instead: https://github.com/vercel/next.js/issues/46756 (similar to this one, please provide the source code or a heap profile).

I feel like I’m going in circles on this issue. Every time I post a comment saying we can’t help in any way when screenshots are posted without a reproduction someone posts a screenshot and a comment saying “happens for me”. Once again, if you still see memory usage in your application there is nothing we can do if you don’t provide a reproduction. We have confirmed there is no memory leak in the application code that was shared in this issue. In order to investigate this any further we will need application code. Without running the application there is no way for us to debug these reports. It could be that you’re using a data fetching library, css-in-js solution, React component, or anything else really, that causes increased memory usage / a leak as well, or even your own application code. We’d really like to investigate this further, so please help us investigate your application.

Posts: https://github.com/vercel/next.js/issues/49929#issuecomment-1631206457 https://github.com/vercel/next.js/issues/49929#issuecomment-1637185156 https://github.com/vercel/next.js/issues/49929#issuecomment-1662198738 https://github.com/vercel/next.js/issues/49929#issuecomment-1663865174 https://github.com/vercel/next.js/issues/49929#issuecomment-1670817557

Hey everyone, got another update on this: we've landed the changes to reduce the number of processes from 3 to 2:

  • One for routing, App Router rendering
  • One for Pages Router rendering (see my previous posts for reasoning why this needs a separate process)

It’s out on next@canary, please give it a try.

We’ve also made a change to the implementation using Sharp to reduce the amount of concurrency it handles (usually it would take all cpus). That should help a bit with peak memory usage when using Image Optimization. I’d like to get a reproduction for the Image Optimization causing high memory usage so that it can be investigated in a new issue so if someone has that please provide it.

With these changes landed I think it’s time to close this issue as these changes cover the majority of comments posted. We can post a new issue specifically tracking memory usage with image optimization. There is a separate issue for development memory usage already.

After updating our Next project to 13.4.12, our k8s monitoring shows signs of improvement. We will continue monitoring it, but for now it seems that the runaway memory consumption over time has stopped. The only thing in our next.config.js that has been set is

// next.config.js

experimental: {
	isrMemoryCacheSize: 0,
},

which probably should not affect the result, since it has always been set to 0, even when the memory kept leaking prior to the Next version upgrade.

image

(Every dip in the graph is the app being redeployed)

My 16GB memory is fully utilized by Next.js and I suspect it can use an unlimited amount of memory.

We also have this issue in the latest release 13.4.9. Our Next.js appdir app leaks memory overnight but doesn’t execute any specific tasks other than listening for requests. Every page transition increases memory consumption without releasing it again. After a couple of hours the container is out of memory, crashes and restarts. We didn’t have this in the pages version.

image

Updated my app dir PoC branch to 13.4.8 and deployed:

  • Two 13.4.8 pages dir pods before deploy (yellow and green)
  • Two 13.4.8 app dir pods after deploy (blue and orange)

image

A single request hit one of the app dir pods, got a 404 towards a different backend and spiked from 250MB to 480MB RAM 😰

But even the screenshots might be telling since your team understands the internal memory allocation mechanics much better than any of us does. Maybe you spot some common patterns that we don’t.

Unfortunately it’s not helpful, and I’ve mentioned that multiple times in this issue.

We need to get multiple heap dumps and compare those to see exactly what memory was allocated / where the increase came from / what was not garbage collected. The only way to do that is by connecting the debugger and collecting those heap snapshots. There is currently no way for us to do that reliably, and even then it might be that the Resident Set Size just increases because V8 assigns ~50% empty memory for future increases in heap usage, as shared in my earlier post: https://github.com/vercel/next.js/issues/49929#issuecomment-1649637524
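For anyone who wants to provide that data, one option (a sketch on my part, not an official instruction) is Node's built-in v8.writeHeapSnapshot(), which writes .heapsnapshot files you can load and diff in Chrome DevTools (Memory tab):

// Write a heap snapshot every 10 minutes so two or more of them can be compared.
// v8.writeHeapSnapshot() is a built-in Node.js API; it pauses the process while
// writing and returns the generated file name.
const v8 = require('node:v8');

setInterval(() => {
  const file = v8.writeHeapSnapshot();
  console.log(`heap snapshot written to ${file}`);
}, 10 * 60 * 1000);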

Highly recommend trying to enable splitChunks for development mode - https://github.com/vercel/next.js/issues/48748#issuecomment-1640224812

For one of our projects (not Next.js) memory usage has been reduced from 12GB to 3GB.
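The exact settings are in the linked comment; as a rough sketch (an assumption on my side, not verified against that comment), re-enabling webpack's splitChunks for development in next.config.js looks something like this:

// next.config.js - rough sketch only; see the linked comment for the exact settings.
module.exports = {
  webpack: (config, { dev }) => {
    if (dev) {
      config.optimization.splitChunks = { chunks: 'all' }; // assumption: split all chunks in dev
    }
    return config;
  },
};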

After a few days of testing:

Both Next.js apps were fresh: npx create-next-app + 3 added random static routes which each display one HTML element.

App Router was using 160 MiB of memory after a cold start + started to eat up memory. It grew about 250 MiB in 4 days!

Pages Router was using 70 MiB of memory after a cold start and has been consuming a stable 74 MiB of memory for the past 2 days. image

So the solution for me (hopefully?): Pages router + appDir: false in next.config.js. ( /pages router used Next.js version 13.4.6)

@leerob why was @broksonic21’s example marked as spam? The reproduction steps couldn’t be more clear.

Folks, please read the messages @timneutkens is sending. Further comments will continue to be marked off topic unless they provide clear reproduction steps as mentioned. Thank you.

I feel like I’m going in circles on this issue. Every time I post a comment saying we can’t help in any way when screenshots are posted without a reproduction someone posts a screenshot and a comment saying “happens for me”.

Once again, if you still see memory usage in your application there is nothing we can do if you don’t provide a reproduction. We have confirmed there is no memory leak in the application code that was shared in this issue. In order to investigate this any further we will need application code.

Without running the application there is no way for us to debug these reports. It could be that you’re using a data fetching library, css-in-js solution, React component, or anything else really, that causes increased memory usage / a leak as well, or even your own application code.

We’d really like to investigate this further, so please help us investigate your application.

So far we've spent ~4 engineering weeks on this particular GitHub issue, with, as a result, a confirmation that there is no leak in the applications that were provided - which essentially means time lost that could have been spent on other issues.

@timneutkens
I used to develop my project with nextjs 13.4.3, but today I noticed that the web page would freeze after being open for a few minutes. I checked the devTools and saw that the CPU usage was maxed out and the memory kept increasing until it filled up and the tab crashed. And then I found this issue and fixed the memory and CPU issue by updating to the newer version 13.4.10.

In the meantime as others have mentioned, please continue using a lower version of Next.js (sounds like some of y’all are seeing things normal in your environment on 13.3.4).

@leerob Should Next.js docs be updated to mention that the app router / these later versions are not yet ready for production use? Or do you think this issue is only a problem if you deploy in a traditional manner (non-serverless)?

It also jumps up to 2x the ram usage after a single request. Example of a pages dir vs app dir deployed app: image

I created a reproduction using Docker to showcase how a simple Next.js project crashes when used in environments with ~225MB of memory.

Steps to reproduce:

  1. run docker pull josehower/next-js-memory-leak-reproduction-example:latest
  2. run docker run -p 3000:3000 --memory=225m josehower/next-js-memory-leak-reproduction-example:latest
    • NOTE: in some environments the app does not even run with this memory restriction; in that case add more memory: --memory=256m
  3. visit http://localhost:3000/
  4. click fire
  5. confirm the app becomes unresponsive and throws the following error
Error: socket hang up
    at connResetException (node:internal/errors:717:14)
    at Socket.socketOnEnd (node:_http_client:526:23)
    at Socket.emit (node:events:525:35)
    at endReadableNT (node:internal/streams/readable:1359:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
  code: 'ECONNRESET'
}
  6. you can confirm the app runs fine when you remove the --memory=225m option from the command

The way I created this reproduction was by building a Docker image from a simple Next.js app that uses autocannon to fake traffic to the website from a button.

  • Created the reproduction repo
  • Added a Dockerfile and set up the image
  • Built it with docker build . -t <username>/<image-name>:<version>
  • Published it to Docker with docker push <username>/<image-name>:<version>

I’ve been monitoring this issue for some time now, but even with the latest version 13.4.12 and several different memory settings the app keeps accumulating objects but never releasing them. In our latest deployment we also set isrMemoryCacheSize=0 as suggested by @SanderRuusmaa, but the overall behavior didn’t change. After reaching 2GB (which is the available memory in the Kubernetes pod), the service restarts.

Unfortunately, our app is already too complex to share. Right now we are trying to disable specific parts of it to find the culprit. But it seems that it is somehow tied to the App Router, since we didn't experience this issue with the Pages Router before. Memory consumption increases especially during page transitions. From then on, old memory never gets released again.

Today: image

During the last 2 weeks: image

On the latest version, disabling image optimization as everyone else has mentioned fixes the high memory usage.

When it's enabled in production our k8s node immediately crashes whenever a route is loaded due to the high memory usage; increasing the memory on each node doesn't really help as it eventually reaches the maximum anyway.

Our project definitely requires image optimization though, so hoping to see a fix soon 🤞.

@michielvanderros-tomtom as I mentioned here, we are aware and appreciate the additional information being provided on this issue. Memory issues are notoriously tricky, so we appreciate your assistance as we work through this together.

In the meantime as others have mentioned, please continue using a lower version of Next.js (sounds like some of y’all are seeing things normal in your environment on 13.3.4).

For folks posting screenshots, please include details about the environment, how you’re hosting Next.js, and ideally a minimal reproduction of your application where you’re still seeing memory issues to narrow down the issue. Please note whether it’s using the Pages Router, App Router, or both together.

Thank you!

Same issue here, running on Next.js 13.4.6 deployed on Fly.io. I worked around the problem by allocating 2048 MiB of memory to the instance and a 512 MiB swap as a buffer. As you can see, I’m only delaying the inevitable OOM, but this at least makes the issue much less frequent.

fly-metrics net_d_fly-app_fly-app_orgId=96146 var-app=arewepomeloyet-stg from=now-2d to=now (1)

You can find the source code here: https://github.com/hampuskraft/arewepomeloyet.com.

I created a different reproduction repo using the latest canary version of Next.js. The error is crashing the dev server when an import is missing.

https://nextjs.org/docs/messages/module-not-found

<--- Last few GCs --->

[2218:0x5eb9a70]    40167 ms: Mark-sweep 252.1 (263.9) -> 250.1 (263.7) MB, 206.0 / 0.0 ms  (average mu = 0.174, current mu = 0.125) allocation failure scavenge might not succeed
[2218:0x5eb9a70]    40404 ms: Mark-sweep 252.4 (263.9) -> 250.6 (264.2) MB, 216.7 / 0.0 ms  (average mu = 0.135, current mu = 0.086) allocation failure scavenge might not succeed


<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0xb02930 node::Abort() [/usr/local/bin/node]
 2: 0xa18149 node::FatalError(char const*, char const*) [/usr/local/bin/node]
 3: 0xcdd16e v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 4: 0xcdd4e7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 5: 0xe94b55  [/usr/local/bin/node]
 6: 0xe95636  [/usr/local/bin/node]
 7: 0xea3b5e  [/usr/local/bin/node]
 8: 0xea45a0 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/usr/local/bin/node]
 9: 0xea751e v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/local/bin/node]
10: 0xe68a5a v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationType, v8::internal::AllocationOrigin) [/usr/local/bin/node]
11: 0x11e17c6 v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [/usr/local/bin/node]
12: 0x15d5439  [/usr/local/bin/node]

I documented this in a new issue since it seems to be a different error: https://github.com/vercel/next.js/issues/51025

@tghpereira again, please read my earlier posts…

If you’re running into issues with memory usage in development please follow this issue instead: https://github.com/vercel/next.js/issues/46756 (similar to this one, please provide the source code or a heap profile).

I managed to create a repository https://github.com/ungarida/memory-leak that reproduces the memory leak discussed in my previous comment.

Invoking http://localhost:3000/ every second and taking snapshots of the memory, the following screenshot shows constant memory growth:

Screenshot 2023-08-10 at 16 59 00

@stx-chris @timneutkens - with node v18.17.0 and canary: next@13.4.13-canary.12, I can get through the full ab run - nice! and can confirm the process that was left behind now closes when I kill next start.

Memory usage seems better too -> excited to see your PR land as well as getting rid of the extra renderers.

Even after Ctrl-C'ing the server, the memory stays up and the process keeps running. I have to kill -9 to kill it.

This has been fixed in #53495.

I can confirm that @broksonic21’s minimal example using canary leads to the described zombie process.

Are you sure you were using v13.4.13-canary.12? Just double checked and the process no longer stays around on that version.

@FairyPenguin please provide a reproduction. It sounds like you’re referring to development though, which we’re tracking here: https://github.com/vercel/next.js/issues/46756.

Also keep in mind that an increase in memory usage when you navigate to a page in development is expected. In development all pages are compiled/cached on-demand when you request them.

Upgraded to Next 13.4.10 and configured { images: { unoptimized: true }}. Memory usage appears normal (less than 300MB without using App Router).
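For reference, that is the documented images.unoptimized flag; a minimal next.config.js sketch of the workaround:

// next.config.js - serve images as-is and bypass the built-in Image Optimization
// (and its sharp usage) entirely.
module.exports = {
  images: {
    unoptimized: true,
  },
};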

Whereas turning on image optimization caused the K8s container to restart twice in the past 12 hours, at memory and CPU usage spike points.

image image

It may have something to do with image optimization.

Additional k8s container restart logs:

2 Killing: Container api failed liveness probe, will be restarted
Unhealthy: Liveness probe failed: Get "http://host:8000/basepath": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

After leaving it for several hours (having less traffic) I keep seeing that the memory is increasing and it seems that it is not released: image

I have noticed that if I import the Datadog tracer in the custom server, the RAM increases much more slowly and we do not have as much information in the traces. On the other hand, if we require the tracer in the start command (node --require 'tracer.js' dist/server.js), I see much more information and the RAM goes up much faster. Mentioning it in case it is one of the reasons. I suppose we get less information because the tracer is not attached to the workers when it is imported from the custom server.
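To illustrate the two setups being compared (the dd-trace calls and file names are assumptions based on Datadog's docs, not taken from this project):

// tracer.js - preloaded setup: node --require ./tracer.js dist/server.js
// Preloading attaches the tracer to the process started this way; a tracer imported
// only inside a custom server never reaches the separate Next.js render worker
// processes - which would explain the thinner traces (and the slower RAM growth)
// in that setup.
const tracer = require('dd-trace'); // assumes dd-trace is installed
tracer.init(); // default options; adjust per Datadog's docs
module.exports = tracer;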

@timneutkens I think I can confirm what you say, in our production app (using App dir), the Nodejs Runtime metrics seem not to increase over time. image

But if we look at the pod metrics, we can see how it always tends to rise (although slowly) until the pod ends up dying and the service restarting: Screenshot 2023-07-13 at 1 18 36 PM

We are using the latest available version of Next.js, 13.4.9, and Node 18.16.0.

If I look at the processes in the pod, I see that the ones with the most RAM allocated are these: /home/node/node_modules/next/dist/compiled/jest-worker/processChild.js

Can anyone tell me which version can use app dir normally and stably?

For now we've pinned Next to 13.3.3. Every version after that uses double the memory. Setup: dockerized micro apps on Cloud Run (GCP).

+1, the same is happening on our systems. No dev on my team with 8GB of RAM is able to work with it. This is also happening specifically when we are using the app router; it's kind of painful to switch between routing definitions back and forth. Next version 13.4.4.

Here are two pods in a k8s cluster; the first two lines are my pages-based branch vs. the default memory usage of my app-dir-based branch. Literally 10x from the start, before any requests. The app-dir branch also hits 700-800MB of usage after a while.

image

Is this related to #49677 maybe?

We recently upgraded to 13.4.13 from 13.4.9 and had to immediately roll back because of a very evident memory leak. I could not reproduce it by creating a project from scratch, but I can add the following details; in our project we use: 1) pages and not app, 2) webpack and not SWC.

I tried to take snapshots of the heap to provide more hints about the problem. Comparing the delta of 2 snapshots, what I noticed is that in our project the memory leak is creating a lot of arrays, and in them there is a reference to undici: Screenshot 2023-08-10 at 15 56 11

Creating the project from scratch with npx create-next-app@latest I could not see any reference to undici in memory.

It may be a long shot, but there is this open issue about a memory leak in undici: https://github.com/nodejs/undici/issues/2198. undici is pre-compiled within Next.js and it is not easy to downgrade, but maybe it is worth testing a previous version of undici.

@timneutkens I am here just to thank you for your patience ❤️ (my app is still on the pages folder, but I am following the progress on solving those issues).

Hey @timneutkens, we started to see a leak with Jest (https://github.com/jestjs/jest/issues/11956) and I see that Next.js now uses jest workers; could it be related?

@timneutkens I think react server components are not as good as the previous hooks. nextjs went the wrong way. These 4 weeks are also your bugs, what are you complaining about?

The same for me disabling image optimisation fixed RAM usage (for now) using

"sharp": "^0.32.3",
"next": "13.4.10",

image

Disabling image optimization has resolved the memory leak for our team.

@OlegLustenko there’s only one routing process that handles all incoming requests and routes them to the right worker.

Testing memory overflow with such a simple project won’t work, you need to use pages and apps together, and add some common libraries to test it properly.

Sharing some things my team and I tried. We're having similar issues to those described here.

TL;DR: Next 13 in production with appDir, with similar issues to those described in this thread. Nothing we've tried so far works.


We’re experiencing similar problems in our production environment. We’re using appDir and switching to pages is not an option right now. We’ve got a 4GB memory limit for each pod in our environment which should be more than enough even when using Server Components.

So far we’ve tried.

  • Removing sharp --> The memory usage was not improved, just more "erratic". Images loaded more slowly without it, so we added the package again. image
  • Setting experimental.appDir = true in next.config.js --> no noticeable difference.
  • Experimenting with react cache on our requests. --> no noticeable difference either.
  • Making sure we clean up when needed in all useEffects.

Last 2 days.

image

  • Most breaks in the graph are restarts due to pods running out of memory, but some are new deploys of our site (the ones not hitting the 4GB limit).

Some relevant packages

	"next": "^13.4.7",
	"next-logger": "^3.0.2",
	"next-seo": "^6.0.0",
	"react": "18.2.0",
	"react-dom": "18.2.0",
	"sharp": "^0.32.1",

Weird. I'm not seeing any improvements with 13.3.3 on my side. This is a WIP project with next-auth, middleware and more; I have not yet tried disabling any of that to test.

v13.4.7 image

v13.3.3 with experimental: { appDir: true } image

Pinning to version 13.3.3 and setting experimental.appDir: true in next.config.js fixed it for us.

experimental: {
    appDir: true,
  }

Before, running latest: image

After, with next 13.3.3 and Experimental AppDir: image

The spike at 9:30PM was due to the re-deployment of the app.

I also have this problem: within about 5 seconds it occupies all the system's memory, causing the system to hang.

CleanShot 2023-06-27 at 5 11 15@2x

My app is using next 13.4.5, page dir, standalone build, runs in PM2

CleanShot 2023-06-27 at 5 13 44@2x

Tried the newest 13.4.7, things look roughly the same:

Here's another example of a pod with min: 1 / max: 2 replicas, where the green one has been alive for a while and the yellow one came up and initially used 300MB, then jumped to 520MB as soon as a single request hit it.

image

This app isn’t using a single next/image-component. So it’s definitely not related to that.

Here’s the same app in production that’s actually getting a few thousand visits:

image

Usage graph running on Railway. On June 15th image optimization was disabled. The cache rate also drastically increased, which may be related.

Not sure how much of an issue this is in serverless land since processes don’t run long enough to have memory leaks. CleanShot 2023-06-16 at 21 05 52@2x

Similar: https://github.com/vercel/next.js/issues/44685

Thanks for the extra tip about the swap memory - we tried that as well, after getting that tip in the community post.

Long story short: upgrading Next.js from 13.4.9 to 13.4.13, in case you use pages as we do, also requires upgrading Node.js to >=18.17.0. In our case that solved the memory leak issue.

@leerob The nextjs cache has bugs, https://nextjs.org/docs/app/building-your-application/caching#data-cache. It has not been fixed for 4 weeks, do you understand nextjs?

Hi @timneutkens, thanks for your quick reply; totally get your point. But even the screenshots might be telling since your team understands the internal memory allocation mechanics much better than any of us does. Maybe you spot some common patterns that we don’t.

It turns out to be very difficult to analyze memory consumption of a Next application due to its extensive pre-allocations and caching. Maybe a future Chrome plugin might help in this regard. If I find time over the next couple of days I try to reduce the app to the bare minimum and add more and more functionality until - hopefully- the culprit has been detected.

But would it actually be useful to you? If it hits that 500MB+ wall because it’s app-dir and pages in the same app it doesn’t seem like there’s much left to research.

Or at least until I try to go into production with this to actually get some load on this app. I can live with the increased RAM usage if it's stable. But I'm still blocked by #43704 (and by extension #51160) from being able to go to prod.

The only “funky” thing my app does is inject some HTML in layout.tsx from an external API. But this doesn’t seem to affect the memory usage.

We’ll need to be able to run the application yeah.

This one is sitting at around 250-270MB. But this one is entirely app dir.

Please read my earlier posts on this issue. https://github.com/vercel/next.js/issues/49929#issuecomment-1637185156

@karl-run that could be entirely expected. Please, as I've shared before, posting screenshots is not helpful for investigating.

When you haven't made a request to a page, the code for that page is not loaded. So if you make a request after deploying and it's a page with a lot of code, there will be a spike in memory usage. Unfortunately we can't investigate reports with screenshots or monitoring tools.

@mindfocus it’s unclear what you’re saying. You’re linking a comment from someone else.

I just deployed with version v13.4.10-canary.6, in half an hour I will have data to compare.

Exactly the same problem here, running Next 13.4.7 with pages (not a big fan of app dir at this stage), deployed on Railway since I found it the cheapest and fastest solution (except Vercel of course, but $20 for one app is too much).

The app constantly consumes more and more memory, up to ~1GB, and all I can do is monitor and restart it from time to time…

image

will try to downgrade to 13.3.4 and monitor the usage

I’m just noticing that even during development on my macbook this dangling worker process problem appears, even with appDir: false:

Screenshot 2023-06-29 at 11 08 48

Screenshot 2023-06-29 at 11 10 30

More interestingly, it seems that these workers are only left dangling when the application crashes (e.g. I made a typo while devving and hot-reloading made yarn dev quit with an error).

If I just CTRL+C out of yarn dev, the worker processes it spawned are cleaned up correctly.

Hi *, having similar problems with memory consumption on nextjs(13.4.6) with app router. App is mostly static with low traffic. I see very high memory consumption on railway.

image

Next js 13.4.7 brand new project with 3 random paths ( /test, /another-page etc). The App router was used.

Has been idling in a Kubernetes pod for 3 days. Memory keeps slowly creeping up.

The last line on the graph is the brand new Next project that has been idling for 3 days: image

I already uninstalled sharp and disabled image optimization, it didn’t help in my case.

Try uninstalling sharp; it made memory usage much lower in my case.

Yep I can confirm this seems to be a leak somewhere. I’m running a super basic Next server on Railway and you can see the memory usage at completely idle here:

Screenshot 2023-06-21 at 12 57 52 PM

Here’s a list of packages and versions being used if this helps anyone debug:

"@clerk/nextjs": "^4.21.7",
"@types/node": "20.3.1",
"@types/react": "18.2.13",
"@types/react-dom": "18.2.6",
"autoprefixer": "10.4.14",
"eslint": "8.43.0",
"eslint-config-next": "13.4.6",
"next": "13.4.6",
"postcss": "8.4.24",
"react": "18.2.0",
"react-dom": "18.2.0",
"tailwindcss": "3.3.2",
"typescript": "5.1.3"

+1 same situation can be seen in our deployed next js app 😕

I created this reproduction repo using the latest canary version of Next.js for the error documented before. In the repo I am using autocannon to request multiple pages very fast, simulating traffic to the website.

I documented this in a new issue since it seems to be a different error: https://github.com/vercel/next.js/issues/50909