jest: [Bug]: Memory consumption issues on Node JS 16.11.0+

🚨 As of Node 21.1.0, this has been fixed upstream. Hopefully the fixes will be backported to v18 and v20 as well (as of writing (Oct. 26 2023) they have not), but that is up to the Node.js project and nothing we control from here. Note that (native) ESM still has memory leaks - that can be tracked here: #14605. If you're unable to upgrade your version of Node, you can use --workerIdleMemoryLimit in Jest 29 and later. See https://jestjs.io/docs/configuration/#workeridlememorylimit-numberstring 🚨

Version

27.0.6

Steps to reproduce

  1. Install the latest Node JS (16.11.0 or later) or use the appropriate Docker image
  2. Set up a project with a multitude of Jest tests
  3. Run node --expose-gc node_modules/.bin/jest --logHeapUsage and see how the memory consumption starts increasing.

Expected behavior

Since Jest calls global.gc() when the garbage collector is exposed and the --logHeapUsage flag is present, the memory usage should be stable.

Actual behavior

The memory usage increases with every new test

Additional context

We had some issues with Jest workers consuming all available RAM, both on the CI machine and locally. After doing some research, we found that if we run Jest like this: node --expose-gc node_modules/.bin/jest --logHeapUsage, the heap size remains stable. After upgrading to Node.js v16.11.0, the issue was back. Node v16.10.0 works fine. I believe something was accidentally introduced in the new Node version, but it might be useful to take a look at this from the Jest perspective in search of possible workarounds. I'm also seeing the same behavior on my work machine, whose environment I'm pasting below 👇🏻

Environment

System:
    OS: macOS 11.6
    CPU: (8) x64 Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
  Binaries:
    Node: 16.11.0 - ~/.nvm/versions/node/v16.11.0/bin/node
    Yarn: 1.22.0 - ~/SomeFancyDir/webapp/node_modules/.bin/yarn
    npm: 8.0.0 - ~/.nvm/versions/node/v16.11.0/bin/npm
  npmPackages:
    jest: 27.0.6 => 27.0.6

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 233
  • Comments: 285 (93 by maintainers)

Most upvoted comments

We've been recently hit by this bug (or a combination of them) in several services of our stack. It's been an inconvenience for us since, once the heap size exceeded the 2GB available on GitHub Actions machines, our CI/CD process was failing 100% of the time.

I believe the "modules accumulating in the heap" bug has been present in Jest for some time, but that in combination with the changes in Node from v16.10 to v16.11 has made it much more severe. We're certain that bug exists in Jest, as we've managed to replicate it with a "dummy repo", as several other people have stated.

The fact that the ticket created in Node JS was closed as a WONTFIX is a bit worrying. I currently lack the knowledge to judge the discussions which happened there, but if that wasn't a bug in the end, the solution probably falls on the Jest side.

After trying to extract as much information as possible from the different GitHub issues that have been created in Jest, ts-jest and Node, and trying several approaches, the only path for us was to downgrade back to Node v16.10.

These are some of the statistics we gathered in the process:

Node @jest/core isolatedModules runInBand Comp. cache Initial H Final H H ratio Time
17.2.0 27.3.1 true true true 133 MB 2236 MB ~33 MB 144 s
16.13.0 27.3.1 true true true 131 MB 2220 MB ~30 MB 130 s
16.13.0 27.3.1 true true false 126 MB 1712 MB ~25 MB 113 s
16.13.0 27.3.1 false true true 287 MB 2393 MB ~30 MB 140 s
16.10.0 27.3.1 true true true 123 MB 790 MB ~10 MB 77 s
16.10.0 27.3.1 true true false 118 MB 1676 MB ~25 MB 110 s
16.10.0 27.3.1 false true true 280 MB 952 MB ~10 MB 90 s

Some explanations about the columns:

  • Node: version of NodeJS
  • @jest/core: version of the Jest core
  • isolatedModules: the setting defined via jest.config.json
  • runInBand: specified by the --runInBand Jest flag
  • Comp. cache: specified by the NodeJS --no-compilation-cache flag
  • Initial H: heap size after running the first test
  • Final H: heap size after running the last test
  • H ratio: heap size increase after each test
  • Time: time the whole suite needed to run

The most relevant tickets related to this matter:

We hope this issue is given the importance it deserves, as keeping the Node version pinned to v16.10 or relying on bigger CI machines to compensate for this is a poor strategy for the future.

Node 20 with backport coming Thursday: https://github.com/nodejs/node/pull/50682

Could a maintainer update the top post with a summary of the current state of what's known, perhaps?

Here's my attempt at a quick summary; a better one would probably also have links to relevant comments from up-thread.

  • Memory consumption on Node 16.11.0+ is much higher
  • Each jest worker's memory usage is seen to grow over time as it runs test suites; this happens regardless of what the tests themselves actually do
  • This appears to be caused by the v8 upgrade that was contained in the Node upgrade
  • No-one is exactly sure right now what changed in v8 to cause this. There are a lot of changes in that v8 bump.
  • There's a suspicion that it's related to module loading in sandboxes in some way (which is something jest does a lot of)
  • The workerIdleMemoryLimit workaround was added to Jest 29, which allows capping the maximum memory in use by a worker, at which point it'll be killed and replaced with a fresh worker
  • The worker-limiting workaround only works if you're running workers, so doesn't work with runInBand without one of the patches listed above
  • The workerIdleMemoryLimit only really needs to be set high enough to run a few test suites - lower seems to be better as that causes the workers to be recycled more often rather than building up loads of memory usage (a minimal config sketch follows this list)
  • Even with the workaround, test times are observed to be noticeably slower than before, likely due to the memory usage taking up resources, and the extra time spent recycling workers
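
As a reference for that workaround, here is a minimal config sketch (not from the thread; the limit and worker count are assumptions to tune per project, and it requires Jest 29+):

// jest.config.js
module.exports = {
  maxWorkers: 2,                // must be >= 2; a single worker falls back to in-band
  workerIdleMemoryLimit: '1GB', // recycle a worker once its heap exceeds this after a test file
  logHeapUsage: true,           // optional: print per-test-file heap usage to help tune the limit
};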

I'm curious as to what the fix is, I've looked at the release and cannot find the commit that addresses this. It should be way more notable in the release notes that this was fixed

The commits that fixed it are in the "vm: fix V8 compilation cache support for vm.Script" part of that blog post, where Jest was also mentioned, though it may not be immediately obvious why. I think the issues described from the Jest side are multi-faceted, and as https://github.com/jestjs/jest/issues/11956#issuecomment-1757678185 mentioned, this only mitigated the regressions for pure CJS compilation (i.e. the regression comes back when there's actual import() or --experimental-vm-modules).

To summarize, the regression was introduced when Node.js implemented the importModuleDynamically option back in v16. To retrieve this option when import() is called within the compiled script, Node.js needs to set a host-defined option for this script. Previously, when V8 re-compiled a script that it had seen before, it could do a quick lookup in an internal table and return the script that was already compiled, making subsequent compilation almost free. The introduction of host-defined options resulted in cache misses, because in the implementation of Node.js up until not long ago, this option was different for every script compiled. The cache miss not only resulted in a performance regression (V8 then had to compile the previously-seen scripts from scratch every time), but could also contribute to higher memory usage (every script subsequently compiled is cached again with a slightly different option even though the source code remains the same). There was also another source of memory leak coming from how the scripts are managed within Node.js. To fix the issues, we worked around the problem in Node.js by:

  1. Reworking the memory management of the scripts and host-defined options in https://github.com/nodejs/node/pull/48510, which needed additional V8 API support that we upstreamed (see the PR for a detailed write-up of what was happening)
  2. Using a constant host-defined option as the default when importModuleDynamically isn't configured (https://github.com/nodejs/node/pull/49950) - and because it's a constant for all scripts compiled without this option, the cache can be hit again in this case. However, this would still require some changes from Jest's side, so we did 3.
  3. For Jest's use case, where importModuleDynamically is always configured but will throw a warning when --experimental-vm-modules isn't set (because then you can't actually do anything useful with that option anyway), we use another constant option to throw a built-in warning that serves the same purpose in https://github.com/nodejs/node/pull/50137, which finally leads to an automatic fix of this issue in v21.1.0 without further changes on Jest's side.

So the issues are only mitigated, but not yet fully fixed - it comes back when --experimental-vm-modules is used. See https://github.com/nodejs/node/issues/35375#issuecomment-1773705384 about remaining issues & ideas about how they can be fixed/worked around, so it's still work in progress.
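
To make the compilation pattern concrete, here is a minimal sketch (not from the thread) of what Jest effectively does per sandboxed module: compiling the same source over and over through vm.Script. Whether the heap actually grows depends on the Node version, per the explanation above; the source, file name and loop counts are illustrative only.

// vm-compile-sketch.js -- run with: node vm-compile-sketch.js
const vm = require('node:vm');

const source = 'module.exports = 1 + 1;'; // stand-in for a transpiled test module

for (let i = 0; i < 5000; i++) {
  // On Node 16.11 through 21.0, each compilation could miss V8's compilation cache because
  // the host-defined options differed per script; on 16.10 and 21.1+ the cache is hit.
  const script = new vm.Script(source, { filename: 'fake-module.js' });
  script.runInNewContext({ module: { exports: {} } });

  if (i % 1000 === 0) {
    const mb = (process.memoryUsage().heapUsed / 1024 / 1024).toFixed(1);
    console.log(`iteration ${i}: heapUsed ${mb} MB`);
  }
}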

Our company created a custom runtime based on the patch above that is located here: https://github.com/reside-eng/jest-runtime

This runtime will only work with Jest v28, and it will not work if you are using ES modules. There is some additional information in the README.md about using SWC if you are also using TypeScript, but this is not required to use the custom runtime.

$ yarn add -D @side/jest-runtime

then add the following to your jest.config.js

module.exports = {
  // ...your existing config
  runtime: '@side/jest-runtime',
};

For those who are deeply embedded in using Jest and don't have the bandwidth to move to another suite (e.g. vitest), the NodeJS team has a potential fix in a draft PR per this comment.

From what I understand, the underlying bug fix in v8 just landed yesterday: https://chromium-review.googlesource.com/c/v8/v8/+/4834471

Now the port of it into node is pending: https://github.com/nodejs/node/pull/48510

With those results (thanks for verifying!), I think we can close this. I'll update the OP as well.

šŸ˜Œ

Node 20.10.0 has been released including the fixes mentioned in this PR 🎉

https://github.com/nodejs/node/releases/tag/v20.10.0

21.1.0 is out with all the fixes: https://nodejs.org/en/blog/release/v21.1.0

Please give it a try 🙂

I did some testing of the various potential fixes, and here are my results, comparing heap usage vs. execution time.

Note: I forced jest --maxWorkers=2 for all tests, since the jest option --workerIdleMemoryLimit needs workers > 1

Note: regarding Heap Usage: For the tests with workerIdleMemoryLimit set, I was unable to determine max heap usage accurately, so I just noted what jest reported before a respawn. But it might be more accurate to peg it at 1700x2 = 3400MB as "max" usage.

Test Machine Specs

Testing was done on my local laptop. There are 350+ test suites with a total of 3000+ tests. The tests are written in typescript (as is the source), and I utilize ts-jest. This may have had an effect on my results vs. non ts-jest scenarios.

$ neofetch
                   -`                    raghu@arch
                  .o+`                   ----------
                 `ooo/                   OS: Arch Linux x86_64
                `+oooo:                  Host: 20Q0S05E00 ThinkPad X390
               `+oooooo:                 Kernel: 6.0.8-arch1-1
               -+oooooo+:                Uptime: 1 day, 19 mins
             `/:-:++oooo+:               Packages: 979 (pacman)
            `/++++/+++++++:              Shell: bash 5.1.16
           `/++++++++++++++:             Resolution: 2560x1440
          `/+++ooooooooooooo/`           WM: sway
         ./ooosssso++osssssso+`          Theme: Adwaita [GTK2/3]
        .oossssso-````/ossssss+`         Icons: Adwaita [GTK2/3]
       -osssssso.      :ssssssso.        Terminal: alacritty
      :osssssss/        osssso+++.       Terminal Font: TerminessTTF Nerd Font Mono
     /ossssssss/        +ssssooo/-       CPU: Intel i7-8565U (8) @ 4.600GHz
   `/ossssso+/:-        -:/+osssso+-     GPU: Intel WhiskeyLake-U GT2 [UHD Graphics 620]
  `+sso+:-`                 `.-/+oso:    Memory: 8982MiB / 15687MiB
 `++:.                           `-/+/
 .`                                 `/

Parameters Involved

There were four parameters involved:

  • Node: V16.10 vs v18.12
  • Node options: --no-compilation-cache --expose-gc vs [none] (denoted by -ncc -egc in the graphs)
  • Jest Runtime: default vs @side/jest-runtime (thanks @a88zach ! Check it out https://github.com/reside-eng/jest-runtime)
  • WorkerIdleMemoryLimit: none vs ā€˜1700MBā€™ (thanks @phawxby !) (among different values, 1700MB seemed to work best)

Total 4 parameters, 2^4 = 16 data points.

Comparing memory usage in v16.10 vs v18.12

[bar chart: heap usage, v16.10 vs v18.12]

The ā€œleakā€ in >= v16.11 is quite apparent when no node options / special jest config is used. The v18 test uses almost 3x as much! However, using the jest param workerIdleMemoryLimit (Jest v29+) can cap this quite well.

If you want to purely optimize for memory usage, this param alone should be enough to make tests runnable in your environment. Interestingly,

Comparing execution time in v16.10 vs v18.12

[bar chart: execution time, v16.10 vs v18.12]

For v18.12, execution time seems to be reduced by using either @side/jest-runtime or workerIdleMemoryLimit; using both reduced it even more.

The winner for execution time is v16.10 with workerIdleMemoryLimit, but v18.12 can come pretty close.

The --no-compilation-cache and --expose-gc node options increase execution time massively for both v16.10 and v18.12.

Comparing memory usage vs execution time

[bubble chart: heap usage vs execution time, v16.10 vs v18.12]

Finally, we use a bubble chart to try and find the "best compromise". The same color indicates the same node options / jest config, with one bubble being v16.10 and the other v18.12.

I'll allow you guys to draw your own conclusions from this.

Conclusion

If you previously did "normal" jest testing without any node options / tweaking, then memory usage in v16.11+ was a massive problem. But it seems the workerIdleMemoryLimit param does indeed help in this case, and execution time can be similar-ish.

I'd advise you guys to try it out, and tweak it every 100MB to find what works best (results do vary quite a bit, and will depend on your test suite).

Raw Results

Node Version Heap Total Execution Time Node Options/Jest Config
v16.10 3249 194.828 none default [ ]
v16.10 3079 138.958 none default [1700MB]
v16.10 4821 204.474 none @reside-eng []
v16.10 3231 178.133 none @reside-eng [1700MB]
v16.10 8985 542.49 -ncc -egc default []
v16.10 3004 291.597 -ncc -egc default [1700MB]
v16.10 4698 511.942 -ncc -egc @reside-eng []
v16.10 2522 363.435 -ncc -egc @reside-eng [1700MB]
v18.12 9413 211.597 none default [ ]
v18.12 2816 193.41 none default [1700MB]
v18.12 5136 187.871 none @reside-eng []
v18.12 3204 164.835 none @reside-eng [1700MB]
v18.12 9163 455.617 -ncc -egc default []
v18.12 3066 263.803 -ncc -egc default [1700MB]
v18.12 4889 451.724 -ncc -egc @reside-eng []
v18.12 3231 292.232 -ncc -egc @reside-eng [1700MB]

What worked for me was combining the 2 node flags --no-compilation-cache and --expose-gc. Using the example above I've managed to run node --no-compilation-cache --expose-gc ./node_modules/jest/bin/jest.js --runInBand --logHeapUsage and got heap sizes between 22MB and 25MB. Also, using these flags the max heap size in my codebase decreased from ~2000MB to ~400MB. Tested on node v16.11.0. Hope it helps!
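
If you want to wire those flags into an npm script, a sketch could look like the following (the script name and the extra Jest flags are just examples, not from the comment above):

package.json (excerpt)

{
  "scripts": {
    "test:heap": "node --no-compilation-cache --expose-gc ./node_modules/jest/bin/jest.js --runInBand --logHeapUsage"
  }
}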

If anyone is looking to use the new workerIdleMemoryLimit setting and still run test serial, I was able to hack the jest-runner as seen below:

  1. Create a new jest runner
// integration-test-runner.js
const TestRunner = require('jest-runner').default;

class IntegrationTestRunner extends TestRunner {
  constructor(_globalConfig, _context) {
    super({ ..._globalConfig, maxWorkers: 1 }, _context);

    this.isSerial = false;
  }
}

module.exports = IntegrationTestRunner;
  2. Update jest config
// jest.config.js
module.exports = {
  ...
  ...(process.env.CI === 'true' && {
    maxWorkers: 2, // important for running in Github where the runners only have 2 CPUs
  }),
  logHeapUsage: true,
  workerIdleMemoryLimit: '1GB',
  runner: './integration-test-runner.js',
  ...
};

Now, whenever the heap grows over 1GB, it will kill the worker and spawn a new one

[screenshot: worker recycled after exceeding the memory limit]

Same here. On node 16.13.1, heap size goes up to 500MB while on node 16.10, it stays at 50MB with the following repro:

mkdir test
cd test
npm install jest
mkdir src
echo 'it("a", () => expect(0).toBeFalsy())' > src/a.test.js
for i in {1..100}; do cp src/a.test.js "src/a$i.test.js"; done
./node_modules/jest/bin/jest.js --runInBand --logHeapUsage

Edit: install Volta and do volta run --node 20.1.0 -- node ./node_modules/jest/bin/jest.js --runInBand --logHeapUsage to test a specific node version

@alexw10 I would strongly recommend everyone still on jest to migrate to vitest these days. It is faster, more stable, and its API is for the most part a drop-in replacement. And Facebook gave up on jest.

Can anyone confirm if this problem persists using Node 18.15.0?

I can confirm that the problem persists on Node 18.15.0.

Following a suggestion on the Node issue, I tried running with the Node option --no-compilation-cache. Suddenly my CI with 5000 tests works again, and furthermore it finishes without the ~1GB of memory leakage that I always saw under Node 14 (and of course without the 5GB+ that would leak under Node 16.13 until OOM). The downside is that it seems to take about 25-50% longer to finish.

For those visiting from search engines, I have some bad news. This is not fixed as of 1/6/2023. There are a few potential workarounds:

  • Stay on Node 14 (untenable for most)
  • Upgrade Jest to 29 so you can use the workerIdleMemoryLimit configuration option
  • Use node's --no-compilation-cache flag (your tests will finish, but will run far slower)

The underlying leak seems to have been introduced in V8 (bug report). Jest introduced workerIdleMemoryLimit to help alleviate this problem until a better solution can be found. It utilizes the brute-force approach of tearing down test workers whose idle memory is over a certain threshold (e.g., if they have memory that isn't being freed).

Replacing Jest isn't really an option for us, so in order to upgrade from Node 14 we had to upgrade Jest as well. Quite a fun week.

This one on the other hand might solve it 😃

https://github.com/nodejs/node/pull/49950

Can confirm: 21.1.0 means we can finally run our massive jest -i database test suite again (previously we had to split it into 4 separate runs to avoid running out of RAM). Total execution time is down from 2m12s for the four split test suites to 1m33s for running the entire lot. Huge thanks to everyone involved, and we'd love to see this backported to Node 20 (and ideally Node 18) so we don't have to split the runs in CI any more. Thanks again ❤️

@kibertoad This issue stems from Node and the vm module, not some unfixed issue in Jest. It's true that (AFAIK) Facebook is no longer contributing to Jest development, but there is still an active community and a maintainer who very quickly made a workaround for the issue in this thread - let's not belittle their efforts.

This last week I've been doing a whole bunch of testing against our repo to see if I can figure out the cause of this issue. Ultimately I've not had any luck. I've tested Node directly with vm.Script and I cannot get it to leak. I've also tried jest directly with 1000s of tests and I cannot get that to leak either in a small isolated test case. I can only get the leaking to reliably occur when used with our application; however, I don't believe it's our application that's actually at fault, as others are experiencing the same issue.

There are 2 main issues I've uncovered:

  • Stalling of tests preventing jest from exiting. This was actually a different problem: the tests hadn't stalled, the workers were crashing due to running out of memory. Jest was not detecting this crash, so it sat there waiting. The PR for this was merged this morning. #13054
  • Memory leaking. I cannot figure out why this is happening, but I do believe we can work around the problem. When debugging I found that the memory usage of the parent thread is stable with no obvious issues; the memory use of the workers is where the problem lies. After a test executes, the memory usage of that worker should return down to some sensible level but in many cases doesn't. The PR I'm currently working on introduces an optional workerIdleMemoryLimit config option; if set, after every test file execution the memory usage of the worker is checked. If it exceeds the specified limit the worker is restarted (a rough sketch of this check follows below).
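
A rough sketch of that check (this is not Jest's actual implementation; the worker API names below are hypothetical):

// after each test file, sample the worker's heap and recycle the worker past the configured limit
async function runTestFileWithRecycling(worker, testFile, limitBytes) {
  const result = await worker.runTest(testFile);      // hypothetical worker API
  const { heapUsed } = await worker.getMemoryUsage(); // hypothetical worker API
  if (heapUsed > limitBytes) {
    await worker.restart(); // kill the child process and spawn a fresh one
  }
  return result;
}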

Here is the memory usage of 1 worker.

PROCESS MEMORY USAGE: 174.91MB
PROCESS MEMORY USAGE: 203.34MB
PROCESS MEMORY USAGE: 314.45MB
PROCESS MEMORY USAGE: 366.71MB
PROCESS MEMORY USAGE: 396.36MB
PROCESS MEMORY USAGE: 471.20MB
PROCESS MEMORY USAGE: 524.45MB
PROCESS MEMORY USAGE: 588.69MB
PROCESS MEMORY USAGE: 633.61MB
PROCESS MEMORY USAGE: 677.36MB
PROCESS MEMORY USAGE: 727.76MB
PROCESS MEMORY USAGE: 775.04MB
PROCESS MEMORY USAGE: 825.63MB
PROCESS MEMORY USAGE: 548.84MB
PROCESS MEMORY USAGE: 604.90MB
PROCESS MEMORY USAGE: 656.86MB
PROCESS MEMORY USAGE: 767.94MB
PROCESS MEMORY USAGE: 845.00MB
PROCESS MEMORY USAGE: 893.51MB

Experiencing the same with latest node LTS (18.16.0) and jest 29.5.0

I've run some tests considering various configurations. Hope it helps someone.

node version node args jest args custom behavior time (seconds) heap (mb)
16.10 --expose-gc --no-compilation-cache --maxWorkers 1 afterAll(global.gc) + force options.serial to false on jest-runner 303 45
16.18 --expose-gc --no-compilation-cache --maxWorkers 1 afterAll(global.gc) + force options.serial to false on jest-runner 325 47
16.10 --expose-gc --no-compilation-cache --maxWorkers 2 - 236 64
16.18 --expose-gc --no-compilation-cache --maxWorkers 2 - 167 67
16.10 --expose-gc --maxWorkers 1 afterAll(global.gc) + force options.serial to false on jest-runner 234 82
16.10 --expose-gc --maxWorkers 2 - 155 96
16.10 --expose-gc --no-compilation-cache --runInBand --detectLeaks afterAll(global.gc) 313 159
16.10 --expose-gc --no-compilation-cache --runInBand --detectLeaks - 307 160
16.10 --expose-gc --no-compilation-cache --runInBand - 313 160
16.10 --expose-gc --no-compilation-cache --maxWorkers 1 - 333 160
16.10 --expose-gc --no-compilation-cache --runInBand --detectLeaks afterEach(global.gc) 397 160
16.18 --expose-gc --no-compilation-cache --runInBand --detectLeaks afterAll(global.gc) 281 164
16.18 --expose-gc --no-compilation-cache --runInBand --detectLeaks afterEach(global.gc) 298 164
16.18 --expose-gc --no-compilation-cache --maxWorkers 1 - 287 165
16.18 --expose-gc --no-compilation-cache --runInBand --detectLeaks - 300 165
16.18 --expose-gc --no-compilation-cache --runInBand - 337 165
16.10 --expose-gc --runInBand --detectLeaks - 258 199
16.10 --expose-gc --runInBand - 247 201
16.10 --expose-gc --maxWorkers 2 - 286 201
16.10 --expose-gc --runInBand --detectLeaks afterAll(global.gc) 256 202
16.10 --expose-gc --runInBand --detectLeaks afterEach(global.gc) 309 206
16.10 --runInBand - 261 629
16.18 --expose-gc --maxWorkers 2 - 277 899
16.18 --no-compilation-cache --runInBand - 297 907
16.18 --runInBand - 281 1055
16.18 --expose-gc --runInBand - 347 1262
16.18 --expose-gc --maxWorkers 1 afterAll(global.gc) + force options.serial to false on jest-runner 337 1380
Test Suites: 3 skipped, 31 passed, 31 of 34 total
Tests:       20 skipped, 49 todo, 171 passed, 240 total
Snapshots:   0 total

* Running with Jest 29.2.2 on a bitbucket pipeline container using node official docker images

@PAkerstrand some notes:

  • The performance regression is likely contained to vm.Script and so running Node 16.11+ in production should not experience the performance regression.
  • The memory leak appears to be in the compilation cache.
  • Unless Jest is using vm.Script incorrectly, the fix is unlikely to be in Jest space.
  • Speculation/hypothesis: The performance regression is likely in Node or v8.

What to do? Some options:

  1. Upgrading and staying at Node 16.10 is not a viable option, since we won't be in a position to apply CVE patches.
  2. Attempt to debug Node and/or v8. Possible, but hard.
  3. Replace Jest with a testing framework that doesnā€™t use vm.Script.

So it looks like the patch can at least unblock Node upgrades, but the tradeoff is that the test suite is significantly slower to run.

Node Version Comments Run Time Percentage Change Notes
16.10.0 "Baseline" 7m41s N/A
16.13.2 No patches, just Node upgrade 12m33s 63.3% increase Many tests timed out, so the suite failed
16.13.2 Using @Cellule's patch 10m42s 39.2% increase Test suite passed, just slower
16.13.2 Using patch + --max-old-space-size=700 10m56s 42.3% increase
16.13.2 Using patch + --max-old-space-size=2048 10m31s 36.9% increase
16.13.2 Using patch + --no-compilation-cache 9m33s 24.3% increase

Interestingly, in the fastest option (--no-compilation-cache), I also see the unbounded growth of the heap go away. So https://github.com/nodejs/node/issues/40014 and the corresponding (closed/wontfix) V8 bug do seem particularly suspect: https://bugs.chromium.org/p/v8/issues/detail?id=12198

I tested 19.8.0 with the memory leak fix in and it did not improve.

I've been running performance tests on jest tests, and most of the time taken with jest is due to the fact that at the start of every spec file, files are re-imported again. This means anything in your environment (jsdom), any serializers, shims, jest setup files and everything that your spec file imports.

So for instance I doubled the speed of our tests by:

  1. moving setup that is only required in a % of tests into helpers imported by those tests rather than global setup
  2. we are migrating away from enzyme, so for the 600 enzyme spec files left, those now import a helper that imports enzyme, the adapter and the serializer, so they don't need to be imported in the other 2200
  3. we already did this - but separate tests between ones needing jsdom and ones that don't
  4. we were using lodash-es which imports each file separately - moving it to lodash meant we imported a single file, which improved overall running time by 2 minutes!

This bug specifically affects a large number of files being repeatedly imported every spec, so the worse your setup is w.r.t. the above, the worse the bug affects you, whereas if each test is very isolated, importing a minimal number of files, then the underlying v8 bug probably makes no difference.

Hope this helps!

@EternallLight would you be willing to edit your original issue and put a short workaround explanation at the top? There are an awful lot of people dealing with this problem who are unaware that workarounds already exist. Something along the lines of:

Workaround

Using --workerIdleMemoryLimit is successfully able to bring heap usage under control; however, you must use at least 2 workers and not run in band.
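
For example (values are illustrative; the option can be passed on the CLI as well as via the config file, as later comments in this thread do):

node_modules/.bin/jest --maxWorkers=2 --workerIdleMemoryLimit='1GB' --logHeapUsage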

We're also experiencing this issue. Node 16.11+ and Jest v27 consumes significantly more memory. Node 16.10 and Jest v27 seems OK.

I've been monitoring memory consumption for quite a while where I work, and this is GREAT 🚀: a ~300MB drop comparing the average of 10 runs

[chart: memory comparison before and after Node 21]

6m8s -> 4m21s run time 🚀🚀🚀

ESM is separate from this issue.

Node 16 is EOL, but this might get backported to v18 and v20

How can we get to know whether it will be backported to 20 or 18 or not?

By following upstream Node. I'm sure somebody will post here if (hopefully when) a backporting PR is opened.

Yeah, PR sent for the above (https://github.com/nodejs/node/pull/50137), meaning no changes to Jest will be necessary. If and when that lands, we can close this issue in Jest 🎉

Note that if native ESM (including dynamic imports) is used, there will still be code cache issues. But the "normal" case (CJS) should be fixed, allowing people stuck on older versions to upgrade 🙂

I won't be backporting, but it sounds like Node might be able to handle it: https://github.com/nodejs/node/pull/49950#issuecomment-1741032282

If they end up not doing so, patching via yarn patches (or patch-package) should prove trivial

Wanted to give everyone in this thread some information on what we learned working through this issue in our environment.

We use Azure DevOps, and the agents we were using have 32GB of memory; we moved to 64GB machines. This essentially solved the issue for us: multiple pipelines can run simultaneously and they all complete. We currently have 4000 tests that now run pretty fast - with 3 simultaneous builds, the unit testing portion runs in ~5 minutes on average.

We have maxWorkers set to 4 and workerIdleMemoryLimit set to 2.5G.

With these settings we have seen the best results speed-wise, as well as being able to run up to 20 simultaneous jobs at once.

Just wanted to give out some insight as to what has worked for us. Really hope Jest/Node is able to solve this problem some day so these hard-coded values are no longer needed, appropriate settings are figured out automatically, and no memory leaks happen.

Hey there,

we've got a react-native project (heavily relying on mocks) that has gotten unbearably slow and started failing because of running out of memory. While I'm sure our tests, mocks, snapshots, etc. have a lot to be improved on, the following tables compare the same project across node versions, and the results are "interesting".

node --expose-gc ./node_modules/jest/bin/jest.js \
  --config jest.config.js \
  --no-watchman \
  --runInBand \
  --logHeapUsage

using jest@26.6.2, ts-jest@27.1.5

Node Suite 1 Suite 5 Suite 10 Suite 15 Suite 20
Node 18 411 MB 809 MB 1119 MB 1368 MB 1745 MB
Node 16 351 MB 578 MB 798 MB 939 MB 1189 MB
Node 14 314 MB 293 MB 355 MB 308 MB 345 MB

using jest@29.0.0-alpha.6, ts-jest@29.0.3

Node Suite 1 Suite 5 Suite 10 Suite 15 Suite 20
Node 18 440 MB 1174 MB 1426 MB 1811 MB 2623 MB
Node 16 384 MB 852 MB 997 MB 1255 MB 1841 MB
Node 14 344 MB 393 MB 351 MB 388 MB 558 MB

Node 18 with pointer compression did not make a positive difference. Adding --workerIdleMemoryLimit='800MB' didn't make a difference at all.

What did make a massive difference was running node with --no-compilation-cache: no leaks, similar memory consumption for all node versions, basically confirming that https://github.com/facebook/jest/issues/11956#issuecomment-991796821 and https://github.com/facebook/jest/issues/11956#issuecomment-994914988 are still the case a year later.

node --no-compilation-cache --expose-gc ./node_modules/jest/bin/jest.js \
  --config jest.config.js \
  --no-watchman \
  --runInBand \
  --logHeapUsage

using jest@29.0.0-alpha.6, ts-jest@29.0.3

Node Suite 1 Suite 5 Suite 10 Suite 15 Suite 20
Node 18 425 MB 352 MB 416 MB 338 MB 408 MB
Node 16 358 MB 298 MB 355 MB 286 MB 340 MB
Node 14 360 MB 291 MB 346 MB 278 MB 331 MB

Here's the result of my analysis on my repo (all numbers are in seconds). Note that this subset runs ~1300 tests across ~130 files. Each file imports ~700 other files (our test harness is heavy). From this, it seems that compileFunction does bring the performance of 16.11 close to 16.10, BUT 16.10 with new Script was the fastest configuration, so it still results in a heavy performance reduction. At this point I almost wonder if jest should add a configuration to let people choose which version (new Script or compileFunction) to use.

Right now we decided to pin to node 16.10 in our test suite and will keep using the new Script approach, and I just hope it gets resolved at the node.js/v8 level before we have to update

Run Node 14.18 new Script Node 16.10 new Script Node 16.11 new Script Node 14.18 compileFunction Node 16.10 compileFunction Node 16.11 compileFunction
1 392s 426s 569s 555s 609s 522s
2 433s 390s 611s 504s 449s 536s
3 480s 398s 729s 445s 466s 597s
4 447s 373s 668s 444s 519s 463s
Average 438s 397s 644s 487s 511s 530s
Configuration AVG
Node 14.18 new Script 438 0% -10% 32% 10% 14% 17%
Node 16.10 new Script 397 9% 0% 38% 19% 22% 25%
Node 16.11 new Script 644 -47% -62% 0% -32% -26% -22%
Node 14.18 compileFunction 487 -11% -23% 24% 0% 5% 8%
Node 16.10 compileFunction 511 -17% -29% 21% -5% 0% 4%
Node 16.11 compileFunction 530 -21% -33% 18% -9% -4% 0%

After some digging, the suspicion surrounding vm.Script seems to be reinforced. This might help: https://github.com/facebook/jest/pull/12205

We are close to victory folks 😆, 1 approval at a time!

https://github.com/docker-library/official-images/pull/15761

Rerunning my minimal repro with node 20.1.0 still gives ~300MB heap size vs ~80MB in node 16.10.

Node 18 (node 16 on CI with the same behavior), Jest 29. Was getting 1.5+ GB memory usage per worker. After switching to coverage: 'v8' it started to OOM whether it ran with workers or runInBand. The only fix I found which decreased memory usage significantly, both with and without v8 coverage, from the initial 1.5GB to around 500-800MB, was 3 flags: --no-compilation-cache --expose-gc --logHeapUsage. Removing any one of them brings back the OOM. It doesn't make much sense - for example, why would logHeapUsage decrease memory usage, prevent OOM and also actually increase speed? Profiling/logging should make it worse, not better… Something fishy happens here.

Ours was due to superfluous transpilations

  • We also came across this issue when upgrading to latest Node 16.
  • Although it might be a specific case, we found out that most of the heap was used for transpiled modules, which made the tests time out.
  • This was because we have a mono-repo, and did not exclude transpilation for the root node_modules, so jest was transpiling all the modules in there, including, for example, MUI 5.
  • Ensuring we only transpile what is needed, solved it for us.

Takeaway is, ensure these issues are not caused by transpilation resources.

How did we arrive at that?

  • We did some heap profiling following https://jestjs.io/docs/troubleshooting#tests-are-failing-and-you-dont-know-why
  • When we looked at the report, nearly all heap allocation was dedicated to strings that look like modules (same as the webpack eval option for dev builds).
  • This raised the suspicion we may be transpiling much more than we should.
  • By adding <rootDir>/../node_modules/ to transformIgnorePatterns, we got the tests behaving again (see the config sketch after this list).
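
A sketch of that change (the exact paths depend on your monorepo layout and are assumptions here):

// jest.config.js (excerpt)
module.exports = {
  transformIgnorePatterns: [
    '<rootDir>/../node_modules/', // the hoisted root node_modules of the monorepo
    '/node_modules/',
  ],
};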

Future work

  • I'm not aware of such an option out of the box, but it would be nice if some CI flag would allow outputting all files being transpiled, and how long transpilation takes. This can currently be done using a custom transformer that delegates to babel-jest (a rough sketch follows below).
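
A rough sketch of such a logging transformer (names are illustrative and the exact process() signature varies between Jest versions, so treat this as an assumption-laden example):

// log-transformer.js -- delegates to babel-jest and logs what gets transpiled and how long it takes
const babelJest = require('babel-jest').default;

const delegate = babelJest.createTransformer();

module.exports = {
  process(sourceText, sourcePath, options) {
    const start = Date.now();
    const result = delegate.process(sourceText, sourcePath, options);
    console.log(`transpiled ${sourcePath} in ${Date.now() - start}ms`);
    return result;
  },
};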

I expanded the tests done here: https://github.com/facebook/jest/issues/11956#issuecomment-967274859

I also tried --no-all-sparkplug, verified the flag was passed to node correctly and that node supported it. I tried switching sparkplug on in 16.10.0 as well - and I agree with the findings, no sparkplug correlation.

I then reproduced this on v17 nightlies:

Which means this commit is bad:

https://github.com/nodejs/node/commits/v17.x?after=36ad9e8444a0f0a481a70a9f386e35f4037e3cec+1238&branch=v17.x


And it is pretty much all to do with upgrading v8 to v9.4

For those interested, I'm inviting you to join us in a live discussion about this problem. We'll be online over Zoom for two hours on Friday at 8am ET to discuss how we might troubleshoot/fix this regression. Details below.

Calendar invite

https://calendar.google.com/event?action=TEMPLATE&tmeid=MWw1YWpoZGRpZXBsaGF2aWs0MGhndnA3MzkgcnNoYXJtYUBicml6YS5jb20&tmsrc=rsharma%40briza.com

Zoom link

Topic: Node.js 16.11.0+ / Jest - Memory consumption Time: Dec 17, 2021 08:00 AM Eastern Time (US and Canada)

Join Zoom Meeting https://briza.zoom.us/j/86836754356?pwd=SVI2dUxIVUlScGlRdEpFYXJtU3NCZz09

Meeting ID: 868 3675 4356 Passcode: 580759 One tap mobile +13126266799,86836754356#,*580759# US (Chicago) +13462487799,86836754356#,*580759# US (Houston)

Dial by your location +1 312 626 6799 US (Chicago) +1 346 248 7799 US (Houston) +1 646 558 8656 US (New York) +1 669 900 6833 US (San Jose) +1 253 215 8782 US (Tacoma) +1 301 715 8592 US (Washington DC) Meeting ID: 868 3675 4356 Passcode: 580759 Find your local number: https://briza.zoom.us/u/kdRQiLHyl

@glenjamin

The worker-limiting workaround only works if you're running workers, so doesn't work with runInBand without one of the patches listed above

Has this been fixed in the latest version? https://github.com/facebook/jest/releases/tag/v29.4.3

I've 'solved' this issue with the following workaround (works on any version of node and jest as long as more than 1 worker is used)

TestEnvNode.js

// you can also use jest-environment-jsdom if needed 
const NodeEnvironment = require('jest-environment-node');
const MAX_HEAP = 1000 ** 3;
class TestEnvNode extends NodeEnvironment {

  constructor(config, context) {
    super(config, context);
    // when running in main thread process.send is not defined
    if (!process.send) {
      return;
    }
    const originalProcessSend = process.send.bind(process);

    process.send = (...args) => {
      originalProcessSend(...args);

      // if heap gets larger than 1 GB, kill the worker. Jest will simply create a new one.
      // As long as a SINGLE test file does not use more than 1 GB this should be fine; otherwise increase this value
      if (process.memoryUsage().heapUsed > MAX_HEAP) {
        process.exit(1);
      }
    };
  }
}
module.exports = TestEnvNode;

jest.config.js

{
  // ...
  testEnvironment: '<path_to_TestEnvNode.js>'
}

By killing the worker in a 'controlled way', this trick avoids a fatal crash of the whole jest suite, and the heap memory of each worker never goes over 1 GB. In my use case (840 test suites, ~6k tests) this solution is also 30% faster than running on node 16.10.

Unfortunately this solution does not work if runInBand is enabled or when running on a single worker (-w1), as jest will not spawn workers.

I've got the Alpha 5 release installed locally on our problematic project, and with workerIdleMemoryLimit: '200MB' it is working perfectly. I think that memory limit will need tuning based on your own circumstances so that it doesn't recycle too frequently, but the point is that it's working and not crashing out due to filling the heap.

I would appreciate if others could do some validation and make sure itā€™s working for them before this gets released as stable.

I was trying to isolate the issue from our codebase and created this repo. It looks like even for very simple test suites, the problem is there.

For those wanting to see the actual upstream issue fixed, from what I can tell

https://github.com/nodejs/node/issues/44211 https://github.com/nodejs/node/issues/42080

cover the issue in node.

and AFAICT a proper fix is waiting on this change in v8: https://chromium-review.googlesource.com/c/v8/v8/+/3172764

It would be interesting to work out if the jest performance issue was solved in 19.8.0, when an attempted fix landed, before it was then reverted.

Yep, same with v18.15.0

Hello !

I had a huge memory consumption issue on my project after updating to node 16.17.1 and jest 26.6.3. Finally I tried jumping to 29.0.0-alpha.5 and it resolved my issue!

Thank you @phawxby and @SimenB for the workerIdleMemoryLimit parameter ❤️

@nover I think you're approaching this from the wrong side. runInBand is supposed to run tests serially in the current process. There are advantages to doing that: faster test results and normally a lower memory footprint, plus it makes it much easier to debug what is going on. I think the issue you're having stems from the fact that there's currently no way to run tests serially with a worker, and that is partially down to the fact that internally workers: 1 is viewed as runInBand.

My suggestion is that the internal workings of that should change and maxWorkers should refer to the actual number of workers spawned. 0 = run in band, no workers. >= 1 spawn workers. But even then that wouldn't fully solve the problem, because jest will try to run in band when possible based on previous test performance, for example.

Both of the PRs I've opened that could address your issue are definitely patch fixes, and if we were going to do something more long term it would need some proper design and thought.

@nover unfortunately there's nothing in the coming release to help you; without workers there's nothing for it to restart. A possible solution might be to change the current shouldRunInBand rule so that it ignores the current number of workers. That way it makes it possible to run tests linearly (-w=1) but without requiring them to run in the same process, so that it spawns a worker. If you wanted to do that it would be here, but it would probably need feedback from @SimenB. If we were going to do it, now would be the time though, with a major release coming.

@stephenh --expose-gc is independent of --logHeapUsage, you can do one without the other. The docs suggest it's possible to use them in combination to better diagnose problems. The function used is process.memoryUsage(). Now that I'm reading the docs, you could be right; it might be helpful to point out it's not required.
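
For anyone unsure what those two pieces actually do, a tiny illustration of the underlying Node APIs (this is not Jest code; run with node --expose-gc):

// gc-sketch.js
if (typeof global.gc === 'function') {
  global.gc(); // only exists when Node was started with --expose-gc
}
// --logHeapUsage essentially reports this number after each test file
const mb = (process.memoryUsage().heapUsed / 1024 / 1024).toFixed(1);
console.log(`heapUsed: ${mb} MB`);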

I had this issue with our test suite after our upgrade from node v14 to v16.14+. We had coverageProvider set to v8. After removal (using babel as described as default in the docs), memory issue was fixed.

@pwadmore-ea this was asked over here in the Node.js repo:

Seems like it's being backported here:

FWIW, that PR will currently need this diff in Jest to work

diff --git i/packages/jest-runtime/src/index.ts w/packages/jest-runtime/src/index.ts
index 5ff85684f0..dca7552926 100644
--- i/packages/jest-runtime/src/index.ts
+++ w/packages/jest-runtime/src/index.ts
@@ -1692,7 +1692,7 @@ export default class Runtime {
         displayErrors: true,
         filename: scriptFilename,
         // @ts-expect-error: Experimental ESM API
-        importModuleDynamically: async (specifier: string) => {
+        importModuleDynamically: runtimeSupportsVmModules ? async (specifier: string) => {
           invariant(
             runtimeSupportsVmModules,
             'You need to run with a version of node that supports ES Modules in the VM API. See https://jestjs.io/docs/ecmascript-modules',
@@ -1709,7 +1709,7 @@ export default class Runtime {
           );
 
           return this.linkAndEvaluateModule(module);
-        },
+        } : undefined,
       });
     } catch (e: any) {
       throw handlePotentialSyntaxError(e);

Hopefully that can be dealt with in Node itself so the performance improvement is not reliant on a change in Jest (which would necessitate people upgrading their Jest version)

Like mentioned in https://github.com/jestjs/jest/issues/11956#issuecomment-1719764041, those fixes only help with segfaults when using ESM and import(), and not the runaway memory consumption caused by the missing code cache. So while they are very welcome fixes, they don't help with the bug in this issue, unfortunately.

I'd be super happy to be proven wrong here, of course 🙂

See https://github.com/nodejs/node/issues/35375#issuecomment-1003411096

@sunilsurana "Just use vitest" is one of the best answers in 2023. It's API-compatible with Jest, so migration is trivial.

I don't know if this could maybe help someone here, but I had the same problem and recently solved it.

I had a test:unit script in my project's package.json that ran: jest --detectOpenHandles --logHeapUsage --colors -c jest.config.js.

My 195 tests consumed a total of about 1800MB heap size (starting from 200MB) and took more than 30s to complete.

The solution in my case was just removing the --detectOpenHandles option, and I can now run my tests within 10s with a max of 580MB per test (no more incremental consumption).

node: 18.12.1
npm: 8.19.2
jest: 28.0.0

You should only run with detectOpenHandles when you're trying to troubleshoot an issue with your tests, as it's a diagnostic flag; otherwise it will make all tests run serially (same as runInBand) and will introduce the severe slowdown/memory increase you mentioned. See docs.

Even when using workerIdleMemoryLimit our tests are taking a long time. We are not able to update node beyond 16.10.0. This has become a big blocker for us. Is there any fix for this?

@sunilsurana

This is a node issue, so there is nothing that Jest can do. The contributors for node did try a fix but had to revert the change due to an issue.

Have you tried running your jest with the following flags: node --expose-gc --no-compilation-cache ./node_modules/jest/bin/jest.js <any other jest flags that you use here>

We found this reduced our maximum heap size from 3.6GB to 0.8GB

@blimmer Switching the coverage provider from the default babel to v8 is not a 1:1 switch (more information per this issue). While it may resolve the OOM issue for some or most, it may also throw off coverage thresholds in CI pipelines.

We have also recently hit this with v18.15.0 @didaquis, so I can confirm it still persists, it seems…

What helped to resolve this issue temporarily in my case:

  1. set Jest config option:
"maxWorkers": 1
  2. run jest as follows:
node \
  --max-old-space-size=2048 \
  --no-compilation-cache \
  --expose-gc \
  ./node_modules/jest/bin/jest.js --runInBand

NOTE: I am using 2048 above because only this amount of memory lets some of my tests pass without getting OOM. Found this value experimentally. You might need to tweak this value for your particular use case.

NOTE: I am using the following versions:

$ node --version
v18.14.2

$ npm --version
9.5.0

$ node ./node_modules/jest/bin/jest.js --version
27.5.1

$ uname -a
Linux vb-1 5.15.0-67-generic #74-Ubuntu SMP Wed Feb 22 14:14:39 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

My suspicion is that constrained CI environments may not handle Jest parallel mode well even if limited to a single worker. In my case, jobs were getting stuck and had to be manually stopped.

It may not be directly related to this issue, but, aside from the other options discussed here, I am currently also sharding executions to reduce some of the load and accelerate test execution. With fewer tests to run per CI container, there is less memory build-up in each one.

yarn node --expose-gc --no-compilation-cache ./node_modules/.bin/jest --ci --logHeapUsage --maxWorkers 1 --runner ./jest-runner-parallel.ts --shard 1/4

Ideally, either #13171 or #13168 would get merged (see #13792) so that workarounds like those discussed here would be unnecessary.

I doubt they ever will - nobody chimed in when the bot marked them as stale, so both are closed now.

I'm facing a similar issue of significantly increased memory during tests: with node@18.13 it went up from ~800MB per 30+ test files to 4000MB.

We're on an NX monorepo and using yarn@3.3.1 and jest@29.

An important difference between node@16 and node@18 is that there are way more modules imported => require/requireModuleOrMock calls for already existing/loaded modules.

That was visible in the Chrome allocation sampling memory profiling tool…

If the jest maintainers need it, I can prepare a reproduction repository, but that would eat up a lot of my time for today and I'm not sure it's worth it at all…

this is also useful

@phawxby updated it

Unfortunately --no-compilation-cache does not do the trick for us, still getting memory issues with jest@29.2.2 and node 18.10

It's not a solution for jest itself, but I've written an alternative runner which uses POSIX fork(), and it does not have memory leaks by design. It's very much a POC now and has some limitations. https://github.com/goloveychuk/fastest-jest-runner

Personally I use it like this: jest --config jest.base.config.mjs --logHeapUsage --workerIdleMemoryLimit='800MB'

But first you have to set jest@29.0.0-alpha.5

As for forcing a single worker rather than in band - not sure about that one. At some point I think we should limit how much gnarly code we add solely to work around a bug in v8…

I'm curious about the design decision in jest: why is run in band not just a single worker by default? Is there a decision record somewhere you can point me in the direction of, so I can level up on jest internals knowledge?

I understand the sentiment regarding gnarly code, but see it from "our side". We're using jest, not v8, and we're having a big problem 😃 Our test suites don't work (along with a lot of others'). Also, that particular v8 issue seems to be closed as wontfix, so one could be inclined to say that it's now a feature of v8… 🤷

Please consider your jest users. We've spent (wasted?) over 40 hours in the team now trying to circumvent a failing suite rather than creating features for the business. That PR solves it.

We had consistent out of memory errors on node 16.13, reverting to 16.10 solved the problem for us.

Please can I ask if the resolution will be backported to Node.js 18?

I don't think jest is the place where that gets decided

I've verified that the new version of Node doesn't help with the bug in this issue. See my post here: https://github.com/nodejs/node/pull/49932#issuecomment-1740542932

Hey @M-a-c-Carter, did you try the latest nightly node v21? It seems the change that might improve things is not yet backported to v20, and probably won't be.

I think that's still https://bugs.chromium.org/p/v8/issues/detail?id=10284. It just received a comment saying things might help now. However, see https://github.com/nodejs/node/pull/48510#issuecomment-1717670338

I don't know if this PR affects #35375 at all?

I don't think it does; the underlying issues (V8 not handling code caching + host-defined options) are still unfixed upstream and they are orthogonal to this PR (this does not affect code caching in any way).

Code caching is the main problem reported in this issue. Of course with more than 100 participants here, that has probably been conflated with other memory issues, but that regression between 16.10 and 16.11 is as of yet not fixed upstream.

@kibertoad for small projects, yes. I'm trying to migrate a large project to vitest and it's a total pain (and vitest is not as fast as it claims - many people have reported it being slower than Jest/SWC, and I also witnessed a 40%+ increase in test execution time). But still worth a try.

Since I spent far too long on this, I thought I'd post for anyone else still struggling with this.

If you're using a monorepository like nrwl/nx and find that "workerIdleMemoryLimit" still doesn't work for you:

Try a dumb value like "workerIdleMemoryLimit": "ROFL"; if it doesn't completely fail/exit, the config value isn't being used. I was trying to use it in the root jest.config.ts, which isn't used at all, and needed to put it inside jest.preset.js to have each library config inherit it. Or you can place it into each individual library/app jest.config.ts.

After setting it to "1G" I could see the memory dropping after exceeding 1G by running the following and testing it:

npx nx test api --expose-gc --logHeapUsage --runInBand
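
A sketch of the preset change described above (the preset package name differs between Nx versions - @nrwl/jest/preset in older releases, @nx/jest/preset in newer ones - and the limit value is an assumption to tune):

// jest.preset.js
const nxPreset = require('@nx/jest/preset').default; // or require('@nrwl/jest/preset').default on older Nx

module.exports = {
  ...nxPreset,
  workerIdleMemoryLimit: '1G', // inherited by every library/app jest.config.ts that uses this preset
};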

Did the problem get fixed in node v18? Is there a bug on the node repo to track this?

No, nor in node 20.

Relevant node issues

Posting this in case anyone else finds it useful: my company has a pretty large frontend monorepo where we shard our Jest tests across a number of CI boxes, and each CI box has 16 CPU cores.

We found that after upgrading from Node.JS v16.10.0 to v16.20.0 that our Jest tests started timing out in CI. We were able to resolve the issue by changing the number of Jest workers spun up from --maxWorkers=4 to --maxWorkers=3.

This likely resolved the issue because we stopped overloading the CI box CPU - prior to the change we would spin up to 4 jest parent processes per box, resulting in a total of 20 processes being spun up at one time (4 parent processes + 4 workers per parent * 4). By reducing maxWorkers to 3, the maximum total number of jest processes running on any given box is now 16, and the timeouts seem to have been resolved.

@SimenB --workerIdleMemoryLimit option is not working when --detectOpenHandles option is enabled

@Kle0s setting maxWorkers: 1 doesn't work, because jest automatically detects that and runs in band, so you need the runner override (this.isSerial = false;) from @a88zach to get it working.

I went through all the messages here and tried their suggestions. I started out with much slower testing than before I upgraded Jest, and a crash at the given memory limit. But with this command line things are back to normal (full speed, heap mem < 1GB, 200K+ LOC, TypeScript Vite app):

node --no-compilation-cache --expose-gc --max_old_space_size=4096 node_modules/.bin/jest --logHeapUsage --coverage --watchAll=false

But I had to use both --no-compilation-cache and --expose-gc to make it work. Node version is 19.3.0 and Jest version is 29.3.1.

Memory running rampant after a while is something I even run into with 16.10 to some degree. The only way I've found to prevent this build-up is by finding a good combination of workers & memory limit. The sweet spot on my machine (M1 w/ 16GB) is running our 9k tests with 6 workers and --max-old-space-size=768, which now gives me a rock-solid 140-145s execution time. Any more workers or memory and I get memory pressure and very unstable execution times; any fewer workers or less memory and execution time goes up considerably. Without a memory limit Jest is simply unusable at this point.

The same approach seems to work when I run it on the latest node (18.6); however, execution times increase to 175-180s.

I also noticed that a considerable part of the memory is used to cache hundreds of copies of identical external import sources (with react-dom sticking out quite a bit). Not sure this is normal behavior.

Sadly the above runtime didn't solve the issue for us. This is 16.10.0 compared against 18.4.0:

Node 16.
Test Suites: 850 passed, 850 total
Tests:       1 skipped, 6677 passed, 6678 total
Snapshots:   1895 passed, 1895 total
Time:        259.199 s
Ran all test suites.

Node 18.
Too long with no output (exceeded 15m0s): context deadline exceeded

Sadly I think if we can't find a proper solution to this we may have to drop jest; this essentially has us pegged at Node 16.10.0, which isn't really a viable position to be in long term.

I went through the commits and looked for code relating to node as opposed to the v8 upgrade, and I cannot find anything. I doubt this will get fixed unless it's someone on the node or v8 team helping.

I also tried the class static initialisers from the blog post (https://v8.dev/blog/v8-release-94) and also had no luck in finding a cause.

node 18 + jest 27 works well (few leaks); node 18 + jest 28 has a lot of leaks

@SimenB would you accept a PR to lower v16 in engines to 16.10?

Yes


@alumni I would rather not write a test runner - I feel like there would be problems with changedSince and watch mode, and also the performance would depend on how many files each spec file imported - for some projects your approach won't be slower, for others (like mine) I suspect it would be.

I have performed some investigation relating to the bug noted above in Node and couldn't observe any noticeable difference between Node 16.10.x and Node 16.11.x, and that specific issue existed back to Node v14.5.0. So while it is likely to be causing an issue in Jest, I don't think it's quite as related as we think.

@mb21 just looked; my current workaround for Node > 16.10 is to tweak max-old-space-size. With your repro:

Node 16.13.1

./node_modules/jest/bin/jest.js --runInBand --logHeapUsage
PASS  src/a72.test.js (253 MB heap size)

Test Suites: 101 passed, 101 total
Tests:       101 passed, 101 total
Snapshots:   0 total
Time:        9.174 s, estimated 12 s

https://gist.github.com/pustovalov/c98f3f521943aa082a2edbc70a343302#file-1-log

node --expose-gc --max-old-space-size=50 ./node_modules/jest/bin/jest.js --runInBand --logHeapUsage
 PASS  src/a75.test.js (25 MB heap size)

Test Suites: 101 passed, 101 total
Tests:       101 passed, 101 total
Snapshots:   0 total
Time:        14.831 s

https://gist.github.com/pustovalov/c98f3f521943aa082a2edbc70a343302#file-2-log

In my case (https://github.com/facebook/jest/issues/11956#issuecomment-969041221), a limit of 700 MB works fine for:

Test Suites: 150 passed, 150 total
Tests:       2159 passed, 2159 total
Snapshots:   259 passed, 259 total
Time:        332.784 s
vCPUs 2
RAM 4GB

Also the execution time increased

Node v14.17.3

Time: 240.665 s
Time: 314.481 s
Time: 270.317 s
Time: 240.255 s
Time: 335.224 s
Time: 239.925 s
Time: 260.644 s
Time: 334.419 s

Node v16.13.0

Time: 321.699 s
Time: 318.885 s
Time: 355.484 s
Time: 318.719 s
Time: 406.289 s

If it's specifically in 16.11, you can probably try to build Node yourself and bisect https://github.com/nodejs/node/compare/v16.10.0..v16.11.0. Figuring out which commit introduced it might help understand what one (or more) of your code, Node, and Jest is doing wrong šŸ™‚

I've tried that. I was able to reproduce the issue at the last commit of the V8 upgrade. It looks like the problem is related to the V8 upgrade inside of Node.

We're also unable to update to the LTS version of Node 16 (at the time of writing 16.13.0) because of this issue. We bisected the changes and identified that the upgrade from Node 16.10 to 16.11 caused our large Jest suite to hang indefinitely.

I took a look at the Node 16.11 changelog and I think the most likely culprit for this issue comes from the V8 update to 9.4 (PR). In V8 9.4, the new Sparkplug compiler is enabled by default (see also this Node issue).

I was hoping I could try disabling Sparkplug to verify that this is the issue. Node exposes a V8 option to disable it (--no-sparkplug), but I don't think it's being passed through to the Jest workers when I call it like this:

node --no-sparkplug node_modules/.bin/jest

I also tried setting the V8 option in jest-environment-node here: https://github.com/facebook/jest/blob/42b020f2931ac04820521cc8037b7c430eb2fa2f/packages/jest-environment-node/src/index.ts#L109 via

setFlagsFromString('--no-sparkplug');

but I didn't see any change. I'm not sure if that means Sparkplug isn't causing the problem or if I'm not setting the V8 flag properly in the jest workers.

@SimenB - I see you've committed a good deal to jest-environment-node - any tips for how I might pass that V8 flag down through all the workers? If it's possible (even via patch-package or something) I'd be happy to give it a shot on our test suite that's exhibiting the problem.

So I'm not exactly positive that this is the cause of the issue, but it seems like a potentially promising place to start looking.
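If someone wants to experiment with pushing a V8 flag into every worker, one hedged option is a setup file that each worker loads (a sketch only; it's unclear whether V8 honours --no-sparkplug when set after startup, and the file name is a placeholder):

// set-v8-flags.js — listed in "setupFiles" so it executes inside every Jest worker process
const v8 = require('v8');

// Attempt to disable Sparkplug in this worker; V8 may ignore flags changed at runtime.
v8.setFlagsFromString('--no-sparkplug');

Then add setupFiles: ['<rootDir>/set-v8-flags.js'] to the Jest config.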

Hi, I'm having the same problem but as far as I understand, version 20.11.0 of node should have this fixed?

This addressed our issue as well, with Jest constantly using more memory between tests. Unfortunately we could not use the workerIdleMemoryLimit setting, as our runners use Linux, where memory isn't reported correctly.

I'm curious what the fix is; I've looked at the release and cannot find the commit that addresses this. It should be far more prominent in the release notes that this was fixed.

https://nodejs.org/en/blog/release/v21.1.0

Either way, very thankful this has been addressed after 2 years. Hoping for a backport to 20 and 18. We need this fix in an LTS version.

That's number 4 here https://github.com/jestjs/jest/pull/12205#issuecomment-1749113564. Tl;dr - upstream issue in V8; a standalone reproduction using just Node APIs is probably needed before people working on Node dig into it.

But that is orthogonal to this issue which is about the regression in same code from node 16.10 to 16.11.

Node 20.8 is released now with a fix for the memory leak. I didn't check it in detail to know whether Jest also has to change anything.

Now that https://github.com/nodejs/node/pull/48510 has just been closed, in which Node version will this issue be fixed?

Can anyone confirm if this problem persists using Node 18.15.0?

We are trying to update to Node 18 and tried workerIdleMemoryLimit, but it didn't seem to make a difference in our case. We still see a good 30% increase in overall test execution time and more flaky test failures.

We don't want to push back our upgrade to node 18 much longer, so we started looking at alternative test runners. Might pin to node 16.10 to run tests, but it's not a viable long term solution.

I'm trying to go over the various other workarounds posted in this thread and will post back if I find one suitable for our scenario.

Transpilation is not the cause, as I tried compiling the tests and running the compiled ones, but it didn't help.

For GitHub Actions you need to set workerIdleMemoryLimit to lower than 2 GB, because there is a strange heap limit of 2 GB.

I can also confirm that adding a specified workerIdleMemoryLimit solves the issue for our test suite! However, I noticed that using relative units did not work in our case. We're running on a CircleCI machine with 16 GB RAM and got the following results (edit - forgot to mention: we're using 8 workers on an 8-CPU machine):

0.15: āŒ times out after 10 minutes, 100% RAM usage recorded ā€œ10%ā€: āŒ times out after 10 minutes, 100% RAM usage recorded ā€œ200MBā€: āœ… after 8 minutes ā€œ800MBā€: āœ… after 4 minutes ā€œ1600MBā€: āœ… after 2 minutes (peak RAM usage: 64%)

I'd expect to get similar results using 1600 MB and 10%, if my math is correct, so maybe something's off with the calculation logic (my tests are not scientific though, just single data points I took)?

But anyway, I appreciate you adding this new flag very much! It finally gives us a way to work around the memory issue, which unblocks us from upgrading NodeJS šŸ™‡ā€ā™‚ļø
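For reference, the equivalent config-file form looks like this (a sketch using the absolute value that worked best in the data points above; tune both numbers for your machine):

// jest.config.js — requires Jest 29+
module.exports = {
  maxWorkers: 8,
  // Recycle a worker process once its heap exceeds this limit.
  // Absolute units behaved more predictably here than percentages.
  workerIdleMemoryLimit: '800MB',
};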

@alaughlin - see https://github.com/facebook/jest/pull/12205#discussion_r815307518

Looks like a dead end so far on supporting ESM with this custom runtime

@a88zach - thank you for this! The runner you've published enabled us to upgrade past Node 16.10 with minimal memory/runtime impact.

You could use --shard and do the parallelization yourself rather than letting Jest handle it (a sketch follows below). We do something like this with v27 (we spawn parallel Jest processes that each handle only a small number of tests) and it's much, much faster with much smaller memory usage. I don't have exact numbers, but we get OOM after 30-40 min (when only about half of the tests have run) if we run the entire test suite, while with this approach we're done in 7-8 min.
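A rough sketch of that manual split using --shard (available in Jest 28+; on v27 the same idea works by giving each process a disjoint test path pattern instead):

# Run four shards as separate Jest processes and wait for all of them.
npx jest --shard=1/4 &
npx jest --shard=2/4 &
npx jest --shard=3/4 &
npx jest --shard=4/4 &
wait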

This is what I used to do my tests, based on #12205: jest-runtime+27.4.2.patch

FWIW, applying this patch makes it possible for us to run our Jest setup on Node 16.14.2. Heap size stays at ~200 MB per test, but without the patch it goes over 2 GB, which makes it impossible to run the test setup.

Our test setup is testing an Angular app, so it may be that the patch only makes a difference in certain kinds of setups but not all.

Full test setup run:

  • v14.19.1: ~90 s
  • 16.14.2 (with patch): ~115 s

Since Jest is not directly tied to a Node.js version, I'm sure there is a huge number of configurations out there, and updating Jest doesn't necessarily happen at the same time you update Node. That being said, and mainly because the new Script vs. compileFunction choice has already gone back and forth, I would propose at least keeping the code for both under a flag/setting. That would make it easier to switch between the two strategies to determine which one is best suited for a given use case.

compileFunction should probably be the default setting since it is technically better in the latest Node versions, but that might not stay true, since this looks more like a bug on Node/V8's part.

Personally, running our test suite takes a long time, and thus I'll aim to keep the fastest configuration as long as I can. I can always patch-package it if needed (I've done it before).
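For context, the two compilation strategies being discussed map onto these Node vm APIs (a simplified illustration only, not Jest's actual module-wrapping code; the source string is a placeholder):

const vm = require('vm');

const source = 'module.exports = 40 + 2;';

// Strategy 1: new vm.Script() — compile a whole script, then run it in a context.
const script = new vm.Script(source, { filename: 'example.js' });

// Strategy 2: vm.compileFunction() — compile the source as the body of a function
// that receives the listed parameters.
const fn = vm.compileFunction(source, ['module', 'exports'], {
  filename: 'example.js',
});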

Thanks for the numbers, @Cellule.

BUT 16.10 with new Script() was the fastest configuration, so it still results in a heavy performance reduction.

I imagine "most" 16.x users will want to be on the latest release and not 16.10.

Based on the numbers and considering that compileFunction should not be a breaking change (except not being able to implement importModuleDynamically for esm, see skipped test), it seems reasonable to mainline the proposed change. WDYT?

Looks like I am hitting this as well. It happens only when doing coverage (as it eats more memory), and it ends up with a hard failure. Locking the Node version to 16.10 helps; Node 17 fails the same way. It fails locally too.

Maybe one interesting note: it started to happen after I upgraded node-mongodb to v4 (failed pipeline here).

@blimmer nothing in particular is notable about our codebase. It's not a small codebase, but not very big either. Prior to 16.11, there were known memory leaks when running the test suite (potentially around lodash); they were maybe made worse by 16.11.

What is the "-i" parameter for?

https://jestjs.io/docs/cli#--runinband

d'oh! Just realized GitHub is auto-referencing my commit notes pointing to many of y'all's advice here. Sorry! I did not mean to clutter this long discussion.

So that I contribute something useful, here is what I was able to use to get node 18 + Jest 29 working with GitHub Actions:

node version: 18.16.0
jest version: 29.7.0

node 18 flags:

--no-compilation-cache
--max-old-space-size=2048

jest arguments:

--workerIdleMemoryLimit='1024MB'

I'm not seeing a huge slowdown in performance, but our tests were a slog even before we switched to node 18 from v14.
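Put together, the invocation looks roughly like this (a sketch; adjust the jest binary path and both limits to your project):

node --no-compilation-cache --max-old-space-size=2048 \
  node_modules/.bin/jest --workerIdleMemoryLimit=1024MB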

Hi, with the latest fixes in Node the memory issue is fixed, but the perf issue still remains. Is that understanding correct? If yes, any idea about plans for a perf fix, and who is supposed to make that fix, Node or Chromium?

There goes @SimenB bursting our bubbles again šŸ˜„. So realistically, does that mean that there's not really anything in an upstream pipeline that could alleviate this issue?

I doubt that PR will help much for people not using --experimental-vm-modules. It only deals with native ESM leaking - it doesn't do anything with code caching (which is the main issue here).

That said, I'd be happy to be proven wrong! It'll be available as a nightly tomorrow if people want to test it: https://nodejs.org/download/nightly/

For people on arm64 macs (m1 or m2), you can use the binary linked from nodejs/node#48510 (comment) if you don't wanna wait ~15 hours for the nightly to be built.

I tried using brew (brew install node --HEAD); it does not seem to fix the issue vs. Node 16.10 (~20 s in 16.10 vs. ~60 s with the latest commit 2dfaee4bc8 of Node). We do not use --experimental-vm-modules.


This solution fixes the memory issue, but the speed is still 2x slower. This has us blocked on Node 16.10 forever. Is there any solution to it?

https://github.com/reside-eng/jest-runtime has fixed this issue for us. We no longer need --workerIdleMemoryLimit=2GB to keep our CI containers from crashing, and memory usage is flat instead of constantly growing. Surprised not to see this already mentioned in this thread; I must have found it in one of the many other related ones.

Notably, --expose-gc --no-compilation-cache did not fix the runaway memory usage for us.

Same here, using node --expose-gc --no-compilation-cache fixes the issue.

Even with workerIdleMemoryLimit, our tests are taking a long time. We are not able to update Node beyond 16.10.0. This has become a big blocker for us. Is there any fix for this?

@sunilsurana

This is a Node issue, so there's nothing Jest can do. The Node contributors did try a fix but had to revert the change due to an issue.

Have you tried running your jest with the following flags: node --expose-gc --no-compilation-cache ./node_modules/jest/bin/jest.js <any other jest flags that you use here>

We found this reduced our maximum heap size from 3.6GB to 0.8GB

Nice!! This --no-compilation-cache fixed it for me. I was seeing a heap size over 2 GB, with every test leaking memory into the next. With this config the leakage stopped and the final heap usage was 200 MB. And no effect on speed for me either. Thanks so much!

Hi @GBPROB2, with this we did see a memory reduction, but the time it takes to execute the test suite is still degraded compared to 16.10.0 (3x more). I'll look into the thread for the fix in Node.

In a complex codebase using Angular and ESM modules in Node 16.18.1, the only alternative that brought memory relief for me was:

  1. Use Jest 29
  2. Combine it with workerIdleMemoryLimit (I set it to 500MB)

Unfortunately, all the other possibilities (isolatedModules, --no-compilation-cache, etc.) didn't work for me. I am still fixing some 60 tests that broke because of the upgrade, but the above technique eliminated the pesky OOM issues for a 4,600+ test suite previously running Jest 27.5.1.

anybody tested with node 20 ?

I was noticing that GitHub Actions was failing with OOM errors even before the test output started logging (Jest + ts-jest + Vue SFCs). For me, the issue actually seemed to be related to the default Babel code coverage provider (see https://github.com/jestjs/jest/issues/5837). Switching to v8 resolved the OOM issue:

coverageProvider: 'v8'

https://jestjs.io/docs/configuration#coverageprovider-string

Due to the potential security risk of pinning to Node 16.10, our team decided to move on to Node 18.x with a workaround: split test cases into multiple waves, using the testPathPattern and testPathIgnorePatterns options, partitioned by folder. There are fewer test cases in each wave, so each wave can finish before running out of memory.

Test cases in each wave can run in parallel, although only one wave runs at a time; a sketch of this follows below. This takes 50% more time to run, but is still 2x faster than the workaround above (all sequential).
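A sketch of the "waves" split (the folder names are placeholders for however you partition your suite):

# Run one wave at a time; tests inside a wave still use Jest's parallel workers.
npx jest --testPathPattern='src/moduleA' && \
npx jest --testPathPattern='src/moduleB' && \
npx jest --testPathPattern='src/moduleC'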

FWIW, I used -w=4 --workerIdleMemoryLimit=1.5G on CircleCI's large Docker runner; this works fine for us. large on CircleCI means 4 CPUs and 8 GB of memory.

So far, I've had no issues when running in GitHub with the custom runner I posted above. One thing I noticed is that we have the following set in our jest.config.js:

...(process.env.CI === 'true' && {
    maxWorkers: 2,
  }),

We have this set because the GitHub runners only have 2 CPUs and Jest defaults to the number of cores available on your machine minus one for the main thread.

So maybe you need to force the use of 2 cores, even though the custom runner only uses one core: super({ ..._globalConfig, maxWorkers: 1 }, _context);

Thank you @a88zach, your explanation makes a lot of sense. I applied the suggestion and it worked for me.

I used @a88zach's solution, and while the rate of the process crashing has gone down significantly, it still happens occasionally. Since it occurs in CI during the tests stage, it happened a few times out of ~50 runs today.

I'm still hoping someone at facebook will take care of that or something šŸ˜•

@a88zach your solution is working perfectly for me. I can finally switch to node@18. Thanks for sharing šŸ‘

I tried this solution now, @a88zach, and it did not work for me. I will state, however, that I simply set maxWorkers to 1 in the jest.config.js file. I did check that it was properly parsed by passing Jest the --debug switch. In the debug print I got:

{
    ...
    "maxWorkers": 1,
    "workerIdleMemoryLimit": 1000000000,
    ...

I still get the following error in the CI:

<--- Last few GCs --->

[4125:0x512ec60]   627709 ms: Scavenge 2025.6 (2070.5) -> 2024.5 (2080.5) MB, 16.1 / 0.1 ms  (average mu = 0.697, current mu = 0.649) allocation failure 
[4125:0x512ec60]   629104 ms: Mark-sweep (reduce) 2045.0 (2100.6) -> 2026.0 (2080.4) MB, 1289.3 / 0.1 ms  (+ 8.1 ms in 5 steps since start of marking, biggest step 3.4 ms, walltime since start of marking 1356 ms) (average mu = 0.505, current mu = 0.234) a

<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
 1: 0xb09980 node::Abort() [node]
 2: 0xa1c235 node::FatalError(char const*, char const*) [node]
 3: 0xcf77be v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xcf7b37 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xeaf3d5  [node]
 6: 0xebf09d v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
 7: 0xec1d9e v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
 8: 0xe83012 v8::internal::Factory::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [node]
 9: 0xe7b624 v8::internal::FactoryBase<v8::internal::Factory>::AllocateRawWithImmortalMap(int, v8::internal::AllocationType, v8::internal::Map, v8::internal::AllocationAlignment) [node]
10: 0xe7d681 v8::internal::FactoryBase<v8::internal::Factory>::NewRawTwoByteString(int, v8::internal::AllocationType) [node]
11: 0x110e755 v8::internal::String::SlowFlatten(v8::internal::Isolate*, v8::internal::Handle<v8::internal::ConsString>, v8::internal::AllocationType) [node]
12: 0x1169626 v8::internal::ScannerStream::For(v8::internal::Isolate*, v8::internal::Handle<v8::internal::String>, int, int) [node]
13: 0x114c99a v8::internal::parsing::ParseProgram(v8::internal::ParseInfo*, v8::internal::Handle<v8::internal::Script>, v8::internal::MaybeHandle<v8::internal::ScopeInfo>, v8::internal::Isolate*, v8::internal::parsing::ReportStatisticsMode) [node]
14: 0xd94dd8  [node]
15: 0xd95008  [node]
16: 0xd9529c v8::internal::Compiler::GetSharedFunctionInfoForScript(v8::internal::Isolate*, v8::internal::Handle<v8::internal::String>, v8::internal::ScriptDetails const&, v8::Extension*, v8::internal::ScriptData*, v8::ScriptCompiler::CompileOptions, v8::ScriptCompiler::NoCacheReason, v8::internal::NativesFlag) [node]
17: 0xd15141  [node]
18: 0xd1544b v8::ScriptCompiler::CompileUnboundScript(v8::Isolate*, v8::ScriptCompiler::Source*, v8::ScriptCompiler::CompileOptions, v8::ScriptCompiler::NoCacheReason) [node]
19: 0xafb543 node::contextify::ContextifyScript::New(v8::FunctionCallbackInfo<v8::Value> const&) [node]
20: 0xd54bae  [node]
21: 0xd55177 v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) [node]
22: 0x15f0b79  [node]
Aborted (core dumped)

@SimenB I'm not sure if, as a maintainer, you're able to edit the original issue. If not, perhaps an alternative plan is to close and lock this issue so my comment remains at the bottom as the workaround, and we open a new issue referencing this one to continue tracking the problem. Hopefully that will stop people wasting effort on workarounds that already exist and keep the focus on a solution to the underlying issue (which I believe to be a V8 vm issue anyway).

@ckcr4lyf I'm honestly not sure where the issue(s) might lie. I've not really used the Node inspector before or spent any time digging around in Node heap profiles, so I'm not sure the conclusions I'm drawing are reasonable. I've posted several updates in the ts-jest thread I mentioned earlier, but I'll summarise here.

I spent most of today capturing heap profiles and delving into the code of various dependencies. The pattern I'm seeing is an ~180 MB heap allocation happening every time Jest advances to the next test suite (all testing is being done with --runInBand). Within these 180 MB heap allocations, ~100 MB seems to be strings which look like node module source (or maybe source maps?). 100 MB is coincidentally roughly the amount I'm seeing the heap grow by for every test suite that runs.


These strings looked like they were in the retrieveMapHandlers array within source-map-support, being retained by Error.prepareStackTrace. Within source-map-support I see an implementation of prepareStackTrace which gets monkey-patched onto Error within install.

Checking my yarn.lock for dependencies on source-map-support, the only dependency I found was jest-runner. Sure enough, I found that runTest.ts calls install from source-map-support (twice):

  // For tests
  runtime
    .requireInternalModule<typeof import('source-map-support')>(
      require.resolve('source-map-support'),
    )
    .install(sourcemapOptions);

  // For runtime errors
  sourcemapSupport.install(sourcemapOptions);

Later, in runTestInternal, I see the cleanup function from source-map-support (resetRetrieveHandlers) is called, but only once, for the "local" import of sourcemapSupport.

I commented these lines out in my local runTest.js and reprofiled, and whilst it didn't reduce the memory usage by much, the objects retaining the strings were now from the applicationinsights diagnostic-channel monkey patching.

I stubbed out the couple of places I was importing "applicationinsights" and reprofiled, and found that again, whilst the memory usage hadn't changed much, the strings were now being retained by closures from node:timers due to node-cache, which is another dependency I have.


In node-cache's _checkData I can see a timeout is being set with a recursive callback to itself. This is the function shown in the heap stack. There is a _killCheckPeriod which the comment mentions is there to clear the timeout for testing purposes, and this _killCheckPeriod is called in two places: close and flushAll.

In my tests I am calling close in afterAll in my setup file specified in setupFilesAfterEnv, so I would have hoped that would be unsetting the timeout…

In all instances, I can see in the stack trace that the array containing the strings is ultimately either _moduleRegistry or _cacheFS, both of which are on the jest-runtime Runtime object. jest-runner calls tearDownEnv, which in turn calls teardown on the Runtime. This calls clear on the various internal maps and arrays like _moduleRegistry and _cacheFS.

I'm not familiar enough with Node (or Jest) internals to know what conclusion can be drawn from all this, or where the issue really lies. I'm not sure if these allocations represent the "leak" that I'm seeing in terms of heap usage growing, but if they are, then my assumption is that modules within the test runtime are being retained after tearDownEnv is called, and that it is these retained modules that represent the heap growth. If this is the case, the leak is proportional to the size and number of imports a test has, and so will grow with every test that has any such import that retains a reference to _moduleRegistry/_cacheFS.

In my case, I am using supertest to test an Express app, which means my dependencies are always the same, and always include the various offenders (applicationinsights, node-cache, etc.), which would explain why the leak I'm seeing seems to scale linearly with the number of tests.

Next thing I wanted to try was removing use of node-cache and reprofiling to see how that looks.

Immediately tried @brokenmass's solution above and… it gave amazing results!

I needed to modify the import like this (otherwise it gave an error):

const {TestEnvironment} = require('jest-environment-node');
class TestEnvNode extends TestEnvironment {

Previously, my ~2000 tests on Node 16.10 took 20 seconds. With the workaround I was able to run them on Node 18.21.1 in just 30 seconds (with a memory limit of ~800 MB, which was optimal after a few tries in the range of 500 MB to 1 GB).

Without the workaround on the same Node 18.21.1 it takes… 85 seconds (and makes my machine almost unusable, taking all the memory, swap, etc.).

Thanks!

This elegant fix may unlock us to actually go beyond Node 16.10 šŸš€

@SebTVisage you need to run my fix along with --logHeapUsage but without --runInBand to provide any useful diagnostics. --runInBand prevents workerIdleMemoryLimit from doing anything.

Thank you very much for your investigation and @phawxby šŸ™.

I tried it with different values from 250 MB to 1500 MB (running in GitHub Actions), but unfortunately it doesn't work for our monorepo. It does work locally. But before migrating to Node 16 (we were blocked on Node 12) everything was fine on the CI as well.

The only thing I can think of is that we're using ts-jest, and in-memory Mongo to spin up a Mongo instance per test file. Do you see any reason why your workaround wouldn't work with this?

@phawxby - thanks for the link to that Node issue, I'll have a look and subscribe. Out of curiosity, are you using Datadog's dd-trace library in your application? I think there's (yet) another memory leak issue there: DataDog/dd-trace-js#2297.

Nope, not in our project. I did spend some time when I initially started investigating this trying to narrow down which modules were causing the issue but with no luck.

Ideally we'd use compileFunction; we went back to Script due to other performance issues with it šŸ˜…

But yes, this is really a bug in Node (or rather, V8) that we're trying to work around until it's fixed upstream.

I tested the workerIdleMemoryLimit and it's at least usable with newer versions of Node.

However, it doesn't feel like it really solves the problem. Using the runner published as part of this PR: https://github.com/facebook/jest/pull/12205#issuecomment-1150255655 doesn't require worker restarts because it fixes the memory leak altogether in our application.

When running with --logHeapUsage and side@jest-runtime (https://github.com/facebook/jest/pull/12205) I see the memory usage hovering right around 200-300 megabytes for all tests.

However, with the alpha.5 version, I see the heap grow until the worker is restarted at the workerIdleMemoryLimit value I set.

That's the behavior I expected from that PR and new setting, but I wanted to call out that I think we've got the real fix in #12205.

yes, sorry. my fault. šŸ™ˆ

For us too

but for @jest/test-result only v29.0.0-alpha.4 is available, so it pulls some other jest v29.0.0-alpha.4 deps

I confirm the latest alpha version works flawlessly for us! Thanks for the fix @phawxby @SimenB

Might be totalmem() not working on CircleCI. It's notoriously hard to get the proper available hardware there (e.g. cpus() is also completely broken, reporting 64 when there are 2 or 4). I put together https://github.com/SimenB/available-cpu at some point to check; maybe something for memory should be done as well?
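A quick way to sanity-check what Node itself reports on a CI machine, using only core os APIs:

// Print what Node believes the machine has; CI containers often misreport these values.
const os = require('os');

console.log('cpus reported:', os.cpus().length);
console.log('total memory (GB):', (os.totalmem() / 1024 ** 3).toFixed(1));
console.log('free memory (GB):', (os.freemem() / 1024 ** 3).toFixed(1));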

Our team also began hitting this when attempting to upgrade from Node v14 to Node v16.14.2 in https://github.com/microsoft/accessibility-insights-web/pull/5343. @karanbirsingh recorded some experiments with different node/v8 settings in this comment.

Our repo had previously been using --max-old-space-size=16384 with our Jest tests (an old workaround for an unrelated issue in Azure DevOps which we think is obsolete). This seemed to exacerbate the issue significantly; we found that removing it tripled the speed of our unit test suite (though it still didn't get it down to Node <=16.10.0 speeds).

My findings do not match the Node bug either: https://github.com/nodejs/node/issues/40014, so I 100% agree with @phawxby's comment here: https://github.com/facebook/jest/issues/11956#issuecomment-1077523582

Something went in between v16.10.0 and v16.11.0 which broke Jest, and although the Node bug seems related, if it is, it must be that something now makes the bug trigger in v16.11.0.

v14.4.0 is ok
v14.5.0 is ok
v14.19.0 is ok

v16.10.0 is ok
v16.11.0 is broken

v18.0.0 is broken.

I just discovered jest now requires node 16.13 - so I cannot stay on node 16.10 to avoid this bug any more šŸ˜•

For the dummy tests from the examples above I actually got a slight increase in time: without the 2 node flags I got ~3.8 s, with the flags ~4.7 s. But for the actual tests from my codebase (36 test suites, 178 tests) the time actually decreased from ~65 s to ~46 s. Tested locally on a Mac with node v16.11.0, "jest": "^27.3.1" and "ts-jest": "^27.0.7".

it looks to me like the v8 bug was closed as won't fix but the nodejs bug is still open

@lukeapage thanks for pointing out the difference šŸ‘Œ, I lost the details along the way and thought that was not being investigated anymore.

can you try to run jest in this way?

@pustovalov these are the results from running:

node \
  --expose-gc \
  --trace_gc \
  --max-old-space-size=700 \
  ./node_modules/.bin/jest \
    --runInBand \
    --logHeapUsage \
    > jest_memory_usage.log

stdout:

PASS __tests__/routes/v3/sql/query/bigquery.test.ts (6.814 s, 221 MB heap size)
PASS __tests__/routes/v3/sql/query/jobs/bigquery.test.ts (282 MB heap size)
PASS __tests__/routes/v3/handlers/clients/bigquery.test.ts (327 MB heap size)
...more test lines...
PASS __tests__/routes/v3/maps/query/databricks.test.ts (624 MB heap size)
PASS __tests__/routes/v3/sql/query/snowflake.test.ts (646 MB heap size)

<--- Last few GCs --->

[30209:0x7f96af900000]    45289 ms: Mark-sweep 665.4 (738.9) -> 650.5 (739.4) MB, 468.1 / 0.4 ms  (average mu = 0.377, current mu = 0.374) allocation failure scavenge might not succeed
[30209:0x7f96af900000]    46075 ms: Mark-sweep 670.6 (741.3) -> 656.8 (744.0) MB, 600.1 / 0.2 ms  (average mu = 0.306, current mu = 0.237) allocation failure scavenge might not succeed


<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0x101600515 node::Abort() (.cold.1) [/Users/amiedes/.nvm/versions/node/v16.13.0/bin/node]
 2: 0x100301989 node::Abort() [/Users/amiedes/.nvm/versions/node/v16.13.0/bin/node]
 3: 0x100301aff node::OnFatalError(char const*, char const*) [/Users/amiedes/.nvm/versions/node/v16.13.0/bin/node]
 4: 0x1004812c7 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/Users/amiedes/.nvm/versions/node/v16.13.0/bin/node]
 5: 0x100481263 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/Users/amiedes/.nvm/versions/node/v16.13.0/bin/node]
 6: 0x100622975 v8::internal::Heap::FatalProcessOutOfMemory(char const*) [/Users/amiedes/.nvm/versions/node/v16.13.0/bin/node]
 7: 0x1006269bd v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [/Users/amiedes/.nvm/versions/node/v16.13.0/bin/node]
 8: 0x10062329d v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [/Users/amiedes/.nvm/versions/node/v16.13.0/bin/node]
 9: 0x1006207bd v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/Users/amiedes/.nvm/versions/node/v16.13.0/bin/node]
10: 0x10062daf0 v8::internal::Heap::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/Users/amiedes/.nvm/versions/node/v16.13.0/bin/node]
11: 0x10062db71 v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/Users/amiedes/.nvm/versions/node/v16.13.0/bin/node]
12: 0x1005f4c1d v8::internal::FactoryBase<v8::internal::Factory>::NewRawOneByteString(int, v8::internal::AllocationType) [/Users/amiedes/.nvm/versions/node/v16.13.0/bin/node]
13: 0x1008cf8c1 v8::internal::String::SlowFlatten(v8::internal::Isolate*, v8::internal::Handle<v8::internal::ConsString>, v8::internal::AllocationType) [/Users/amiedes/.nvm/versions/node/v16.13.0/bin/node]
14: 0x100765374 v8::internal::CompilationCacheTable::LookupScript(v8::internal::Handle<v8::internal::CompilationCacheTable>, v8::internal::Handle<v8::internal::String>, v8::internal::LanguageMode, v8::internal::Isolate*) [/Users/amiedes/.nvm/versions/node/v16.13.0/bin/node]
15: 0x10051a42f v8::internal::CompilationCacheScript::Lookup(v8::internal::Handle<v8::internal::String>, v8::internal::ScriptDetails const&, v8::internal::LanguageMode) [/Users/amiedes/.nvm/versions/node/v16.13.0/bin/node]
16: 0x100523cda v8::internal::Compiler::GetSharedFunctionInfoForScript(v8::internal::Isolate*, v8::internal::Handle<v8::internal::String>, v8::internal::ScriptDetails const&, v8::Extension*, v8::internal::ScriptData*, v8::ScriptCompiler::CompileOptions, v8::ScriptCompiler::NoCacheReason, v8::internal::NativesFlag) [/Users/amiedes/.nvm/versions/node/v16.13.0/bin/node]
17: 0x10048b5ab v8::ScriptCompiler::CompileUnboundInternal(v8::Isolate*, v8::ScriptCompiler::Source*, v8::ScriptCompiler::CompileOptions, v8::ScriptCompiler::NoCacheReason) [/Users/amiedes/.nvm/versions/node/v16.13.0/bin/node]
18: 0x1002f4b45 node::contextify::ContextifyScript::New(v8::FunctionCallbackInfo<v8::Value> const&) [/Users/amiedes/.nvm/versions/node/v16.13.0/bin/node]
19: 0x1004e9dd9 v8::internal::FunctionCallbackArguments::Call(v8::internal::CallHandlerInfo) [/Users/amiedes/.nvm/versions/node/v16.13.0/bin/node]
20: 0x1004e95ad v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<true>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) [/Users/amiedes/.nvm/versions/node/v16.13.0/bin/node]
21: 0x1004e8ffb v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) [/Users/amiedes/.nvm/versions/node/v16.13.0/bin/node]
22: 0x100d59eb9 Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_BuiltinExit [/Users/amiedes/.nvm/versions/node/v16.13.0/bin/node]
23: 0x100cea072 Builtins_JSBuiltinsConstructStub [/Users/amiedes/.nvm/versions/node/v16.13.0/bin/node]
zsh: abort      node --expose-gc --trace_gc --max-old-space-size=700 ./node_modules/.bin/jest

Memory usage log:

jest_memory_usage.log

Thank you both a lot for taking a look into it šŸ™‡

EDIT: I ran this with Node 16.13.0

@amiedes can you try to run jest in this way?

node --expose-gc --trace_gc --max-old-space-size=700 ./node_modules/.bin/jest app/javascript --runInBand --logHeapUsage > log/jest_memory_usage.log

@amiedes it looks to me like the v8 bug was closed as won't fix but the nodejs bug is still open.

I'm just an interested spectator and don't have a 100% understanding, but it seems to me like no one has indicated a belief that this is a Jest bug.