jest: Memory allocation failure when running lots of tests with --runInBand in limited env (Travis)

This is a specific and concrete problem (I think) related to the performance concerns I voiced in #2171.

There’s a recommendation in #1742 and #2162 to use --runInBand when running tests on Travis (or, presumably, in other slower environments) in order to avoid memory problems that cause slowdowns and failures. When I use --runInBand on a very large test suite (stylelint’s), I do avoid the problems I see without that flag; however, I just run into another failure: apparently Jest consumes all the memory and crashes:

 PASS  src/rules/time-no-imperceptible/__tests__/index.js
<--- Last few GCs --->
  115688 ms: Scavenge 1389.0 (1458.0) -> 1389.0 (1458.0) MB, 0.5 / 0 ms (+ 1.0 ms in 1 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
  117175 ms: Mark-sweep 1389.0 (1458.0) -> 1388.9 (1458.0) MB, 1486.6 / 0 ms (+ 2.4 ms in 2 steps since start of marking, biggest step 1.4 ms) [last resort gc].
  118452 ms: Mark-sweep 1388.9 (1458.0) -> 1344.5 (1458.0) MB, 1277.1 / 0 ms [last resort gc].
<--- JS stacktrace --->
==== JS stack trace =========================================
Security context: 0x3c92d50b4629 <JS Object>
    2: /* anonymous */(aka /* anonymous */) [/home/travis/build/stylelint/stylelint/node_modules/esprima/esprima.js:35] [pc=0x2c59275e252] (this=0x3c92d50041b9 <undefined>,root=0x8ad703dc279 <an Object with map 0x2df09a5073e1>,factory=0x8ad703de6c9 <JS Function (SharedFunctionInfo 0x15effa467661)>)
    3: /* anonymous */ [/home/travis/build/stylelint/stylelint/node_modules/esprima/esprima.js:39...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Aborted (core dumped)

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 5
  • Comments: 38 (19 by maintainers)

Most upvoted comments

As @bertho-zero mentioned, jest --no-cache does fix this, but for me it makes the tests run so much slower that it’s not a great solution.
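For reference, the workaround is just the extra flag on whatever command you already run on CI, e.g.:

jest --runInBand --no-cache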


I have implemented the solution from @azz and @cpojer with success! https://github.com/facebook/jest/issues/2179#issuecomment-355231418

jest.config.js:

module.exports = {
  moduleNameMapper: {
    // Jest wires `fs` to `graceful-fs`, which causes a memory leak when
    // `graceful-fs` does `require('fs')`.
    // Ref: https://github.com/facebook/jest/issues/2179#issuecomment-355231418
    "graceful-fs": "<rootDir>/test/helpers/fs.js"
  }
};

test/helpers/fs.js:

module.exports = require('fs');

Hi all, just chiming in after doing some more investigation: it seems like Runtime is a bit out of control, and the issue might not be related to coverage. Even running a limited suite within the Jest repository quickly exposes this issue.

Using the following command you can attach to the node process and take heap dumps:

node --inspect=5858 ./packages/jest-cli/bin/jest.js runtime --runInBand --logHeapUsage

I’m hoping that someone has more context as to how Runtime comes into play during a test run.

Here are my findings (last column in each screenshot is “Retained Size”):

It looks like there is a Runtime object created for every suite (just making an assumption based on the # of runtime objects). (screenshot)

The _moduleRegistry and _resolver properties in each Runtime grow quickly and are never released. (screenshot)

These properties also contain information about modules that are not relevant to the current suite (not sure if this is by design). (screenshot)

The context in the majority of these objects seems to point to graceful-fs and gracefulify. (screenshot)

This issue seems like a deal breaker for teams that have a large number of tests. I’m hoping we can find a path to resolution soon.

Ohhhh, this is pretty interesting actually, because when you require('fs') in Jest, you actually get graceful-fs (otherwise you’d run out of file handles pretty quickly at the scale of FB). I think the problem here may be that when you require graceful-fs inside a test itself, it tries to require fs, which is itself a copy of graceful-fs. Maybe we should alias graceful-fs to the one from the parent context.

Just FYI, we had a very similar issue in the Prettier repo; graceful-fs was pulled in via read-pkg-up.

This commit works around the problem by aliasing graceful-fs to regular fs: https://github.com/prettier/prettier/pull/3624/commits/9eb3e0ad1c0d1c115c74ce1a0179777a8073c741

Ref: https://github.com/prettier/prettier/pull/3624#issuecomment-354945006

Hmm … never mind. When running those tests, I forgot to remove our jest-setup.js script. When I did remove that script (good news!), the heap size stayed constant at 33MB, every time.

So that’s good news for Jest; it suggests that our problem is somewhere in stylelint’s own files or its dependencies.

I think I have some evidence that the memory problem is likely within Jest, rather than stylelint, because it looks to me like the same problem occurs with very simple tests.

I created 200 test files, each with 200 test cases, and each test case is nothing more than:

describe(`file ${testFileNo}, case ${testCaseNo}`, () => {
  it(`file ${testFileNo}, case ${testCaseNo}, spec`, () => {
    expect(true).toBe(true)
  })
})
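For anyone who wants to reproduce this, a generator script along these lines produces the same layout. This is a reconstruction, not the exact script; the tmp/tests directory and the test-file-N.test.js naming are taken from the output below.

// generate-tests.js -- sketch for generating 200 files x 200 cases
const fs = require('fs');
const path = require('path');

const outDir = path.join(__dirname, 'tmp', 'tests');
fs.mkdirSync(outDir, { recursive: true }); // requires Node 10.12+

for (let testFileNo = 1; testFileNo <= 200; testFileNo++) {
  let source = '';
  for (let testCaseNo = 1; testCaseNo <= 200; testCaseNo++) {
    source +=
      `describe('file ${testFileNo}, case ${testCaseNo}', () => {\n` +
      `  it('file ${testFileNo}, case ${testCaseNo}, spec', () => {\n` +
      `    expect(true).toBe(true)\n` +
      `  })\n` +
      `})\n\n`;
  }
  fs.writeFileSync(path.join(outDir, `test-file-${testFileNo}.test.js`), source);
}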

Running those tests, I see the same heap growth we saw in stylelint’s tests:

node --expose-gc ./node_modules/.bin/jest --logHeapUsage --runInBand --testPathPattern tmp/tests/.*
 
 PASS  tmp/tests/test-file-189.test.js (47 MB heap size)
 PASS  tmp/tests/test-file-117.test.js (66 MB heap size)
 PASS  tmp/tests/test-file-118.test.js (83 MB heap size)
 PASS  tmp/tests/test-file-119.test.js (101 MB heap size)
 PASS  tmp/tests/test-file-116.test.js (119 MB heap size)
 PASS  tmp/tests/test-file-122.test.js (135 MB heap size)
 PASS  tmp/tests/test-file-123.test.js (153 MB heap size)
 PASS  tmp/tests/test-file-124.test.js (171 MB heap size)
 PASS  tmp/tests/test-file-125.test.js (189 MB heap size)
 PASS  tmp/tests/test-file-126.test.js (206 MB heap size)
...

We have the same thing at FB, actually, so I don’t think your usage of describe/it is a problem.

Wow, this is great, thank you so much for the investigation. Here are a few more things you should think about:

  • Jest parallelizes tests, so you should potentially try to detect leaks in a single thread using -i; otherwise all the threads will just split the leaking and take longer to crash.
  • What has helped me in the past is to use v8-profiler, capture a heap snapshot at certain times, and then load it in Chrome and look at the retained references. Again, this should be done using -i, because the leak currently seems to happen in the worker processes and not in the main process; if you run without -i, you’ll only capture the heap of the main process, without any test runs. See https://github.com/node-inspector/v8-profiler (a small snapshot sketch follows this list).
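
For reference, here is a minimal way to capture a heap snapshot programmatically. This sketch uses Node’s built-in v8 module (available since Node 11.13) rather than v8-profiler, so treat it as an alternative to the package linked above:

// take-heap-snapshot.js -- minimal sketch using the built-in v8 module
const v8 = require('v8');

// Writes a .heapsnapshot file into the current working directory; load it in
// Chrome DevTools (Memory tab) to inspect retained references.
const file = v8.writeHeapSnapshot();
console.log(`heap snapshot written to ${file}`);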

From the looks of it, it seems like one of two things, both equally likely:

  • You guys aren’t cleaning up after your tests, retaining more memory than you should. This may even be a node bug with the vm module – I had issues with that years ago on a different service I was building at FB. For some reason an “Error” object in a vm context was retaining every single JS module at FB and crashing our service after 100 requests. This was tough to track down.
  • Jest is retaining information or the vm context after a test has completed in a process and the test data has already been sent back.

One thing I thought could be an issue is code coverage – that definitely requires a ton of memory, and at FB we had to bump the memory that node can consume. @dmitriiabramov did a fantastic job rewriting code coverage support, but to make it really efficient we may need support from the community to make it more “streamy”: store coverage on the file system and then process it as a stream at the end of a test run. Alternatively, it may also be possible to merge coverage information in the main process after each test has finished running (see the sketch below). I don’t know how feasible either of these is, though.
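As a rough illustration of the “merge in the main process” idea, something like the following could combine per-test coverage objects with istanbul-lib-coverage. The shape of the per-test result (result.coverage) is an assumption made for the example, not a description of Jest’s internals:

// merge-coverage.js -- illustration only
const libCoverage = require('istanbul-lib-coverage');

const mergedMap = libCoverage.createCoverageMap({});

// Hypothetical hook: called in the main process whenever a worker reports a
// finished test file, so coverage never accumulates inside the workers.
function onTestResult(result) {
  if (result.coverage) {
    mergedMap.merge(result.coverage);
  }
}

module.exports = { mergedMap, onTestResult };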

At FB we currently have one repo with about 5000 test suites, and even with coverage we aren’t running into any big memory leaks, which would hint at a leak in your code; but I’m not entirely convinced and think it may be a Jest bug.