jest: Spurious error: "A worker process has failed to exit gracefully and has been force exited."

🐛 Bug Report

Our CI builds have been failing sporadically with this error from Jest:

“A worker process has failed to exit gracefully and has been force exited. This is likely caused by tests leaking due to improper teardown. Try running with --runInBand --detectOpenHandles to find leaks.”

--detectOpenHandles does not detect any problems. The error is reported randomly by various projects that build independently in a large monorepo, which suggests that this is not actually caused by a bad test.

It occurs more frequently on Windows machines, and on machines with heavier load. But it has also been reported on individual developer MacBooks.

To Reproduce

You can trivially force the warning to be reported by reducing this constant to 0:

https://github.com/facebook/jest/blob/64d5983d20a628d68644a3a4cd0f510dc304805a/packages/jest-worker/src/base/BaseWorkerPool.ts#L20

Investigation

The FORCE_EXIT_DELAY constant is used by this code:

https://github.com/facebook/jest/blob/64d5983d20a628d68644a3a4cd0f510dc304805a/packages/jest-worker/src/base/BaseWorkerPool.ts#L104-L116

500 ms seems way too small for a machine that is under heavy load. My theory is that if we increase the timeout, then the IPC message will eventually be received, and the warnings will go away. I will report back after I have tested that.
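
To make the theory concrete, here is a minimal sketch of the "wait for a graceful exit, then force exit after a delay" pattern described above. It is not the jest-worker source: FakeWorker, requestExit, endWorker, and forceExitDelayMs are hypothetical names invented for this illustration, and the worker is simulated with timers rather than a real child process.

```javascript
// Hypothetical stand-in for a worker process (NOT the jest-worker API).
// It "exits" exitAfterMs milliseconds after being asked to shut down.
class FakeWorker {
  constructor(exitAfterMs) {
    this.exitAfterMs = exitAfterMs;
    this._exited = new Promise(resolve => {
      this._resolveExit = resolve;
    });
  }

  // Ask the worker to shut down gracefully.
  requestExit() {
    this._timer = setTimeout(this._resolveExit, this.exitAfterMs);
  }

  // Terminate immediately without waiting for the graceful shutdown.
  forceExit() {
    clearTimeout(this._timer);
    this._resolveExit();
  }

  waitForExit() {
    return this._exited;
  }
}

// Resolves to true if the worker had to be force exited,
// false if it exited on its own within forceExitDelayMs.
async function endWorker(worker, forceExitDelayMs) {
  worker.requestExit();

  let forceExited = false;
  const forceExitTimeout = setTimeout(() => {
    forceExited = true;
    worker.forceExit();
  }, forceExitDelayMs);

  await worker.waitForExit();
  clearTimeout(forceExitTimeout); // no-op if the timeout already fired
  return forceExited;
}

async function demoForceExit() {
  const slowExitMs = 200; // a worker slowed down by a loaded machine
  console.log(await endWorker(new FakeWorker(slowExitMs), 50)); // true
  console.log(await endWorker(new FakeWorker(slowExitMs), 1000)); // false
}

demoForceExit();
```

In this sketch the same slow worker is reported as force exited with a short delay and as exiting gracefully with a longer one, which is the behavior the theory predicts for a heavily loaded machine.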

jest-haste-map may be the root cause

While debugging this, I noticed that the jest-haste-map library has an obvious flaw where it creates a potentially unlimited number of promises while crawling the disk:

https://github.com/facebook/jest/blob/4fa17e455662dc7918eecdb857dd946e9ce2dc53/packages/jest-haste-map/src/index.ts#L659-L678

For example, I observed that Promise.all() may create 50+ promises that are all crawling the filesystem in parallel. If the build orchestrator is building 8 projects in parallel, and each project is potentially creating 50 Jest promises, that could obviously thrash the disk to the point where the FORCE_EXIT_DELAY limit is exceeded.

This is just speculation. I did not have time to try fixing this to see whether it resolves the problem or not.

Either way, it would be a good idea to apply some throttling to _buildHasteMap() to limit the parallelism. @cpojer @SimenB
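
As a rough idea of what such throttling could look like, here is a minimal sketch of a concurrency-limited replacement for an unbounded Promise.all(). It is not taken from jest-haste-map: mapWithConcurrency, fakeCrawl, and the limit of 4 are assumptions made up for this example, and the disk crawl is simulated with a timer.

```javascript
// Run `task` over `items` with at most `limit` tasks in flight at once.
// Assumes `limit` is a positive integer. Results keep the input order.
// If a task rejects, the returned promise rejects with that error.
async function mapWithConcurrency(items, limit, task) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index], index);
    }
  }

  const runnerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({length: runnerCount}, () => runner()));
  return results;
}

async function demoThrottle() {
  let inFlight = 0;
  let peak = 0;

  // Hypothetical stand-in for crawling one directory.
  const fakeCrawl = async dir => {
    inFlight++;
    peak = Math.max(peak, inFlight);
    await new Promise(resolve => setTimeout(resolve, 10));
    inFlight--;
    return `${dir}: done`;
  };

  const dirs = Array.from({length: 50}, (_, i) => `dir-${i}`);
  const results = await mapWithConcurrency(dirs, 4, fakeCrawl);
  console.log(results.length, peak); // 50 4
}

demoThrottle();
```

With 50 simulated crawl tasks and a limit of 4, all 50 complete but no more than 4 are ever running at the same time, whereas a plain Promise.all() over the same list would start all 50 immediately.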

envinfo

  System:
    OS: Windows 10 10.0.19041
    CPU: (8) x64 AMD Ryzen 7 3700X 8-Core Processor
  Binaries:
    Node: 12.20.1 - C:\Program Files\nodejs\node.EXE
    Yarn: 1.22.10 - C:\Program Files\nodejs\yarn.CMD
    npm: 6.14.10 - C:\Program Files\nodejs\npm.CMD

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 9
  • Comments: 25 (2 by maintainers)

Most upvoted comments

Hi, just want to expand on this error a bit - I have seen this come up as well when running a medium to large number of tests at once. The reason --detectOpenHandles appears to solve it is that the tests are no longer being run in parallel. The problem is that the memory and/or CPU consumption of Jest is too high when running tests in parallel. It doesn’t happen consistently, but it happens often enough to be a problem.

And this isn’t just a problem on platforms like Jenkins. The resource usage of Jest is an obvious problem if I try to do anything else while my unit tests are running on my dev machine as well.

Hey https://github.com/facebook/jest/pull/13139 might fix this. It sounds like the same symptoms.

Please could you try adding a

setupFilesAfterEnv: ['<rootDir>/jest-setup.js'],

to your config and in jest-setup.js put

afterAll(() => {
  global.gc();
});

and then run jest with --expose-gc, so probably node --expose-gc ./node_modules/.bin/jest ?

Does that fix it?

I am experiencing the same issue. This is my environment:

  • MacOS: 11.5.2
  • Node: v15.11.0
  • npm: v7.6.0
  • jest: ^27.0.6

Executing our tests with --runInBand runs all tests serially and solves the problem, but that option is just for debugging purposes.

Additional info for this parameter -> https://jestjs.io/docs/cli#--runinband

It seems to be related to the way it is creating a worker pool of child processes that will run the tests.

It would be great to see FORCE_EXIT_DELAY configurable as an option.

We are experiencing it in our integration tests. We have a lot of fakes that require teardowns, and the tests put the computer under a pretty heavy load, so sometimes the teardowns take over 500 ms. With --runInBand it’s OK, but to get it working with worker threads I need to exit every other process apart from the terminal and pray that no service starts indexing or something.

Increasing the FORCE_EXIT_DELAY value has reduced our test run time from ~80s to ~70s now that worker threads aren’t being killed.

Personally, I fixed this for us by using patch-package, which is good for this kind of thing, but the patch will need to be redone every time we update the jest-worker module.

I have also been experiencing the same issue:

  • Random failures with “A worker process has failed to exit” error
  • --detectOpenHandles doesn’t detect anything
  • --runInBand works fine

Changing the FORCE_EXIT_DELAY to something higher than 500 fixes the issue.

I am experiencing the same issue on macOS 11.6, Node 16.10 and Jest 27.2.5. As reported in this issue, increasing the FORCE_EXIT_DELAY or using --runInBand solves the problem for me. I would also prefer making FORCE_EXIT_DELAY configurable, as suggested by others in this thread.