jest: cache write race condition across processes

Do you want to request a feature or report a bug? bug

What is the current behavior? When running tests in parallel with the new atomic cache writing, we’re getting rename errors as multiple processes try to write to the same files. Even with the --no-cache option set, we still hit rename errors, because Jest still tries to write the cache files.

What is the expected behavior?

  1. I think that --no-cache should not write cache files
  2. Caching across multiple processes should not collide, or should be able to restart the test.

Please provide your exact Jest configuration and mention your Jest, node, yarn/npm version and operating system.

{
	"clearMocks": true,
	"collectCoverageFrom": [
		"packages/**/src/**/*.{ts,tsx}",
		"!packages/sf-lint/**",
		"!**/*.d.ts"
	],
	"coverageReporters": [
		"text-summary"
	],
	"moduleFileExtensions": [
		"ts",
		"tsx",
		"js",
		"json"
	],
	"setupTestFrameworkScriptFile": "<rootDir>/jestEnvironment.js",
	"transform": {
		"\\.(ts|tsx)$": "<rootDir>/scripts/preprocessTypescript.js",
		"\\.(less|css|svg|gif|png|jpg|jpeg)$": "<rootDir>/scripts/preprocessText.js"
	},
	"testRegex": "(Spec|.spec).tsx?$"
}

jest 21.0.1, node 6.9.2, yarn 0.27.x/1.0.0, OS: Windows

About this issue

  • Original URL
  • State: open
  • Created 7 years ago
  • Reactions: 51
  • Comments: 122 (51 by maintainers)

Most upvoted comments

just to chip in – I’m seeing this with jest 23.6 on a windows Jenkins CI server.

Since graceful-fs isn’t doing much about this (https://github.com/isaacs/node-graceful-fs/pull/131 hasn’t seen action since July last year), perhaps it’s time to fork? I’ve added a nag comment there, but I’m not expecting that to make anyone suddenly jump to sorting this out )':

Any idea when this workaround will make it to a release?

@nyrkovalex What we do to avoid the issue you’re describing is use Jest’s cache directory option to make sure the cache is not shared across workspaces.

We do so by publishing a Jest preset that sets cacheDirectory: '<rootDir>/.jest-cache' and making sure that all packages use it. If you do so, make sure to add .jest-cache to .gitignore.

Before adding that option we would run into several issues as a result of having a global Jest cache shared across 16 executors per Jenkins agent. Using lockable resources, as you mentioned, will also prevent the issue, but it is wasteful because you are not using your Jenkins agent to its full potential (the Jest tests become a bottleneck).
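For anyone who wants to replicate that setup, a minimal version of such a preset could look like the following (the file name jest-preset.json follows Jest's preset convention; the .jest-cache path is this commenter's choice, not a Jest default):

```json
{
  "cacheDirectory": "<rootDir>/.jest-cache"
}
```

Each package then references the preset via "preset" in its Jest config, so every workspace writes to its own cache directory instead of the shared one.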

this is the crux of it - and why the tool exists - the maintainers don’t care because it doesn’t affect them, even though it’s bad design (a cache is there to speed up execution and should not be a linchpin). Perhaps someone will listen if enough noise is made here, but I highly doubt it. I’ve seen a lot of the attitude that failures on “that OS” don’t matter (because these failures are Windows-specific), or “just use WSL”, and that’s what you’re going to find more of. Sorry to come down with the Hammer Of Cynicism, but this issue has been open for nearly 5 years and the fix is trivial - if the maintainers cared, it would have been implemented already.

I don’t need to provide a PR because the fix is so simple that a tool like mine could do line-based updates to work around it.

I’m having the same issue but the error message is different Failure message: EBADF: bad file descriptor, close

jest: failed to cache transform results in: C:/agent/_work/_temp/jest/jest-transform-cache-2a12bf9796741cb06fb97a276aa09712-7d718cda2d798ae78eb741b3feff799e/7b/test-setup_7bdb1937d8dbbf5088142bc21e74a530
2019-01-24T13:44:55.5496091Z     Failure message: EBADF: bad file descriptor, close

It seems that running jest with --runInBand does not solve the problem on the first run; only on a subsequent run does it execute without errors.

Running on Windows 10 Enterprise VM as part of a TFS Build.

Having the same issue and can’t find a way around it. Jest is basically unusable for us like this.

Can multiple people confirm that using https://github.com/isaacs/node-graceful-fs/pull/119 fixes their issues?

You can use the fork by using yarn resolutions, see https://yarnpkg.com/en/docs/selective-version-resolutions, which should allow you to deploy the fix to CI etc.

e.g.

{
  "resolutions": {
    "graceful-fs": "mekwall/node-graceful-fs#a10aa576f771d7cf3dfaee523f2e02d6eb11a89f"
  }
}

We are trying to update to 21.2.0 from 20.0.4 and now we have the following error on our build servers:

Test suite failed to run
[13:46:50]
[13:46:50]    jest: failed to cache transform results in: C:/TeamCity/buildAgent/temp/buildTmp/jest/jest-transform-cache-c60bb8ad55f1dbc29115038800657f2f-4895fc34da3cd9ad1e120af622bca745/3b/fe-selectors_3b56db772e798e2e4d0c9fc7c9e4a770
[13:46:50]    Failure message: EPERM: operation not permitted, rename '...\jest\jest-transform-cache-c60bb8ad55f1dbc29115038800657f2f-4895fc34da3cd9ad1e120af622bca745\3b\fe-selectors_3b56db772e798e2e4d0c9fc7c9e4a770.1701848979' -> '...\jest\jest-transform-cache-c60bb8ad55f1dbc29115038800657f2f-4895fc34da3cd9ad1e120af622bca745\3b\fe-selectors_3b56db772e798e2e4d0c9fc7c9e4a770'
[13:46:50]      
[13:46:50]      at Object.fs.renameSync (fs.js:772:18)
[13:46:50]      at Function.writeFileSync [as sync] (node_modules/write-file-atomic/index.js:192:8)

fwiw, for anyone reading this, I’ve made a little cli tool to patch ScriptTransformer.js in the @jest/transform folder, operating on locally-installed modules. So far, it’s running as part of my CI pipeline, after 5 successive failures the other day due to cache read/write errors. A build which fails with this error costs my QA 10 minutes per shot, so successive failures like that are an hour wasted in the day.

Check it out here: https://www.npmjs.com/package/patch-jest-cache

I haven’t seen the (default) warnings that it should emit on an error, so, so far, I’m “lucky” enough not to have hit the issue again after my 5-in-a-row - but the good news is that my tests run and pass, so at least I know that the “happy path” works. The code is fairly well tested too.

If you’d like to use it:

npx patch-jest-cache

If you (like one of our projects) have a base dir with node_modules and a src dir, this tool will run a lot quicker if you tell it where to look (it has to find ScriptTransformer.js): use npx patch-jest-cache --in src. It should be run (or instructed to run) from within the folder containing node_modules.

it’s safe to run multiple times in succession against the same installed module and can be undone by simply issuing npm ci / npm install (whichever you prefer)

patch-jest-cache will break if it finds multiple ScriptTransformer.js files (eg monorepo) - that’s me being overly cautious, in case there really are repeated files at some point in the future.

patch-jest-cache can also be used to properly disable caching, as I’ve found that the command-line argument --no-cache still makes jest waste IO time whilst writing out cache files that it won’t use.

As with most open-source, free code, YMMV, I accept no liability for any damage, if it breaks, you can keep all the pieces. The patching process is inspired by @lukeapage (thanks!).

If you’re wondering “why bother?” - that’s a good question. As far as I can tell, this only affects windows users (though there are reasons caching could fail on !windows) - but that’s my test / build target. Since this has been an open issue for nearly 4 years, I don’t have a lot of confidence that it will get fixed soon - though perhaps the mere presence of a horrid hack like my patcher may change that.

Apologies, seems like the fix has not landed in graceful-fs yet 😦

We have had pretty good luck so far just adding the patched version of graceful-fs to our package.json. Works for us with npm and yarn.

"graceful-fs": "https://github.com/mekwall/node-graceful-fs.git#patch-1",

A patch I tried locally that seemed to work is ignoring the error if it comes from trying to rename to a file with the same content. Since it just means another process ‘won’ the race, we can ignore it and move on.

const cacheWriteErrorSafeToIgnore = (
  e: Error,
  cachePath: Path,
  fileData: string,
) => {
  if (
    e.message &&
    e.message.indexOf('EPERM: operation not permitted, rename') > -1
  ) {
    try {
      // If the cache file already contains exactly the data we were about
      // to write, another process won the race with identical content, so
      // the failed rename is benign.
      const currentContent = fs.readFileSync(cachePath, 'utf8');
      return fileData === currentContent;
    } catch (e) {
      // The cache file is unreadable (deleted, or mid-write by yet another
      // process), so we cannot prove the write is safe to skip.
    }
  }
  return false;
};

Nothing is more permanent than a temporary solution.

@samboylett two prs are open but I’ve not got the maintainers to engage with them. The prs are linked further up in the thread.

This started again for us in the last couple of months - Windows - very intermittent.

write-file-atomic no longer uses graceful-fs - maybe that has something to do with it?

Also running into this issue, any insights into a better/permanent fix?

@moizgh from what version to what version?

2.4.2 to 2.3.0

We could probably fork and publish with that fix

@cpojer could you provide some more info why it’s closed? is there a fix provided? We still have this issue

@SimenB, sure, I’ll file an issue.

@cpojer, can this error be swallowed/ignored and treated as a warning? It implies that the file has already been written and no data should be lost.

The problem is half fixed in master - it is missing https://github.com/facebook/jest/pull/11104

The half fix that is merged is https://github.com/facebook/jest/pull/11423

@chekrd, I have a beta version if you’d like to try

npx -y patch-jest-cache@beta

there was significant re-work to go from a singular process to batching all of them, so, whilst all the tests pass and it’s working in my work repo, I’d like confirmation from others before simply unleashing this, as it seems a few hundred people are using it regularly. I don’t want to cause anyone undue hassles 😅

hm, that’s interesting, because the function should be made tolerant by default; disabling the write cache does what it says on the tin - disables any writing to the cache whatsoever. I’ll try to have a look at this Sometime Real Soon Now (:

Same here - we use it for all installs - a hundred developers and hundreds of ci builds a day.


Actually we patched write file atomic for the write issue as I really didn’t want to maintain a fork of jest transform. If you want you can point at the git commits of the pull requests above, but they are off master so I have no idea if they will work with a specific jest version.

If I don’t get any response here I will give it a go to maintain a fork just of jest transform against each version of jest I use and then use yarn to rewire to my forked version.

I think I have a better handle on whats going wrong - we don’t see two renames colliding and nothing happening in one. We see something like this.

process 1 - rename file
process 2 - rename file
process 2 - error
process 2 - patch logging - check if file exists - it does exist
process 1 - finished ok
process 3 - rename file
process 2 - check if file exists - it does not exist because process 3 is overwriting it?!

This happens on fast build machines with 15 processes running - when I log rename collisions, we get ~55k collisions, so it’s not surprising that 1 in 100k hits an error like the above.

It’s made worse by our evergreening of dependencies on a daily basis, as that clears the cache.

After some thought I think the best option for us is to ignore errors, rather than try again.

Can anyone from the core team given this analysis tell me what patch you would accept?

@iarna seems like some regression was introduced.

I’m not sure what it is but it has been a few weeks possibly a couple of months and I haven’t observed the failure anymore. We have been using jest 22.4.2 for a while and upgraded to 22.4.4 recently. We’ve also updated various other packages.

As a workaround to fiddling with graceful-fs, couldn’t you simply give each worker process / thread it’s own cache directory to avoid the race condition?

It’s probably slow, but we have to use --runInBand on our CI server and that is way worse.

If someone can point me at the right files to look at, I might even try to write a patch. I have a really hard time navigating the source of jest.
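One sketch of that per-process idea, written as a jest.config.js (assumptions: EXECUTOR_NUMBER is the env var Jenkins sets per executor, and falling back to the process id is my own choice, which means a fresh cache on every run when the variable is absent):

```javascript
// jest.config.js - give each parallel CI executor its own cache directory,
// so concurrent builds on one agent never rename into the same cache files.
const path = require('path');

const suffix = process.env.EXECUTOR_NUMBER || String(process.pid);

module.exports = {
  cacheDirectory: path.join(__dirname, `.jest-cache-${suffix}`),
};
```

The trade-off is a cold cache per executor, but that is usually cheaper than intermittent whole-build failures from colliding renames.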

that would be awesome…

After scrolling through this thread, I found a resolution using yarn. Is there a resolution using npm instead?

I’ve checked the graceful-fs override with the latest version of Jest and unfortunately it no longer seems to work as reliably since I last tested it. There’s still non-zero chance that it runs into a race condition on large test suites.

Upstream issue: npm/write-file-atomic#28

I’d like to retain that.

New flag? It’s a highly misleading name. And on e.g. CI you rarely want the cache anyways, so it’s just wasted resources. Or is a cache generated within a single test run used during --no-cache, and it only ignores existent caches?

Here’s the other repro with worker-farm and write-file-atomic

Awesome! Could you open up an issue against write-file-atomic? It feels like a fix should go there, and if not (they don’t want to support multiple processes writing at once) we can revisit on our side. WDYT?

  • --no-cache is more like --reset-cache actually. It means it won’t use the existing cache, but it will still write cache. I’d like to retain that.
  • These operations have to be sync, because they happen during require calls in user code, so we can’t change that.

We basically run into the same issue with our tests. One easy way to reproduce was to remove jest cacheDirectory to force cache generation on the next run.


    jest: failed to cache transform results in: C:/myniceproject/src/jest-cache/jest-transform-cache-b2e8f1f700b9bd266a0d27bb01b47a2b-34a7e3d71d38ff01f65fdb5abdf5126b/3f/settingsProvider_3f1439e55275a95ecfdb7dcb432f7958
    Failure message: EPERM: operation not permitted, rename 'C:\myniceproject\src\jest-cache\jest-transform-cache-b2e8f1f700b9bd266a0d27bb01b47a2b-34a7e3d71d38ff01f65fdb5abdf5126b\3f\settingsProvider_3f1439e55275a95ecfdb7dcb432f7958.1630729137' -> 'C:\myniceproject\src\jest-cache\jest-transform-cache-b2e8f1f700b9bd266a0d27bb01b47a2b-34a7e3d71d38ff01f65fdb5abdf5126b\3f\settingsProvider_3f1439e55275a95ecfdb7dcb432f7958'