jest: cache write race condition across processes
Do you want to request a feature or report a bug? bug
What is the current behavior?
When running tests in parallel with the new atomic cache writing, we’re getting rename errors as multiple processes try to write to the same files. Even with the --no-cache option set, it still hits rename errors, because it still tries to write the cache files.
What is the expected behavior?
- I think that --no-cache should not write cache files.
- Caching across multiple processes should not collide, or Jest should be able to restart the affected test.
Please provide your exact Jest configuration and mention your Jest, node, yarn/npm version and operating system.
{
  "clearMocks": true,
  "collectCoverageFrom": [
    "packages/**/src/**/*.{ts,tsx}",
    "!packages/sf-lint/**",
    "!**/*.d.ts"
  ],
  "coverageReporters": [
    "text-summary"
  ],
  "moduleFileExtensions": [
    "ts",
    "tsx",
    "js",
    "json"
  ],
  "setupTestFrameworkScriptFile": "<rootDir>/jestEnvironment.js",
  "transform": {
    "\\.(ts|tsx)$": "<rootDir>/scripts/preprocessTypescript.js",
    "\\.(less|css|svg|gif|png|jpg|jpeg)$": "<rootDir>/scripts/preprocessText.js"
  },
  "testRegex": "(Spec|.spec).tsx?$"
}
jest 21.0.1, node 6.9.2, yarn 0.27.x/1.0.0, OS: Windows
About this issue
- Original URL
- State: open
- Created 7 years ago
- Reactions: 51
- Comments: 122 (51 by maintainers)
Commits related to this issue
- chore: always run tests in-band Jest caching is currently broken on windows, causing tests to fail for no reason. See facebook/jest#4444 — committed to vuetifyjs/vuetify by KaelWD 7 years ago
- chore: always run tests in-band Jest caching is currently broken on windows, causing tests to fail for no reason. See facebook/jest#4444 (cherry picked from commit c43450512d45067ff301b9d3ba1055da8... — committed to vuetifyjs/vuetify by KaelWD 7 years ago
- Additional fix for #4444 to prevent errors on windows — committed to lukeapage/jest by lukeapage 3 years ago
- Additional fix for #4444 to prevent errors on windows (#11423) — committed to jestjs/jest by lukeapage 2 years ago
- fix: Workaround for Jest@Windows Issue Running Jest tests on windows may trigger some error like: EPERM: operation not permitted This related to issue: https://github.com/jestjs/jest/issues/444... — committed to CoreMedia/ckeditor-plugins by mmichaelis 8 months ago
Just to chip in – I’m seeing this with jest 23.6 on a Windows Jenkins CI server. --runInBand does work, but it doubles testing time, so it’s not great, and since we run tests before push, I can’t enable it without making my team members super sad.
The package.json workaround, as mentioned by @jsheetzati (https://github.com/facebook/jest/issues/4444#issuecomment-370533355), works, but it’s a bit of a hack. Since graceful-fs isn’t doing much about this (https://github.com/isaacs/node-graceful-fs/pull/131 hasn’t seen action since July last year), perhaps it’s time to fork? I’ve added a nag comment there, but I’m not expecting that to make anyone suddenly jump to sorting this out (':
Any idea when this workaround will make it to a release?
@nyrkovalex What we do to avoid the issue you’re describing is use Jest’s cache directory option to make sure the cache is not shared across workspaces.
We do so by publishing a Jest preset that sets cacheDirectory: '<rootDir>/.jest-cache' and making sure that all packages use it. If you do so, make sure to add .jest-cache to .gitignore.
Before adding that option we would run into several issues as a result of having a global Jest cache shared across 16 executors per Jenkins agent. Using lockable resources, as you mentioned, will also prevent the issues, but it is wasteful because you are not using your Jenkins agent to its full potential (the Jest tests become a bottleneck).
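For illustration only, a minimal sketch of what such a preset could look like (the package name and file are hypothetical, not the commenter’s actual preset): a preset package ships a jest-preset.js at its root, and consuming packages point their "preset" field at it.

```js
// jest-preset.js of a hypothetical shared preset package.
// Each consuming package sets `"preset": "<name-of-this-package>"` in its
// Jest config, so every workspace writes to its own .jest-cache directory
// instead of a cache location shared by all executors on the agent.
module.exports = {
  cacheDirectory: '<rootDir>/.jest-cache',
};
```

With that in place, adding .jest-cache to each package’s .gitignore keeps the per-workspace cache out of version control, as noted above.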
This is the crux of it - and why the tool exists - the maintainers don’t care because it doesn’t affect them, even though it’s bad design (the cache is there to speed up execution and should not be a linchpin). Perhaps someone will listen if enough noise is made here, but I highly doubt it. I’ve seen a lot of the attitude that failures on “that OS” don’t matter (because these failures are Windows-specific), or “just use WSL”, and that’s what you’re going to find more of. Sorry to come down with the Hammer Of Cynicism, but this issue has been open for nearly 5 years and the fix is trivial - if the maintainers cared, it would have been implemented already.
I don’t need to provide a PR because the fix is so simple that a tool like mine could do line-based updates to work around it.
I’m having the same issue, but the error message is different:
Failure message: EBADF: bad file descriptor, close
It seems that running jest with --runInBand does not solve the problem on the first run; only on a subsequent run does it execute without errors.
Running on Windows 10 Enterprise VM as part of a TFS Build.
Having the same issue and can’t find a way around it. Jest is basically unusable for us like this.
Can multiple people confirm that using https://github.com/isaacs/node-graceful-fs/pull/119 fixes their issues?
You can use the fork by using yarn resolutions, see https://yarnpkg.com/en/docs/selective-version-resolutions, which should allow you to deploy the fix to CI etc.
e.g.
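As an illustration of the suggestion above (the original example snippet was not preserved here, and the version is a placeholder for whichever graceful-fs release or fork actually carries the fix), a resolutions entry in package.json looks roughly like this:

```json
{
  "resolutions": {
    "**/graceful-fs": "<patched version or fork of graceful-fs>"
  }
}
```

After adding it, reinstall with yarn so the lockfile picks up the forced version.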
We are trying to update to 21.2.0 from 20.0.4 and now we have the following error on our build servers:
fwiw, for anyone reading this, I’ve made a little cli tool to patch ScriptTransformer.js in the @jest/transform folder, operating on locally-installed modules; so far, it’s running as part of my CI pipeline after 5 successive failures the other day for cache read / write errors. A build which fails due to this error costs my QA 10 minutes per shot, so successive fails like this are an hour wasted in the day.
Check it out here: https://www.npmjs.com/package/patch-jest-cache
I haven’t seen the (default) warnings that it should emit on an error, so, so far, I’ve been “lucky” enough not to hit the issue again after my 5-in-a-row; but the good news is that my tests run and pass, so at least I know that the “happy path” works. The code is fairly well tested too.
If you’d like to use it:
- If you (like one of our projects) have a base dir with node_modules and a src dir, this tool will run a lot quicker if you tell it where to run (as it has to find ScriptTransformer.js) - use npx patch-jest-cache --in src. It should be run (or instructed to run) from within the folder containing node_modules.
- It’s safe to run multiple times in succession against the same installed module, and it can be undone by simply issuing npm ci / npm install (whichever you prefer).
- patch-jest-cache will break if it finds multiple ScriptTransformer.js files (e.g. a monorepo) - that’s me being overly cautious, in case there really are repeated files at some point in the future.
- patch-jest-cache can also be used to properly disable caching, as I’ve found that the command-line argument --no-cache still makes Jest waste IO time writing out cache files that it won’t use.
As with most open-source, free code, YMMV; I accept no liability for any damage, and if it breaks, you can keep all the pieces. The patching process is inspired by @lukeapage (thanks!).
If you’re wondering “why bother?” - that’s a good question. As far as I can tell, this only affects Windows users (though there are reasons caching could fail on non-Windows systems) - but that’s my test/build target. Since this has been an open issue for nearly 4 years, I don’t have a lot of confidence that it will get fixed soon - though perhaps the mere presence of a horrid hack like my patcher may change that.
Apologies, seems like the fix has not landed in graceful-fs yet 😦
We have had pretty good luck so far just adding the patched version of graceful-fs to our package.json. Works for us with both npm and yarn.
A patch I tried locally that seemed to work is ignoring the error if it comes from trying to rename to a file with the same content. Since that just means another process “won” the race, we can ignore it and move on.
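A minimal sketch of that idea (illustrative only, not the actual patch; the helper name and its usage are made up): if the rename fails but the destination already contains exactly the bytes we were about to write, another process won the race and the error can be swallowed.

```js
const fs = require('fs');

// Hypothetical helper: move a freshly written temp file into its final
// cache location, ignoring rename failures that turn out to be benign races.
function moveCacheFileIgnoringRaces(tempPath, cachePath, expectedContent) {
  try {
    fs.renameSync(tempPath, cachePath);
  } catch (renameError) {
    // On Windows the rename can fail when another worker is writing the
    // same cache entry at the same moment.
    let existing;
    try {
      existing = fs.readFileSync(cachePath, 'utf8');
    } catch {
      throw renameError; // destination unreadable: the failure is real
    }
    if (existing !== expectedContent) {
      throw renameError; // different content on disk: not a benign race
    }
    // Same content already exists, so another process won the race.
    // Clean up our temp file on a best-effort basis and carry on.
    try {
      fs.unlinkSync(tempPath);
    } catch {
      // ignore: the temp file may already be gone
    }
  }
}

module.exports = { moveCacheFileIgnoringRaces };
```

Comparing by content (rather than just checking that the file exists) matters because, as described further down in the thread, a third process can be mid-overwrite when the check runs.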
@samboylett two PRs are open, but I haven’t managed to get the maintainers to engage with them. The PRs are linked further up in the thread.
This started again for us in the last couple of months - Windows - very intermittent.
write-file-atomic no longer uses graceful-fs - maybe that has something to do with it?
Also running into this issue, any insights into a better/permanent fix?
2.4.2 to 2.3.0
We could probably fork and publish with that fix
@cpojer could you provide some more info on why it’s closed? Is there a fix? We still have this issue.
@SimenB, sure, I’ll file an issue.
@cpojer, can this error be swallowed/ignored and treated as a warning? It implies that the file has already been written and no data should be lost.
The problem is only half fixed in master - it is still missing https://github.com/facebook/jest/pull/11104
The half of the fix that is merged is https://github.com/facebook/jest/pull/11423
@chekrd, I have a beta version if you’d like to try it.
There was significant re-work to go from a single process to batching all of them, so whilst all the tests pass and it’s working in my work repo, I’d like confirmation from others before simply unleashing this, as it seems a few hundred people are using it regularly. I don’t want to cause anyone undue hassle 😅
Hm, that’s interesting, because the function should be tolerant by default; disabling the write cache does what it says on the tin - it disables any writing to the cache whatsoever. I’ll try to have a look at this Sometime Real Soon Now (:
Same here - we use it for all installs - a hundred developers and hundreds of ci builds a day.
Actually we patched write-file-atomic for the write issue, as I really didn’t want to maintain a fork of jest-transform. If you want, you can point at the git commits of the pull requests above, but they are off master, so I have no idea if they will work with a specific Jest version.
If I don’t get any response here, I will have a go at maintaining a fork of just jest-transform against each version of Jest I use, and then use yarn to rewire to my forked version.
I think I have a better handle on what’s going wrong - we don’t just see two renames colliding with nothing happening in one of them. We see something like this:
process 1 - rename file
process 2 - rename file
process 2 - error
process 2 - patch logging - check if file exists - it does exist
process 1 - finished ok
process 3 - rename file
process 2 - check if file exists - it does not exist, because process 3 is overwriting it?!
This happens on fast build machines with 15 processes running - when I log rename collisions, we get ~55k collisions, so it’s not surprising that 1 in 100k hits an error like the above.
It’s made worse by us evergreening dependencies on a daily basis, as that clears the cache.
After some thought I think the best option for us is to ignore errors, rather than try again.
Given this analysis, can anyone from the core team tell me what patch you would accept?
@iarna it seems like some regression was introduced.
I’m not sure what it is, but it has been a few weeks, possibly a couple of months, and I haven’t observed the failure anymore. We have been using jest 22.4.2 for a while and upgraded to 22.4.4 recently. We’ve also updated various other packages.
As a workaround to fiddling with graceful-fs, couldn’t you simply give each worker process/thread its own cache directory to avoid the race condition?
It’s probably slow, but we have to use --runInBand on our CI server and that is way worse.
If someone can point me at the right files to look at, I might even try to write a patch. I have a really hard time navigating the source of jest.
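For what it’s worth, the closest thing plain configuration seems to offer (my assumption, not something confirmed in this thread) is isolating the cache per Jest invocation rather than per internal worker, e.g. by deriving cacheDirectory from the process id in jest.config.js. That does not remove the race between workers within one run, but it stops concurrent builds on a shared agent from writing to the same cache files, at the cost of a cold cache every run.

```js
// jest.config.js - hypothetical sketch, not an official recommendation.
// Every Jest invocation gets its own cache directory, so parallel builds
// on the same machine never collide on cache writes.
const os = require('os');
const path = require('path');

module.exports = {
  cacheDirectory: path.join(os.tmpdir(), `jest-cache-${process.pid}`),
};
```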
that would be awesome…
After scrolling through this thread, I found a resolution using yarn. Is there a resolution using npm instead?
I’ve checked the graceful-fs override with the latest version of Jest, and unfortunately it no longer seems to work as reliably as when I last tested it. There’s still a non-zero chance that it runs into a race condition on large test suites. Upstream issue: npm/write-file-atomic#28
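One possible npm-side answer (my suggestion, not something confirmed in this thread): npm 8.3 and later support an overrides field in package.json that plays a similar role to yarn resolutions. A minimal sketch, with the version left as a placeholder for whichever graceful-fs release or fork carries the fix:

```json
{
  "overrides": {
    "graceful-fs": "<patched version or fork of graceful-fs>"
  }
}
```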
New flag? It’s a highly misleading name. And on e.g. CI you rarely want the cache anyway, so it’s just wasted resources. Or is a cache generated within a single test run used during --no-cache, and does it only ignore pre-existing caches?
Awesome! Could you open up an issue against write-file-atomic? It feels like a fix should go there, and if not (they don’t want to support multiple processes writing at once) we can revisit on our side. WDYT?
--no-cache is more like --reset-cache, actually. It means it won’t use the existing cache, but it will still write cache. I’d like to retain that.
[…] require calls in user code, so we can’t change that.
This problem is very similar to https://github.com/npm/write-file-atomic/issues/10 and https://github.com/npm/write-file-atomic/pull/22
We basically run into the same issue with our tests. One easy way to reproduce it was to remove the Jest cacheDirectory to force cache generation on the next run.