nx: Nx is very slow in a large monorepo (>500k files) due to globbing of `**/project.json`

Current Behavior

Reopening https://github.com/nrwl/nx/issues/9660

I’m trying to integrate nx into an existing large monorepo (>500k files) that has only ~25 workspaces. When nx initializes, it builds the list of workspaces by globbing both **/project.json and every entry in the workspaces field of package.json:

https://github.com/nrwl/nx/blob/61d7d74378305ba96f9e94707aefeecb438328ca/packages/nx/src/config/workspaces.ts#L703

The **/project.json pattern forces the glob to crawl every file, which takes 5+ seconds in my case. That cost is paid in the init phase of every command.
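
As a rough illustration of why the unbounded pattern is costly, here is a toy in-memory model. It is not Nx or glob library code: the `Dir` type, `walkAll`, `walkChildrenOf`, and the sample tree are all hypothetical. It counts how many directories must be inspected to answer `**/project.json` versus a pattern bounded by a known parent such as `packages/*/project.json`.

```typescript
// A directory maps entry names to a subdirectory (object) or a file (null).
type Dir = { [name: string]: Dir | null };
type Result = { visited: number; matches: string[] };

// Unbounded search: every directory in the tree has to be read.
function walkAll(dir: Dir, prefix = ''): Result {
  let visited = 1;
  const matches: string[] = [];
  for (const [name, entry] of Object.entries(dir)) {
    const path = prefix ? `${prefix}/${name}` : name;
    if (entry === null) {
      if (name === 'project.json') matches.push(path);
    } else {
      const sub = walkAll(entry, path);
      visited += sub.visited;
      matches.push(...sub.matches);
    }
  }
  return { visited, matches };
}

// Bounded search: read one known parent and check only its direct children.
function walkChildrenOf(root: Dir, parent: string): Result {
  const parentDir = root[parent];
  const matches: string[] = [];
  if (!parentDir) return { visited: 0, matches };
  let visited = 1;
  for (const [name, entry] of Object.entries(parentDir)) {
    if (entry === null) continue;
    visited += 1;
    if (entry['project.json'] === null) {
      matches.push(`${parent}/${name}/project.json`);
    }
  }
  return { visited, matches };
}

// Hypothetical repo: two small packages next to a deep vendored tree.
const sampleRepo: Dir = {
  'package.json': null,
  packages: { a: { 'project.json': null }, b: { 'package.json': null } },
  vendor: { gems: { x: { lib: {} }, y: { lib: {} } } },
};

const unbounded = walkAll(sampleRepo);
const bounded = walkChildrenOf(sampleRepo, 'packages');
console.log(unbounded.visited, bounded.visited);
```

Both searches find the same project file, but the bounded one never enters `vendor`. In a real repository the gap grows with the size of the directories that contain no projects.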

Dropping **/project.json from the glob makes the glob operation instantaneous. One fix would be a function similar to getGlobPatternsFromPackageManagerWorkspaces that reads only the package path entries from the package.json workspaces field and appends project.json to each of them.

https://github.com/nrwl/nx/blob/61d7d74378305ba96f9e94707aefeecb438328ca/packages/nx/src/config/workspaces.ts#L687
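
A minimal sketch of the proposed fix, assuming the workspaces entries have already been read from package.json into a string array. The function name `projectFilePatternsFromWorkspaces` is hypothetical and this is not Nx's implementation; negated entries (starting with `!`) are not handled here.

```typescript
// Hypothetical helper, not part of Nx: derive bounded glob patterns for both
// package.json and project.json from package-manager workspace entries.
function projectFilePatternsFromWorkspaces(workspaces: string[]): string[] {
  // Always include a project.json at the workspace root.
  const patterns = new Set<string>(['project.json']);
  for (const raw of workspaces) {
    const base = raw
      .replace(/^\.\//, '') // drop a leading "./"
      .replace(/\/package\.json$/, '') // accept entries that already name package.json
      .replace(/\/+$/, ''); // drop trailing slashes
    patterns.add(`${base}/package.json`);
    patterns.add(`${base}/project.json`);
  }
  // The Set removes duplicates while keeping insertion order.
  return Array.from(patterns);
}

const examplePatterns = projectFilePatternsFromWorkspaces([
  'packages/*',
  './libs/foo/package.json',
]);
console.log(examplePatterns);
```

Because every resulting pattern starts with a concrete directory prefix, a glob library only needs to look inside the listed workspace directories instead of crawling the whole repository.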

Expected Behavior

To have a faster nx boot time.

Github Repo

No response

Steps to Reproduce

Any repo with hundreds of thousands of files should show the slowness.

Nx Report

>  NX   Report complete - copy this into the issue template

   Node : 16.18.0
   OS   : darwin arm64
   yarn : 3.2.4
   
   nx : 15.2.4
   @nrwl/angular : Not Found
   @nrwl/cypress : Not Found
   @nrwl/detox : Not Found
   @nrwl/devkit : 14.0.0
   @nrwl/esbuild : Not Found
   @nrwl/eslint-plugin-nx : Not Found
   @nrwl/expo : Not Found
   @nrwl/express : Not Found
   @nrwl/jest : 14.0.0
   @nrwl/js : Not Found
   @nrwl/linter : 14.0.0
   @nrwl/nest : Not Found
   @nrwl/next : Not Found
   @nrwl/node : Not Found
   @nrwl/nx-cloud : Not Found
   @nrwl/nx-plugin : Not Found
   @nrwl/react : Not Found
   @nrwl/react-native : Not Found
   @nrwl/rollup : Not Found
   @nrwl/schematics : Not Found
   @nrwl/storybook : Not Found
   @nrwl/web : Not Found
   @nrwl/webpack : Not Found
   @nrwl/workspace : 14.0.0
   typescript : 4.8.4
   ---------------------------------------
   Local workspace plugins:
   ---------------------------------------
   Community plugins:

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 6
  • Comments: 21 (2 by maintainers)

Most upvoted comments

I’m working on setting up a JS monorepo within a very large Ruby monolith, and running into this same issue. The **/project.json causes all of the directories to be crawled, while we only care about a small subset. I’ve tried leveraging .nxignore to speed it up, but it doesn’t seem to help. I think the problem lies in this section of code: https://github.com/nrwl/nx/blob/73bc2e1c915fac40e29db930297121646362733b/packages/nx/src/config/workspaces.ts#L630-L646

Note that the globSync call is made without taking .nxignore into consideration. The ig value is passed into deduplicateProjectFiles, but only after we have already globbed the file system without considering .nxignore. Would it be possible to parse the .nxignore and add the values to ALWAYS_IGNORE before calling globSync? Something like this:

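  try {
    // Read .nxignore before globbing so its entries can prune the crawl.
    const nxIgnoreContent = readFileSync(`${root}/.nxignore`, 'utf-8')
    nxIgnoreContent.split('\n').forEach((rawLine) => {
      const line = rawLine.trim()
      // Skip blank lines and comment lines.
      if (line && !line.startsWith('#')) {
        ALWAYS_IGNORE.push(line)
      }
    })
    ig.add(nxIgnoreContent);
  } catch {
    // No readable .nxignore; nothing to add.
  }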
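
One caveat with pushing raw lines: assuming .nxignore follows gitignore-style syntax and ALWAYS_IGNORE holds glob patterns, the two do not mean quite the same thing. In gitignore-style syntax a bare name such as `vendor` matches at any depth, whereas a glob ignore list would typically treat it as relative to the root. The sketch below shows one possible translation. `nxIgnoreToGlobs` is a hypothetical helper, not an Nx API, and it deliberately skips negations and escape sequences.

```typescript
// Hypothetical helper, not part of Nx: translate .nxignore text
// (assumed gitignore-style) into glob patterns for an ignore list.
function nxIgnoreToGlobs(content: string): string[] {
  const globs: string[] = [];
  for (const rawLine of content.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blanks, comments, and negations; a plain ignore list
    // cannot express "un-ignore this path".
    if (line === '' || line.startsWith('#') || line.startsWith('!')) continue;
    // Drop one trailing slash; this sketch does not distinguish
    // directory-only entries from other entries.
    const body = line.replace(/\/$/, '');
    // A slash at the start or in the middle anchors the entry to the root;
    // otherwise the entry may match at any depth.
    const anchored = body.includes('/');
    const base = anchored ? body.replace(/^\//, '') : `**/${body}`;
    // Ignore the entry itself and everything beneath it.
    globs.push(base, `${base}/**`);
  }
  return globs;
}

const exampleGlobs = nxIgnoreToGlobs('# comment\n\nvendor/\n/tmp\napp/assets\n!keep\n');
console.log(exampleGlobs);
```

The result could then be appended to the ignore list before globbing, while the unmodified file content is still passed to `ig.add` so that negations keep working in the later deduplication step.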

@vdumitraskovic I created this patch (yarn patch or patch-package):

diff --git a/src/config/workspaces.js b/src/config/workspaces.js
index 5dad79d352ed3ffa6dc4a3f6cda0656fad099262..2473944ed2c7a3c1c9405f60dad8821af735cfb9 100644
--- a/src/config/workspaces.js
+++ b/src/config/workspaces.js
@@ -464,9 +464,14 @@ function getGlobPatternsFromPackageManagerWorkspaces(root) {
 }
 exports.getGlobPatternsFromPackageManagerWorkspaces = getGlobPatternsFromPackageManagerWorkspaces;
 function normalizePatterns(patterns) {
-    return patterns.map((pattern) => removeRelativePath(pattern.endsWith('/package.json')
+    // Return both package.json and project.json per pattern.
+    // See https://github.com/nrwl/nx/issues/13637
+    const paths = patterns.map((pattern) => removeRelativePath(pattern.endsWith('/package.json')
         ? pattern
         : (0, path_2.joinPathFragments)(pattern, 'package.json')));
+
+
+    return paths.flatMap(pattern => [pattern, pattern.replace(/package\.json$/, 'project.json')]);
 }
 function removeRelativePath(pattern) {
     return pattern.startsWith('./') ? pattern.substring(2) : pattern;
@@ -489,7 +494,9 @@ function globForProjectFiles(root, nxJson, ignorePluginInference = false) {
         .map((glob) => (glob.startsWith('/') ? glob.substring(1) : glob));
     const projectGlobPatterns = [
         'project.json',
-        '**/project.json',
+        // '**' glob causes a major slowdown. The normalizePatterns function is patched to bring back project.json
+        // See https://github.com/nrwl/nx/issues/13637
+        // '**/project.json',
         ...globsToInclude,
     ];
     if (!ignorePluginInference) {

This is still an issue.

Hey, sorry for the lack of answers on this one. It’s certainly not stale, and the globbing performance is something that we are aware of. Generally, tuning .nxignore can help out quite a bit, but there are some similar issues whose fix would probably work for this one too.

For instance, #13843 would be solved by a similar solution.

We don’t want to adopt the package.json workspaces field in particular because that ties Nx down to JS and requires a root package.json. Additionally, the current setup allows you to have some projects managed by npm workspaces and some only by Nx. I could see a world where we add a similar field into nx.json, but I’m not sure exactly what that would look like. I’ll try to get with @FrozenPandaz and see what his thinking is; we probably need some input from @vsavkin on this one too.