vscode: "Search files by name" is slow when using remote

Does this issue occur when all extensions are disabled?: Yes/No

  • VS Code Version: 1.78.2
  • OS Version: macOS Ventura 13.3.1

Steps to Reproduce:

  1. Use VS Code in remote mode and run “Search files by name” (with telemetry enabled)
  2. Run the same search in the same folder without remote and compare the results

The bigger the repository, the worse it gets. On subsequent calls the gap narrows (I’m assuming this is because some in-memory cache gets populated).

I’ve already followed the “Search Issues” page of the microsoft/vscode wiki to troubleshoot known cases and tweaked the configuration to exclude as many files as I could, but in my codebase searches still take a really long time.

Here is an example of me using cmd+p, entering a query, waiting for it to resolve, and then trying the exact same query again:

2023-05-25 17:34:42.321 [trace] telemetry/searchComplete {"properties":{"reason":"openFileHandler","scheme":"other","common.machineId":"66a593bf48622b3d8732d01f98fa2c27a72345606331167b14474051cdbccf7b","sessionID":"9d6c1d73-9a19-4e78-afb1-d7d8e95a90141685057185375","commitHash":"b3e4e68a0bc097f0ae7907b217c1119af9e03435","version":"1.78.2","common.platformVersion":"22.4.0","common.platform":"Mac","common.nodePlatform":"darwin","common.nodeArch":"arm64","common.product":"desktop","timestamp":"2023-05-25T23:34:42.320Z","common.version.shell":"22.5.2","common.version.renderer":"108.0.5359.215","common.firstSessionDate":"Mon, 12 Sep 2022 16:11:14 GMT","common.lastSessionDate":"Fri, 12 May 2023 17:29:33 GMT","common.isNewSession":"0","common.remoteAuthority":"ssh-remote","common.sandboxed":"0"},"measurements":{"resultCount":0,"workspaceFolderCount":1,"endToEndTime":8812,"sortingTime":-1,"fileWalkTime":8780,"directoriesWalked":0,"filesWalked":0,"cmdTime":8780,"cmdResultCount":368085,"common.timesincesessionstart":496945,"common.sequence":7,"common.cli":1}}
2023-05-25 17:35:20.492 [trace] telemetry/cachedSearchComplete {"properties":{"reason":"openFileHandler","scheme":"other","common.machineId":"66a593bf48622b3d8732d01f98fa2c27a72345606331167b14474051cdbccf7b","sessionID":"9d6c1d73-9a19-4e78-afb1-d7d8e95a90141685057185375","commitHash":"b3e4e68a0bc097f0ae7907b217c1119af9e03435","version":"1.78.2","common.platformVersion":"22.4.0","common.platform":"Mac","common.nodePlatform":"darwin","common.nodeArch":"arm64","common.product":"desktop","timestamp":"2023-05-25T23:35:20.491Z","common.version.shell":"22.5.2","common.version.renderer":"108.0.5359.215","common.firstSessionDate":"Mon, 12 Sep 2022 16:11:14 GMT","common.lastSessionDate":"Fri, 12 May 2023 17:29:33 GMT","common.isNewSession":"0","common.remoteAuthority":"ssh-remote","common.sandboxed":"0"},"measurements":{"resultCount":110872,"workspaceFolderCount":1,"endToEndTime":7734,"sortingTime":3375,"cacheWasResolved":1,"cacheLookupTime":0,"cacheFilterTime":125,"cacheEntryCount":368086,"common.timesincesessionstart":535116,"common.sequence":22,"common.cli":1}}
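
The timings in the two trace lines above can be pulled out with a short script. This is a minimal sketch, assuming each line has the shape `<timestamp> [trace] telemetry/<event> <JSON>`. The sample lines are abbreviated to a few `measurements` fields copied from the logs above, and `parse_trace_line` is a hypothetical helper name, not part of VS Code.

```python
import json


def parse_trace_line(line: str) -> tuple:
    """Return (event_name, measurements) for one telemetry trace line.

    Assumes the JSON payload starts at the first '{' on the line and that
    the event name follows the 'telemetry/' prefix.
    """
    brace = line.index("{")
    head = line[:brace]
    event = head.rsplit("telemetry/", 1)[1].strip()
    payload = json.loads(line[brace:])
    return event, payload.get("measurements", {})


# Abbreviated copies of the two log lines (only some measurement fields kept).
first_search = (
    '2023-05-25 17:34:42.321 [trace] telemetry/searchComplete '
    '{"measurements":{"resultCount":0,"endToEndTime":8812,'
    '"fileWalkTime":8780,"cmdTime":8780,"cmdResultCount":368085}}'
)
cached_search = (
    '2023-05-25 17:35:20.492 [trace] telemetry/cachedSearchComplete '
    '{"measurements":{"resultCount":110872,"endToEndTime":7734,'
    '"sortingTime":3375,"cacheFilterTime":125,"cacheEntryCount":368086}}'
)

for raw in (first_search, cached_search):
    event, m = parse_trace_line(raw)
    # Print every measurement so the two runs can be compared side by side.
    print(event)
    for key in sorted(m):
        print(f"  {key}: {m[key]}")
```

Pasting the full log lines in place of the abbreviated ones should work the same way, since only the `measurements` object is read.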

About this issue

  • State: open
  • Created a year ago
  • Reactions: 37
  • Comments: 59 (26 by maintainers)

Most upvoted comments

Just FYI, this issue is mentioned frequently internally as a sore spot with remote development, especially for power users. Some folks are hesitating to switch to remote development despite many other benefits because of it. It would be a huge deal if it were fixed.

@andreamah I have a quick question/hypothesis about this process:

We (as in, the public) know that ripgrep runs on the remote side, since we can see logs. And we can see that ripgrep itself runs really quickly in almost all the logs shared here, so that kinda rules out ripgrep being the issue.

Then, after ripgrep’s stdout is read and the file list is obtained, it’s fuzzy-matched. But does that fuzzy match happen on the remote machine or on the local machine?

I have to wonder if maybe the following happens:

  1. rg is run on remote host, getting a huge list of files.
  2. That entire list is transferred over the SSH connection.
  3. Then filtering happens on the local side?

That could explain the slowness since every process seems fast in isolation.
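
To get a feel for whether step 2 could matter, here is a rough back-of-the-envelope sketch. The file count (368085) and match count (110872) come from the telemetry in the issue; the average path length and the link bandwidth are pure assumptions for illustration, and the estimate ignores SSH compression, protocol overhead, and latency. The function names are hypothetical.

```python
def estimate_payload_bytes(path_count: int, avg_path_bytes: int) -> int:
    """Approximate size of a file list sent as raw paths."""
    return path_count * avg_path_bytes


def estimate_transfer_seconds(payload_bytes: int, bandwidth_bits_per_sec: float) -> float:
    """Time to push payload_bytes through a link of the given bandwidth."""
    return payload_bytes * 8 / bandwidth_bits_per_sec


# From the telemetry in the issue.
ALL_FILES = 368085   # cmdResultCount
MATCHES = 110872     # resultCount of the cached search

# Assumptions, not measured values.
AVG_PATH_BYTES = 60          # assumed average path length
BANDWIDTH_BPS = 50_000_000   # assumed 50 Mbit/s link

for label, count in (("full file list", ALL_FILES), ("matches only", MATCHES)):
    size = estimate_payload_bytes(count, AVG_PATH_BYTES)
    secs = estimate_transfer_seconds(size, BANDWIDTH_BPS)
    print(f"{label}: {size / 1e6:.1f} MB, about {secs:.2f} s at the assumed bandwidth")
```

Swapping in a measured average path length and the real link speed would show whether transferring the whole list is even in the right order of magnitude to explain the delay.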

Or perhaps even if that particular sequence isn’t right, is there a way to inspect what’s happening over the SSH connection? If so, we could easily rule out certain causes of this issue. Maybe a huge list of files is still sent even if the set of actual matches is smaller? I’m just curious what the size and shape of the payload is and where the filtering happens.