cache: Cache restoration is orders of magnitude slower on Windows compared to Linux.

We are using this action to cache Conan packages.

Our current cache is ~300 MB on both Linux (ubuntu-20.04) and Windows (windows-2019). Sadly, where the cache step routinely takes ~10 seconds on Linux, it often takes ~5 minutes on Windows.
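For context, the cache step itself is nothing unusual; it is essentially the stock actions/cache usage pointed at the Conan package directory. A rough sketch (the path and key are illustrative, not our exact configuration):

  - name: Cache Conan packages
    uses: actions/cache@v3
    with:
      path: ~/.conan/data
      key: conan-${{ runner.os }}-${{ hashFiles('conanfile.txt') }}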

This makes iterations frustrating: since the rest of the workflow takes about 2 minutes to complete, this step alone imposes a roughly 3x penalty on total run time.

As far as I can tell, the archive retrieval time is comparable; it is really the cache un-archiving that seems to take very long on Windows.


Our latest run is shown below for illustration purposes.

Linux: [screenshot of the cache step timing]

Windows: [screenshot of the cache step timing]

About this issue

  • State: open
  • Created 2 years ago
  • Reactions: 13
  • Comments: 25 (4 by maintainers)


Most upvoted comments

I just spent 3 days trying to find a workaround for this, so I hope this helps someone…

Background:

We use Bazel, and although the local Bazel cache is not necessarily big (~200 MB in our case), it contains a ton of small files (including symlinks). For us, it took ~19 minutes to untar those 200 MB on Windows 😖

Failed attempts:

Based on this Windows issue (also linked above), my initial attempts were related to Windows Defender:

  • Disabling all features of Windows Defender / excluding the directory where the cache action extracts the archive (based on this gist and these docs; a sketch of the relevant commands is below this list)
    ➜ However, I found out that all relevant features are already disabled and the entire D:/ drive (where the cache is extracted) is already excluded by default in the GitHub runners.
  • Completely uninstalling Windows Defender (see this example)
    ➜ However, that requires a reboot. So, that’s only viable on a self-hosted runner.
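For reference, this is roughly what those Defender-related attempts looked like, using the Defender PowerShell module (a sketch only; as explained above, it makes no difference on the hosted runners because these defaults are already in place):

  # Turn off real-time scanning (already disabled on the hosted Windows runners)
  Set-MpPreference -DisableRealtimeMonitoring $true

  # Exclude the drive where the cache action extracts the archive
  # (D:/ is already excluded by default on the hosted runners)
  Add-MpPreference -ExclusionPath 'D:\'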

BTW: The cache action already uses the tar.exe from Git Bash (C:\Program Files\Git\usr\bin\tar.exe), so this workaround (suggested by @lvpx >1 year ago) makes no difference anymore.

Our current workaround:

The underlying issue is the large number of files that need to be extracted, so let’s reduce the cache to a single file: ➜ Let’s put the cache in a Virtual Hard Disk!

So this is our solution:

runs-on: windows-2022
steps:
  ...

  - name: Cache Bazel (VHDX)
    uses: actions/cache@v3
    with:
      path: C:/bazel_cache.vhdx
      key: cache-windows

  - name: Create, mount and format VHDX
    run: |
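      # Mount the VHDX if it was restored from the cache; otherwise create,
      # initialize, partition and format a new one. Either way, $Volume ends up
      # holding the resulting volume (including its drive letter).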
      $Volume = `
        If (Test-Path C:/bazel_cache.vhdx) { `
            Mount-VHD -Path C:/bazel_cache.vhdx -PassThru | `
            Get-Disk | `
            Get-Partition | `
            Get-Volume `
        } else { `
            New-VHD -Path C:/bazel_cache.vhdx -SizeBytes 10GB | `
            Mount-VHD -Passthru | `
            Initialize-Disk -Passthru | `
            New-Partition -AssignDriveLetter -UseMaximumSize | `
            Format-Volume -FileSystem NTFS -Confirm:$false -Force `
        }; `
      Write-Output $Volume; `
      Write-Output "CACHE_DRIVE=$($Volume.DriveLetter)`:/" >> $env:GITHUB_ENV

  - name: Build and test
    run: bazelisk --output_base=$env:CACHE_DRIVE test --config=windows //...

  - name: Dismount VHDX
    run: Dismount-VHD -Path C:/bazel_cache.vhdx

I know… it’s long and ugly, but it works: Extracting the cache only takes 7 seconds and mounting the VHDX only takes 19 seconds! 🎉 This means that we reduced the cache restoration time by a factor of 44 🤓

This is based on Example 3 of the Mount-VHD docs and Example 5 of the New-VHD docs. I’m by no means proficient in PowerShell scripting, so there might be room for improvement…

A few details about the solution:

  • We reserve 10 GB for the VHDX, but that doesn’t mean that’s the actual size of the file. The VHDX file is only slightly bigger than its contents. But with 10 GB, we give Bazel enough space to work 😃
  • The VHDX is mounted at E:/ on the GitHub runners. However, this is not necessarily deterministic. I tried assigning a specific drive letter, but there are two issues with that: 1. the drive letter could already be occupied and 2. the Mount-VHD command doesn’t support it (only the New-Partition command does).
    So, we store the path of the drive in a new CACHE_DRIVE environment variable that we can use in later steps.
  • We don’t use always() or !cancelled() in the Dismount VHDX step, because if something fails, the cache will be disregarded anyway, so we don’t care whether the volume gets dismounted or not 🤷 (a conditional variant is sketched below for completeness).
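For completeness, if you do want the volume dismounted even after a failed step, the dismount step only needs a condition (minimal sketch; as noted above, we skip this in our setup because a failed run’s cache is discarded anyway):

  - name: Dismount VHDX
    if: always()
    run: Dismount-VHD -Path C:/bazel_cache.vhdx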

@Safihre we are not part of GitHub anymore, unfortunately. Hopefully someone will pick up these pending issues and respond.

@pdotl since #984 was narrowed down to cross-os (ref), should this issue be reopened?

Looks like this has been going on for a while… See also #442 and #529. Hopefully @bishal-pdMSFT can make some improvements here? Maybe just provide an optional parameter to the action that would tell it to use .zip (or another format) instead of .tgz on Windows? 7-Zip is pre-installed on the virtual environments.
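In the meantime, the same single-file idea can be applied by hand: pack the directory yourself with the pre-installed 7-Zip and let actions/cache handle only the resulting archive. A minimal sketch, with an illustrative Conan layout and key (not a tested recipe):

  - name: Cache Conan archive
    uses: actions/cache@v3
    with:
      path: C:/conan_cache.7z
      key: conan-windows-${{ hashFiles('conanfile.txt') }}

  - name: Unpack Conan cache
    run: |
      # Only unpack if the archive was restored from the cache
      if (Test-Path C:/conan_cache.7z) { 7z x C:/conan_cache.7z "-o$env:USERPROFILE/.conan" }

  # ... build and test steps ...

  - name: Repack Conan cache
    run: 7z a -mx=1 C:/conan_cache.7z "$env:USERPROFILE/.conan/*"

Whether this beats tar in practice would need measuring; the win mostly comes from handing actions/cache a single file instead of thousands of small ones.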

Thank you for your response!

(If the issue is with the specific un-archiving utility, maybe there is an alternative that the action could use on Windows to get better performance?)

+1 Attempting to cache on Windows proves to be a significant waste of time; I spent hours on it. At the very least, the official documentation should be updated to document this “known issue.”

@paco-sevilla - Thank you so much!!! I’ve just been experiencing this issue, where it takes 25 minutes to decompress the Bazel cache (granted, it’s probably caching more than it needs to). Thank you! Thank you! Thank you!

Thanks @paco-sevilla. It’s just a bit crazy that we have to resort to these kinds of solutions instead of getting a proper fix from GitHub. This has been going on for ages. And it can’t just be the free users (like me) who experience this; the corporate customers who actually pay for every Actions minute must be hitting it too and want it reduced.

@bethanyj28 seems to be releasing the latest version. They might help get this on someone’s radar.