cache: Cache restoration is orders of magnitude slower on Windows compared to Linux.
We are using this action to cache Conan packages.
Our current cache is ~300 MB on both Linux (ubuntu-20.04) and Windows (windows-2019).
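For reference, a minimal version of such a Conan cache step might look like the following. The path, key, and lockfile name are illustrative, not our exact configuration:

```yaml
# Sketch: cache the local Conan package store between runs.
# Path and key expression are examples only.
- name: Cache Conan packages
  uses: actions/cache@v3
  with:
    path: ~/.conan/data
    key: conan-${{ runner.os }}-${{ hashFiles('conanfile.txt') }}
```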
Sadly, while the cache step routinely takes ~10 seconds on Linux, it often takes ~5 minutes on Windows.
This makes iteration frustrating: the rest of the workflow takes about 2 minutes to complete, so this amounts to a roughly 3x overall time penalty.
As far as I can tell, the archive retrieval time is comparable; it really is the un-archiving of the cache that seems to take very long on Windows.
Our latest runs are shown below for illustration.
Linux:
Windows:
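For anyone who wants to reproduce the pattern locally, a minimal harness is sketched below: it builds an archive of many small files (the shape of a typical Conan or Bazel cache) and times the extraction. File counts and paths are arbitrary; absolute timings will of course differ per OS and runner.

```shell
#!/bin/sh
# Repro sketch: archive and un-archive many small files.
# This is the access pattern that is reported as slow on Windows runners.
set -e
mkdir -p cache-src cache-dst

# Create 2000 small files.
i=0
while [ "$i" -lt 2000 ]; do
  echo "data $i" > "cache-src/f$i.txt"
  i=$((i + 1))
done

# Pack them the way actions/cache does (gzipped tar).
tar -czf cache.tgz -C cache-src .

# Time the extraction step, which is the slow part on Windows.
time tar -xzf cache.tgz -C cache-dst

ls cache-dst | wc -l   # -> 2000
```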
About this issue
- State: open
- Created 2 years ago
- Reactions: 13
- Comments: 25 (4 by maintainers)
Commits related to this issue
- Run steps that must be run only once on ubuntu-latest Ubuntu is overall faster than macOS and Windows. Also this [recent issue][1] seems macOS specific. See also [Restoring cache on MacOS is extreme... — committed to serilog-contrib/serilog-formatting-log4net by 0xced 2 years ago
- Use gnu tar on windows CI See https://github.com/actions/cache/issues/752 — committed to watchexec/watchexec by passcod 2 years ago
- Add tar to Windows CI path per actions/cache#752 — committed to bufbuild/buf by jchadwick-buf 8 months ago
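The "use GNU tar" commits above generally amount to prepending the GNU tar shipped with Git for Windows to the runner's PATH. A sketch of such a step is below (step name illustrative); note that a later comment in this thread reports the cache action now picks up this tar on its own, so whether this still helps depends on the runner image:

```yaml
# Sketch: make GNU tar win over the BSD tar.exe in C:\Windows\System32.
- name: Use GNU tar from Git for Windows (see actions/cache#752)
  if: runner.os == 'Windows'
  shell: cmd
  run: echo C:\Program Files\Git\usr\bin>>"%GITHUB_PATH%"
```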
I just spent 3 days trying to find a workaround for this, so I hope this helps someone…
Background:
We use Bazel. The local Bazel cache, although not necessarily big (~200 MB in our case), contains a ton of small files (including symlinks). For us, it took ~19 minutes to untar those 200 MB on Windows 😖
Failed attempts:
Based on this Windows issue (also linked above), my initial attempts were related to Windows Defender:

➜ However, I found out that all relevant features are already disabled, and the entire `D:/` drive (where the cache is extracted) is already excluded by default on the GitHub runners.

➜ However, that requires a reboot, so that's only viable on a self-hosted runner.
BTW: The cache action already uses the `tar.exe` from Git Bash (`C:\Program Files\Git\usr\bin\tar.exe`), so this workaround (suggested by @lvpx >1 year ago) makes no difference anymore.

Our current workaround:
The underlying issue is the large number of files that need to be extracted, so let's reduce the cache to a single file: ➜ Let's put the cache in a Virtual Hard Disk!
So this is our solution:
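The original workflow snippet did not survive in this thread, but based on the description (New-VHD/Mount-VHD, a `CACHE_DRIVE` variable, a `Dismount VHDX` step), the approach might be sketched as follows. All paths, sizes, keys, and step names here are illustrative assumptions, not the author's exact script; the Hyper-V cmdlets it uses are available on the hosted Windows runners:

```yaml
# Sketch: cache a single VHDX file instead of thousands of small files.
- name: Restore cache (single VHDX file)
  uses: actions/cache@v3
  with:
    path: D:\cache.vhdx
    key: bazel-${{ runner.os }}-${{ github.sha }}

- name: Create or mount VHDX
  shell: pwsh
  run: |
    if (Test-Path D:\cache.vhdx) {
      # Cache hit: just mount the existing virtual disk.
      Mount-VHD -Path D:\cache.vhdx
    } else {
      # Cache miss: create, mount, partition, and format a new virtual disk
      # (based on Example 5 of the New-VHD docs).
      New-VHD -Path D:\cache.vhdx -SizeBytes 10GB -Dynamic |
        Mount-VHD -Passthru |
        Initialize-Disk -Passthru |
        New-Partition -AssignDriveLetter -UseMaximumSize |
        Format-Volume -FileSystem NTFS -Confirm:$false
    }
    # The assigned drive letter is not deterministic, so publish it
    # to later steps via an environment variable.
    $diskNumber = (Get-VHD -Path D:\cache.vhdx).DiskNumber
    $letter = (Get-Partition -DiskNumber $diskNumber | Get-Volume).DriveLetter
    "CACHE_DRIVE=${letter}:" | Out-File -FilePath $env:GITHUB_ENV -Append -Encoding utf8

# ... build steps use $env:CACHE_DRIVE here ...

- name: Dismount VHDX
  shell: pwsh
  run: Dismount-VHD -Path D:\cache.vhdx
```

Dismounting before the job ends matters on success, because the post-job cache save needs the VHDX file to be unlocked.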
I know… it’s long and ugly, but it works: Extracting the cache only takes 7 seconds and mounting the VHDX only takes 19 seconds! 🎉 This means that we reduced the cache restoration time by a factor of 44 🤓
This is based on Example 3 of the Mount-VHD docs and Example 5 of the New-VHD docs. I'm by no means proficient in PowerShell scripting, so there might be room for improvement…
A few details about the solution:

- The mounted volume usually gets assigned `E:/` on the GitHub runners. However, this is not necessarily deterministic. I tried assigning a specific drive letter, but there are two issues with that: 1. the drive letter could be occupied, and 2. the `Mount-VHD` command doesn't support that (only the `New-Partition` command does). So, we store the path of the drive in a new `CACHE_DRIVE` environment variable that we can use in later steps.
- We don't bother with `always()` or `!cancelled()` in the `Dismount VHDX` step, because if something fails, the cache will be disregarded anyway. So, we don't care whether the volume gets dismounted or not 🤷

@Safihre we are not part of GitHub anymore, unfortunately. Hopefully someone will pick up these pending issues and respond.
@pdotl since #984 was narrowed down to cross-os (ref) should this issue be reopened?
Looks like this has been going on for a while… See also #442 and #529. Hopefully @bishal-pdMSFT can make some improvements here? Maybe just provide an optional parameter to the action that would tell it to use `.zip` (or another format) instead of `.tgz` on Windows? 7-Zip is pre-installed on the virtual environments.

Thank you for your response!
(If the issue is with the specific un-archiving utility, maybe there is an alternative that the action could use on Windows to get better performance?)
+1 Attempting to cache on Windows proves to be a significant waste of time; I spent hours on it. At the very least, the official documentation should be updated to document this “known issue.”
@paco-sevilla - Thank you so much!!! I've just been experiencing this issue, where it took 25 minutes to decompress the Bazel cache (granted, it's probably caching more than it needs to). Thank you! Thank you! Thank you!
Thanks @paco-sevilla. It's just a bit crazy that we have to resort to this kind of solution instead of getting a proper fix from GitHub. This has been going on for ages. And it can't just be the free users (like me) that experience this; the corporate customers that actually pay for each Actions minute must experience it too and would want those minutes reduced.
@bethanyj28 seems to be releasing the latest version; they might help get this on someone's radar.