checkout: Checkout suddenly much slower on windows environment
Moved from https://github.com/actions/runner-images/issues/7180
Description
For the last few days, the duration of the actions/checkout@v3 steps on windows-2019 has dramatically increased. This behavior is seen on all my repos (all private). Below is a table showing a before/after example.
| Checkout step | Feb 17th | Feb 21st |
| --- | --- | --- |
| 1 | 13s | 1m35s |
| 2 | 8s | 47s |
The result is a huge increase in build (and billable) time.
The GitHub status page does show some issues around this time frame, but these were all resolved:
Platforms affected
- Azure DevOps
- GitHub Actions - Standard Runners
- GitHub Actions - Larger Runners
Runner images affected
- Ubuntu 18.04
- Ubuntu 20.04
- Ubuntu 22.04
- macOS 11
- macOS 12
- Windows Server 2019
- Windows Server 2022
Image version and build link
Current runner version: '2.301.1'
Operating System
Microsoft Windows Server 2019
10.0.17763
Private repo
Is it a regression?
Yes (sorry, the repos are private).
Expected behavior
The build times should be fairly constant.
Actual behavior
Build times explode, burning through our build minutes too fast.
Repro steps
Compare build times in any Windows environment from before Feb 18th with today's.
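To put numbers on that comparison, a small script like the one below can average workflow run durations for two date ranges via the REST API (`GET /repos/{owner}/{repo}/actions/runs`). This is a hedged sketch, not part of the original report; the owner, repo, token handling, and date ranges are placeholders.

```typescript
// Hedged sketch: compare average wall-clock duration of workflow runs
// before vs. after the slowdown, using GET /repos/{owner}/{repo}/actions/runs.
// OWNER, REPO, and the date ranges are placeholders, not taken from the report.
const OWNER = 'your-org';
const REPO = 'your-repo';
const TOKEN = process.env.GITHUB_TOKEN ?? '';

async function averageDurationSeconds(createdRange: string): Promise<number> {
  const url =
    `https://api.github.com/repos/${OWNER}/${REPO}/actions/runs` +
    `?created=${encodeURIComponent(createdRange)}&per_page=100`;
  const res = await fetch(url, {
    headers: { Authorization: `Bearer ${TOKEN}`, Accept: 'application/vnd.github+json' },
  });
  if (!res.ok) throw new Error(`API request failed: ${res.status}`);
  const { workflow_runs } = (await res.json()) as {
    workflow_runs: { run_started_at: string; updated_at: string }[];
  };
  // Approximate each run's duration as updated_at - run_started_at.
  const durations = workflow_runs.map(
    (r) => (Date.parse(r.updated_at) - Date.parse(r.run_started_at)) / 1000,
  );
  return durations.reduce((a, b) => a + b, 0) / Math.max(durations.length, 1);
}

// Example: a week before the slowdown vs. a week after it.
(async () => {
  console.log('avg before:', await averageDurationSeconds('2023-02-10..2023-02-17'), 's');
  console.log('avg after :', await averageDurationSeconds('2023-02-18..2023-02-25'), 's');
})();
```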
About this issue
- Original URL
- State: open
- Created a year ago
- Reactions: 10
- Comments: 30 (9 by maintainers)
Links to this issue
Commits related to this issue
- Improve checkout performance on Windows runners by upgrading @actions/github dependency Re: https://github.com/actions/checkout/issues/1186 @dscho discovered that the checkout action could stall for... — committed to BrettDong/checkout by BrettDong a year ago
- Improve checkout performance on Windows runners by upgrading @actions/github dependency (#1246) * Improve checkout performance on Windows runners by upgrading @actions/github dependency Re: https:... — committed to actions/checkout by BrettDong a year ago
- Use GCS for Windows ccache (#13183) We have found the GitHub actions built-in caching mechanism to be extremely limiting: slow, small, and buggy. Switch instead to using our own remote ccache hoste... — committed to iree-org/iree by GMNGeoffrey a year ago
- Improve checkout performance on Windows runners by upgrading @actions/github dependency (#1246) * Improve checkout performance on Windows runners by upgrading @actions/github dependency Re: https:... — committed to DymysClaflin3/checkout by BrettDong a year ago
- ci: switch to windows-2019 Related: https://github.com/actions/checkout/issues/1186 — committed to crytic/medusa by elopez a year ago
- Use GCS for Windows ccache (#13183) We have found the GitHub actions built-in caching mechanism to be extremely limiting: slow, small, and buggy. Switch instead to using our own remote ccache hoste... — committed to NatashaKnk/iree by GMNGeoffrey a year ago
Yeah, but presumably so can the GitHub engineers whom support says are working to fix this. Like, IDK, it kind of seems to me that the people who wrote this code, control these VMs, and whom we are paying for this service, could maybe take a look at the issues with it.
I’ve encountered this and debugged it a bit by heavily instrumenting the transpiled code in dist/index.js. It turns out that there are multiple calls to PowerShell that make a mess by essentially waiting for dozens of seconds. In the example run I analyzed, the time was all spent on one PowerShell call, totaling a whopping 1 minute and 48 seconds just to determine a Windows release that the actions/checkout Action has no need to know. To make things even worse, the same PowerShell invocation also happens multiple times during the post-action.
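For anyone who wants to repeat that kind of measurement, here is a minimal sketch of the instrumentation approach described above: wrap a suspicious synchronous call and log how long it blocks. The specific PowerShell command below is only a stand-in for a similarly expensive query, not the exact call the dependency makes.

```typescript
// Minimal sketch of the instrumentation idea: time a synchronous external call
// and log how long it blocks the Node process. The PowerShell command here is
// a stand-in, not the precise call made inside dist/index.js.
import { execSync } from 'child_process';

function timed<T>(label: string, fn: () => T): T {
  const start = Date.now();
  try {
    return fn();
  } finally {
    console.log(`${label}: ${Date.now() - start} ms`);
  }
}

const caption = timed('powershell OS query', () =>
  execSync(
    'powershell -NoProfile -Command "(Get-CimInstance Win32_OperatingSystem).Caption"',
  )
    .toString()
    .trim(),
);
console.log(caption);
```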
I do not really understand what changed between Feb 17th and 21st that would explain this slowdown. There has not been a new actions/checkout release since Jan 5th…

We are discontinuing our use of GitHub-managed Windows runners. The costs were already beyond premium/sustainable, and the performance is so poor that the issue compounds out of control. I don’t consider this a viable way to run CI for any business. I can tolerate a lot, but not at massively inflated prices.
Just a reminder: given the evidence in https://github.com/actions/runner-images/issues/7320, there is almost certainly an underlying issue that is not specific to git at all, but rather affects all operations that place high demands on disk (and CPU).

The fix in #1246 reduces the stalled time down to 3 seconds. During those 3 seconds the workflow is stalled on loading node.exe from disk into memory for execution. I don’t think there is anything I can do to get rid of that.

I have observed this too. Example job: https://github.com/Tyrrrz/CliWrap/actions/runs/4271628026
Note that this behavior is pretty inconsistent, and usually checkout completes pretty fast on Windows. But in the event that it is slow, it’s only ever slow on Windows.

Thanks Ben. Further runs suggest that my switch to using git commands directly instead of actions/checkout was just lucky the first few times (or the computer learned what I was trying to do and put a stop to it 😛). Subsequent runs have had similar latency to before the switch, I think (I started hacking together a script to collect statistics for jobs over time, but got side-tracked, so it is pure anecdata right now). So I’m back to thinking it’s the VM + git itself and not the action. I am sort of considering getting tarballs for all of the submodules instead of using git; I’ll update if that seems to be faster somehow (which would suggest to me something git-specific and not just IO or network issues).
👋 Hey, sorry for not seeing this one. We are tracking/providing commentary on our investigation into the Windows larger-runner checkout here: oops, that’s the internal issue and no good to anyone. OK, that first update was all kinds of wrong, let me try again!

Sorry we haven’t commented on this ticket still; we are tracking this internally. We have made some changes to the Windows VM image, but only recently, and they don’t appear to have helped. With everything else going on we have had to put this one aside for the last couple of weeks, but we are committed to fixing this. I will re-open this ticket since it is linked in the issue we are tracking internally 😃
@GMNGeoffrey I would like to encourage you to consider the current macro-economic climate, and also just how large the roadmap is. And please also note how @BrettDong’s effort was rewarded: I am sure that you, too, will get what you want much quicker if you dig a little deeper with those ETW traces. I would even consider helping, but I do not have access to those large runners; you do, though.
You can collect ETW traces to help diagnose what’s happening and what is taking time during the checkout action.
@GMNGeoffrey, I have done extensive testing and confirmed that my issues were resolved by actions/checkout version 3.5.1.
https://github.com/actions/checkout/pull/1246#issuecomment-1499276093
try: https://github.com/actions/checkout/issues/1186#issuecomment-1484896561
@BrettDong excellent! I tested this and the times are back to decent levels: the entire checkout step takes 11 seconds with your PR. Thank you so much!

I have briefly seen success trying to hard-code the fix of https://github.com/sindresorhus/windows-release/pull/18 into dist/index.js, but it does not seem to work when I undelete all the core logic. Sigh.

At this point, I have to tend to other things, but I thought I’d leave my current findings here in case somebody else can take over (and try things like overriding os.release() to always return 2022 or something like that, before letting @octokit/endpoint initialize its User-Agent string).
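A minimal sketch of that os.release() override, assuming the patch is applied at the very top of dist/index.js before anything builds the User-Agent string; whether any particular return value actually avoids the external PowerShell call is an open question, as the mixed results above suggest.

```typescript
// Hypothetical sketch of the override suggested above: patch os.release()
// before any module that derives a User-Agent string (e.g. @octokit/endpoint)
// is loaded, so OS-name detection has no reason to shell out to PowerShell.
// The returned version string (Windows Server 2022 kernel) is an assumption;
// it may or may not short-circuit the expensive code path.
import * as os from 'os';

const realRelease = os.release.bind(os);

// Assumes CommonJS output (as in dist/index.js), where module exports are
// plain mutable objects; a frozen ESM namespace would reject this assignment.
(os as { release: () => string }).release = () => '10.0.20348';

console.log('patched os.release():', os.release());
console.log('real    os.release():', realRelease());

// ...only after this point should the code that initializes the Octokit
// User-Agent string be required/imported.
```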