sccache: GitHub Actions cache is surprisingly slow
I was excited to see the introduction of the GitHub Actions cache backend. However, when I tried it on a test branch, the results were surprisingly slow.
See the initial run at https://github.com/indygreg/apple-platform-rs/actions/runs/3723171428/jobs/6314513850. Here, a build job took ~1 hour. (It normally takes ~10m.) The Stop sccache step displays sccache server stats, and this particular job shows 44 cache errors. I'm unsure whether that is the cause of the problem.
I did another CI run where I cranked logging up to debug and printed the log file after the build. See e.g. https://github.com/indygreg/apple-platform-rs/actions/runs/3723354206/jobs/6314814370 for a slow build. Interestingly, some of the jobs in this run were fast! See https://github.com/indygreg/apple-platform-rs/actions/runs/3723354206/jobs/6314814955 for a fast one. Spot-checking the logs, I think there is a positive correlation between the cache errors count and job time.
Poking around the logs, I see occurrences of DEBUG sccache::compiler::compiler] [aws_http]: Cache write error: Cache service responded with 429 Too Many Requests: {"$id":"1","innerException":null,"message":"Request was blocked due to exceeding usage of resource 'Count' in namespace ''.","typeName":"Microsoft.TeamFoundation.Framework.Server.RequestBlockedException, Microsoft.TeamFoundation.Framework.Server","typeKey":"RequestBlockedException","errorCode":0,"eventId":3000}. So I guess my project is too big to use GitHub Actions cache???
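For reference, the JSON body of that 429 has a small, fixed shape. Here is a minimal sketch of deserializing it with serde (requires the serde and serde_json crates); the struct and field names below are mine, only the JSON keys come from the response above:

```rust
use serde::Deserialize;

// Shape of the 429 body shown above. The struct and field names are
// illustrative; only the JSON keys come from the actual response.
#[derive(Debug, Deserialize)]
struct GhacBlockedError {
    #[serde(rename = "$id")]
    id: String,
    message: String,
    #[serde(rename = "typeKey")]
    type_key: String,
    #[serde(rename = "errorCode")]
    error_code: i64,
    #[serde(rename = "eventId")]
    event_id: i64,
}

fn main() -> Result<(), serde_json::Error> {
    let body = r#"{"$id":"1","innerException":null,"message":"Request was blocked due to exceeding usage of resource 'Count' in namespace ''.","typeName":"Microsoft.TeamFoundation.Framework.Server.RequestBlockedException, Microsoft.TeamFoundation.Framework.Server","typeKey":"RequestBlockedException","errorCode":0,"eventId":3000}"#;
    let err: GhacBlockedError = serde_json::from_str(body)?;
    println!("{}: {}", err.type_key, err.message);
    Ok(())
}
```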
More worrisome, though, is that some of the cache writes take 10+ seconds. For example:
2022-12-18T05:13:25.3938134Z [2022-12-18T05:08:46Z DEBUG hyper::client::pool] reuse idle connection for ("https", artifactcache.actions.githubusercontent.com)
2022-12-18T05:13:25.3938390Z [2022-12-18T05:08:46Z DEBUG hyper::proto::h1::io] flushed 2318 bytes
2022-12-18T05:13:25.3938682Z [2022-12-18T05:08:46Z DEBUG hyper::proto::h1::io] parsed 18 headers
2022-12-18T05:13:25.3939035Z [2022-12-18T05:08:46Z DEBUG hyper::proto::h1::conn] incoming body is chunked encoding
2022-12-18T05:13:25.3939330Z [2022-12-18T05:08:46Z DEBUG hyper::proto::h1::decode] incoming chunked header: 0x3 (3 bytes)
2022-12-18T05:13:25.3939686Z [2022-12-18T05:08:46Z WARN reqwest_retry::middleware] Retry attempt #1. Sleeping 53.097849ms before the next attempt
2022-12-18T05:13:25.3940103Z [2022-12-18T05:08:46Z DEBUG hyper::client::pool] reuse idle connection for ("https", artifactcache.actions.githubusercontent.com)
2022-12-18T05:13:25.3940353Z [2022-12-18T05:08:46Z DEBUG hyper::proto::h1::io] flushed 2318 bytes
2022-12-18T05:13:25.3940598Z [2022-12-18T05:08:46Z DEBUG hyper::proto::h1::io] parsed 18 headers
2022-12-18T05:13:25.3940878Z [2022-12-18T05:08:46Z DEBUG hyper::proto::h1::conn] incoming body is chunked encoding
2022-12-18T05:13:25.3941175Z [2022-12-18T05:08:46Z DEBUG hyper::proto::h1::decode] incoming chunked header: 0x3 (3 bytes)
2022-12-18T05:13:25.3941508Z [2022-12-18T05:08:46Z INFO reqwest_tracing::reqwest_otel_span_builder] HTTP request; otel.status_code="ERROR"
2022-12-18T05:13:25.3941845Z [2022-12-18T05:08:46Z INFO reqwest_tracing::reqwest_otel_span_builder] HTTP request; http.status_code=429
2022-12-18T05:13:25.3942174Z [2022-12-18T05:08:46Z INFO reqwest_tracing::reqwest_otel_span_builder] HTTP request; http.user_agent=""
2022-12-18T05:13:25.3942478Z [2022-12-18T05:08:46Z DEBUG hyper::proto::h1::decode] incoming chunked header: 0x138 (312 bytes)
2022-12-18T05:13:25.3942793Z [2022-12-18T05:08:46Z DEBUG hyper::proto::h1::conn] incoming body completed
2022-12-18T05:13:25.3943218Z [2022-12-18T05:08:46Z DEBUG hyper::client::pool] pooling idle connection for ("https", artifactcache.actions.githubusercontent.com)
2022-12-18T05:13:25.3944403Z [2022-12-18T05:08:46Z DEBUG sccache::compiler::compiler] [aws_sdk_s3]: Cache write error: Cache service responded with 429 Too Many Requests: {"$id":"1","innerException":null,"message":"Request was blocked due to exceeding usage of resource 'Count' in namespace ''.","typeName":"Microsoft.TeamFoundation.Framework.Server.RequestBlockedException, Microsoft.TeamFoundation.Framework.Server","typeKey":"RequestBlockedException","errorCode":0,"eventId":3000}
2022-12-18T05:13:25.3944708Z [2022-12-18T05:08:46Z DEBUG sccache::server] [aws_sdk_s3]: Cache write finished in 21.388 s
2022-12-18T05:13:25.3945012Z [2022-12-18T05:08:46Z DEBUG hyper::proto::h1::decode] incoming chunked header: 0x138 (312 bytes)
2022-12-18T05:13:25.3945338Z [2022-12-18T05:08:46Z DEBUG hyper::proto::h1::decode] incoming chunked header: 0x138 (312 bytes)
And
2022-12-18T05:13:25.3908847Z [2022-12-18T05:08:38Z DEBUG hyper::client::pool] pooling idle connection for ("https", artifactcache.actions.githubusercontent.com)
2022-12-18T05:13:25.3909181Z [2022-12-18T05:08:38Z INFO reqwest_tracing::reqwest_otel_span_builder] HTTP request; http.status_code=204
2022-12-18T05:13:25.3909491Z [2022-12-18T05:08:38Z INFO reqwest_tracing::reqwest_otel_span_builder] HTTP request; http.user_agent=""
2022-12-18T05:13:25.3909910Z [2022-12-18T05:08:38Z DEBUG sccache::compiler::compiler] [aws_config]: Stored in cache successfully!
2022-12-18T05:13:25.3910204Z [2022-12-18T05:08:38Z DEBUG sccache::server] [aws_config]: Cache write finished in 13.018 s
2022-12-18T05:13:25.3910535Z [2022-12-18T05:08:45Z DEBUG sccache::compiler::compiler] [aws_sdk_s3]: Compiled in 21.388 s, storing in cache
2022-12-18T05:13:25.3910931Z [2022-12-18T05:08:46Z INFO gha_toolkit::cache] put; version="5c35548282b30b97a2314eadc6fc42b9b7585a12c4f9efa527ac2e48739babe0"
2022-12-18T05:13:25.3911660Z [2022-12-18T05:08:46Z INFO gha_toolkit::cache] reserve; key="sccache-ubuntu-22.04-x86_64-unknown-linux-gnu-1.62.1-0300a5abb7bd36eb7e4aa0b971d5e94866c6e140" version="7c4ed3dc414c5c2d030ec00d8394e733c5dfeff8cdbae21becc2144de0bb9570" cache_size=21844888
2022-12-18T05:13:25.3912483Z [2022-12-18T05:08:46Z INFO reqwest_tracing::reqwest_otel_span_builder] HTTP request; http.method=POST http.scheme=https http.host=artifactcache.actions.githubusercontent.com net.host.port=0 otel.kind="client" otel.name=POST /LNWNXsg1z0Qt0Mn41fuRdbA1YzcaL5Z2Wf2AYmEQiLGLuQCklb/_apis/artifactcache/caches
I'm not sure how to interpret these logs. But multi-second values for Cache write finished in seem wrong: I would expect a cache write to effectively be a single HTTP request, with inserts taking dozens to hundreds of milliseconds. But maybe the GitHub Actions cache latency is just really high?
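For illustration, one way to keep a slow backend from stretching the build would be to put a hard deadline on each write. This is only a sketch, not sccache's actual behavior; the 5-second budget and the function name are arbitrary:

```rust
use std::time::{Duration, Instant};
use tokio::time::timeout;

// Cap how long we wait for a single cache write. If the backend (e.g. GHAC
// under rate limiting) is slow, give up rather than let the write stretch the
// build; a dropped write only costs a future cache miss.
async fn put_with_deadline<F>(write: F)
where
    F: std::future::Future<Output = anyhow::Result<()>>,
{
    let start = Instant::now();
    match timeout(Duration::from_secs(5), write).await {
        Ok(Ok(())) => log::debug!("Cache write finished in {:?}", start.elapsed()),
        Ok(Err(e)) => log::debug!("Cache write error (ignored): {e}"),
        Err(_) => log::warn!("Cache write timed out after {:?}", start.elapsed()),
    }
}
```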
I'm unsure what action is needed for this issue, or whether there are any code-level bugs. Maybe the GitHub Actions cache is just prohibitively slow? If so, perhaps this slowness/behavior should be documented.
About this issue
- Original URL
- State: open
- Created 2 years ago
- Reactions: 1
- Comments: 26 (4 by maintainers)
@rajivshah3 in the future, please strip the color codes; they make the log harder to read 😉
For cache writes, we don't need to be super fast. Maybe a queue would do. @Xuanwo, do you think you could do that once you've added the opendal support for GHA? (Or as part of that change?)
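Roughly what I have in mind is something like the following sketch; the types and names are illustrative, not sccache's actual internals. The writer task would be spawned once at server startup, and the compiler path would only ever call the non-blocking enqueue:

```rust
use tokio::sync::mpsc;

// Compile results are handed to a bounded channel and a single background
// task drains it, so compiles never block on slow GHAC writes.
struct CacheWrite {
    key: String,
    data: Vec<u8>,
}

async fn upload(_key: &str, _data: &[u8]) -> anyhow::Result<()> {
    // Placeholder for the real GHAC reserve/upload/commit sequence.
    Ok(())
}

fn spawn_writer(mut rx: mpsc::Receiver<CacheWrite>) {
    tokio::spawn(async move {
        while let Some(write) = rx.recv().await {
            if let Err(e) = upload(&write.key, &write.data).await {
                // Best effort: a failed write just means a future cache miss.
                log::debug!("dropping cache write for {}: {e}", write.key);
            }
        }
    });
}

fn enqueue(tx: &mpsc::Sender<CacheWrite>, write: CacheWrite) {
    // try_send never waits; if the queue is full we simply skip this write.
    if tx.try_send(write).is_err() {
        log::debug!("cache write queue full; skipping");
    }
}
```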
I thought so too, but sccache --show-stats does not show any read or write errors. Maybe they are not reported for GHAC?

Hi, @indygreg. Here are our updates on addressing this issue.
I ran a test with my two patches, https://github.com/mozilla/sccache/pull/1557 and https://github.com/mozilla/sccache/pull/1550, applied to apple-platform-rs.
The biggest lesson I have learned from this is that sccache is a cache service, and we don't need to ensure every read/write succeeds. So instead of waiting and retrying rate-limited requests, we skip that step and keep going.
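Concretely, the policy looks roughly like the sketch below; try_put and the error type are placeholders rather than the real client. Transient failures get a couple of quick retries, but a rate-limit response is skipped immediately instead of being slept through:

```rust
// A hypothetical error type and `try_put` stand in for the real GHAC client.
#[derive(Debug)]
enum PutError {
    RateLimited,
    Transient(String),
}

async fn try_put(_key: &str, _data: &[u8]) -> Result<(), PutError> {
    // Placeholder for the real cache request.
    Ok(())
}

async fn store_best_effort(key: &str, data: &[u8]) {
    for attempt in 0..3 {
        match try_put(key, data).await {
            Ok(()) => {
                log::debug!("Stored in cache successfully!");
                return;
            }
            Err(PutError::RateLimited) => {
                // The compile result is already in hand; losing this cache
                // entry only means a future miss, so don't wait out Retry-After.
                log::debug!("cache write rate limited; skipping");
                return;
            }
            Err(PutError::Transient(e)) => {
                log::debug!("cache write attempt {attempt} failed: {e}");
            }
        }
    }
}
```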
Here are the differences between the before and after runs: no failed builds anymore.
The action is here https://github.com/Xuanwo/apple-platform-rs/pull/1
Hi, @indygreg, I reproduced your problem with the v0.4.0-pre.5 release at https://github.com/Xuanwo/apple-platform-rs/actions/runs/3871176193/jobs/6598713731
The real problem here is that GHAC returns a very long Retry-After, which can be up to 60s. So if we follow this limit strictly, we observe the surprising slowness. The full error message:
Returning a cache miss, or giving up on the cache write, may be a better choice.
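For example, one option is to honor Retry-After only up to a small cap and otherwise treat the request as a miss or a dropped write; the cap below is arbitrary, just to illustrate the idea:

```rust
use std::time::Duration;

// Honor Retry-After only up to a small cap; anything longer is treated as a
// miss (for reads) or a dropped write. The 2-second cap is arbitrary.
const MAX_RETRY_AFTER: Duration = Duration::from_secs(2);

enum Outcome {
    RetryAfter(Duration),
    GiveUp,
}

fn handle_429(retry_after: Option<Duration>) -> Outcome {
    match retry_after {
        Some(d) if d <= MAX_RETRY_AFTER => Outcome::RetryAfter(d),
        _ => Outcome::GiveUp,
    }
}
```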
yes, please do, this should not be done on an important project
ok 😦 this is a bit surprising …
@Xuanwo, are you auto-approving your patches in opendal?