cache: Cache creation failed
I used the following code in the actions of my project:
# Cache node_modules
- name: Cache dependencies
  uses: actions/cache@v2
  id: yarn-cache
  with:
    path: |
      **/node_modules
    key: ${{ runner.os }}-yarn-${{ hashFiles('**/yarn.lock') }}
    restore-keys: |
      ${{ runner.os }}-yarn-
However, the following error occurred during execution, which confuses me:
Warning: getCacheEntry failed: Cache service responded with 500
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 84
- Comments: 69 (5 by maintainers)
Commits related to this issue
- reverting npm caching until actions/cache#698 is resolved — committed to raelcun/plan-my-trip-frontend by raelcun 3 years ago
- Commented node modules caching bacause it is not working for now by issue https://github.com/actions/cache/issues/698 — committed to ownik/ci-traffic-light by ownik 3 years ago
- Temporarily disable cache on CI (actions/cache#698) — committed to ciffelia/og-image by ciffelia 3 years ago
- Temporarily disable cache on CI (actions/cache#698) — committed to ciffelia/og-image by ciffelia 3 years ago
- Temporarily disable cache on CI (actions/cache#698) — committed to felipecsl/obgen by felipecsl 3 years ago
- Restore cache https://github.com/actions/cache/issues/698 — committed to apache/maven-gh-actions-shared by slachiewicz 3 years ago
- revert: reenable cache - https://github.com/actions/cache/issues/698 — committed to FranciscoKloganB/vitesse-enterprise by FranciscoKloganB 2 years ago
Also, just so we get visibility, please react with a 👍 if it’s resolved for you or 👎 if not.
Thanks for letting us know it’s fixed and for your patience!
Root Cause
We traced this outage back to a bug that was introduced last week to the framework code that our various microservices, of which caching is one, are built on.
When a new repo is created that uses the cache, it gets assigned to one of our databases. Eventually, these databases fill up and a background job automatically seals them, preventing new accounts from being assigned to that database. This normally isn’t a problem as we will provision a new database before sealing the existing one, but due to the bug above database creation was failing.
As a mitigation last night, we unsealed one of the existing databases. But that background job, which runs once an hour, re-sealed the database. Once we realized that job was undoing our mitigation, we took steps to disable the job. We will be deploying a fix for the original bug today.
Repair items
There will likely be other repair items as we look more today, but some initial repair items I have in mind are:
- Add alerts for failed database creation. This would ping our on-call engineers and let them respond quicker.
- Make the various setup-* actions fault-tolerant when using caching. Caching should be best effort and not fail workflows, but in this case exceptions thrown from the cache module weren’t being handled.
- As a safety measure, add check to avoid sealing off all databases if there aren’t any others available.
One more here! The bug has returned!
Fellows are gathering now 😃 This bug returned and seems to be kind of unstable
I have the same problem. I was going crazy because Google gave me nothing and the GitHub status page says Actions is OK.
@dhadka This happens again right now. can you look at this?
Yeah, unfortunately our original mitigation was undone by an automated job. We’ve reapplied the fix plus another change to disable that job, and are now trying to determine if that’s sufficient.
jobs still failing 😦
Works for me too, after several attempts, thx
that’s a lot of work for an org with 4k repos 😦
this is not ideal as we usually expect the cache to continue forward. I think the warning should be actually an error.
Workflow failed twice, but now it’s working for me 😃
Same here with setup-python…
I’ve created a support ticket with Github support, maybe that helps?
I agree, but I don’t know how to contact them
See https://github.com/actions/cache/issues/820
For now, the workaround is to disable caching in your workflows - if possible
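As a rough sketch of what disabling caching could look like, assuming the workflow enables it through the cache input of actions/setup-node (the step names and Node version here are placeholders, not taken from anyone's workflow in this thread):

```yaml
# Hypothetical steps for illustration; adjust names and versions to your workflow.
- name: Set up Node
  uses: actions/setup-node@v2
  with:
    node-version: '16'
    # cache: yarn   # commented out so the step does not call the cache service

# Without a cache, dependencies are installed from scratch on every run.
- name: Install dependencies
  run: yarn install --frozen-lockfile
```

If the workflow calls actions/cache directly instead, commenting out that step should have the same effect. Either way the job only gets slower, and the change can be reverted once the service is healthy again.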
The problem continues:
https://github.com/arthurfiorette/cache-parser/runs/4499172672?check_suite_focus=true
https://github.com/arthurfiorette/cache-parser/blob/727a6cb819869dc89756034ca3749bb2a0f4baa9/.github/workflows/codeql.yml#L33-L37
Ran into this issue as well. At least now I know it’s not something to do with the workflow itself. Might be worth pinging some staff? Doesn’t seem to be anything we can resolve right now on our own.
@TheBeachMaster Thanks Bro 😃
https://github.com/actions/cache/issues/820
A possible workaround may be to add continue-on-error: true to the step, so if the cache service fails, the action still continues.
~They just returned server back alive. It works without any changes in the codebase~
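A minimal sketch of that workaround, applied to the cache step from the original report. The install step is a hypothetical addition, and whether continue-on-error covers every failure mode of the cache service is an assumption, not something confirmed in this thread:

```yaml
# Cache step from the original report, with continue-on-error added.
- name: Cache dependencies
  uses: actions/cache@v2
  id: yarn-cache
  # If this step fails, the job carries on instead of failing.
  continue-on-error: true
  with:
    path: |
      **/node_modules
    key: ${{ runner.os }}-yarn-${{ hashFiles('**/yarn.lock') }}
    restore-keys: |
      ${{ runner.os }}-yarn-

# Hypothetical follow-up step, not from the original report:
# it always runs, so a missing cache only costs install time.
- name: Install dependencies
  run: yarn install --frozen-lockfile
```

Note that this only helps when the cache is its own step. If the failure comes from caching built into a setup-* action, continue-on-error on that step would also hide a failed tool setup.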
I tried it, you can turn off the npm cache first, it seems to work 😢
This problem has been solved. Thank you very much!❤️
I have the same problem in a new repository too