cache: Cache creation failed

I used the following configuration in my project's GitHub Actions workflow:

      # Cache node_modules
      - name: Cache dependencies
        uses: actions/cache@v2
        id: yarn-cache
        with:
          path: |
            **/node_modules
          key: ${{ runner.os }}-yarn-${{ hashFiles('**/yarn.lock') }}
          restore-keys: |
            ${{ runner.os }}-yarn-

However, the following error occurred during execution, which left me very confused:

    Warning: getCacheEntry failed: Cache service responded with 500

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 84
  • Comments: 69 (5 by maintainers)

Most upvoted comments

@dhadka This is happening again right now. Can you take a look?

Also, just so we get visibility, please react with a 👍 if it’s resolved for you or 👎 if not.

Thanks for letting us know it’s fixed and for your patience!

Root Cause

We traced this outage back to a bug introduced last week in the framework code that our various microservices, caching among them, are built on.

When a new repo is created that uses the cache, it gets assigned to one of our databases. Eventually, these databases fill up and a background job automatically seals them, preventing new accounts from being assigned to that database. This normally isn’t a problem as we will provision a new database before sealing the existing one, but due to the bug above database creation was failing.

As a mitigation last night, we unsealed one of the existing databases. But that background job, which runs once an hour, re-sealed the database. Once we realized that job was undoing our mitigation, we took steps to disable the job. We will be deploying a fix for the original bug today.

Repair items

There will likely be other repair items as we look more today, but some initial repair items I have in mind are:

  1. Add alerts for failed database creation. This would ping our on-call engineers and let them respond quicker.

  2. Make the various setup-* actions fault-tolerant when using caching. Caching should be best effort and not fail workflows, but in this case exceptions thrown from the cache module weren’t being handled.

  3. As a safety measure, add a check so that a database is never sealed if no others are available.

One more here! The bug has returned!

Folks are gathering here now 😃 This bug has returned and seems to be somewhat intermittent.

I have the same problem. I was going crazy because Google gave me nothing and the GitHub status page says Actions is OK.

Yeah, unfortunately our original mitigation was undone by an automated job. We’ve reapplied the fix plus another change to disable that job, and are now trying to determine if that’s sufficient.

They just brought the server back up. It works without any changes to the codebase.

jobs still failing 😦

Works for me too, after several attempts, thx

Hi, why was this closed? It's breaking all my pipelines 😦

See #820

For now, the workaround is to disable caching in your workflows, if possible.
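
For example, you could leave the step in place but skip it for now (a rough sketch, reusing the step from the original post):

      # Temporarily skip caching while the cache service is unstable (see #820)
      - name: Cache dependencies
        if: ${{ false }}   # remove this line to re-enable the step
        uses: actions/cache@v2
        with:
          path: |
            **/node_modules
          key: ${{ runner.os }}-yarn-${{ hashFiles('**/yarn.lock') }}
          # ...rest of the original step unchanged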

that’s a lot of work for an org with 4k repos 😦

A possible workaround may be to add continue-on-error: true to the step, so if the cache service fails, the action still continues.
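
Roughly like this (an untested sketch, reusing the cache step from the original post):

      # Caching becomes best effort: a failure here is reported but won't fail the job
      - name: Cache dependencies
        uses: actions/cache@v2
        id: yarn-cache
        continue-on-error: true
        with:
          path: |
            **/node_modules
          key: ${{ runner.os }}-yarn-${{ hashFiles('**/yarn.lock') }}
          restore-keys: |
            ${{ runner.os }}-yarn-

If later steps check steps.yarn-cache.outputs.cache-hit, note that a failed cache step may leave that output empty, which is effectively a miss.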

This is not ideal, as we usually expect the cache to carry forward. I think the warning should actually be an error.

Workflow failed twice, but now it’s working for me 😃

Same here with setup-python…

I’ve created a ticket with GitHub support; maybe that helps?

Ran into this issue as well. At least now I know it’s not something to do with the workflow itself. Might be worth pinging some staff? Doesn’t seem to be anything we can resolve on our own right now.

I agree, but I don’t know how to contact them

@TheBeachMaster Thanks Bro 😃

~~They just brought the server back up. It works without any changes to the codebase~~

I tried it; you can turn off the npm cache for now, and it seems to work 😢
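
If the cache comes from setup-node's built-in cache input, that looks roughly like this (the version and package manager here are just placeholders):

      - name: Set up Node
        uses: actions/setup-node@v2
        with:
          node-version: '16'
          # cache: npm   # temporarily disabled until the cache service recovers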

This problem has been solved. Thank you very much!❤️

I have the same problem in a new repository created via template

I have the same problem in a new repository too