cache: Can't store using hashFiles if files don't exist yet

For example, in a Rust project, the target directory is almost never committed as it includes build artifacts. I’d like to cache it between builds, but using

name: Cache target dir
uses: actions/cache@v1
with:
  path: target
  key: ${{ runner.os }}-cargo-target-${{ hashFiles('target') }} # also tried variants of `target/**`...

triggers a failure that prevents the build from continuing:

##[error]The template is not valid. hashFiles('/home/runner/work/.../.../target') failed. Search pattern '/home/runner/work/.../.../target' doesn't match any file under '/home/runner/work/.../...'

which does make sense! After checking out, there is not yet a target directory. In fact, that’s what I aim to create (maybe) when I restore from the cache. So storing would work with this template, but restoring would not, since the key can’t be computed.

If I supply restore-keys, though, I still get the same error. It seems to me like something (maybe not in this codebase) is just dying early because the template isn’t fully computable, instead of trying to fall back on the restore-keys first.

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 7
  • Comments: 20 (4 by maintainers)

Commits related to this issue

Most upvoted comments

@rye perhaps we could update hashFiles with an additional parameter that you could set that would have it not fail if it didn’t find any files. I suppose the other option is to simply warn rather than fail, however, that would mean in your case that you would perpetually have a warning on your jobs that you could likely never get rid of.

I do agree that the current error form makes things stricter, but I don’t think the strictness is helpful—a hashFiles error (due to files not existing) seems to be causing a parsing error, i.e. restore_keys is not even used at all; the cache restore just completely fails. My use case would be satisfied if restore_keys was used in such cases.

Of course, I could leave off the hashFiles in my key, but then I would basically have a persistent cache that never got updated if things changed.

The way this action currently works seems really suboptimal. It only works if your Cargo.lock basically never changes. But having a Cargo.lock is advised against for libraries as you want to test your library against a wide range of semver compatible versions as this is how your library is going to be used in the end. So most Rust projects don’t have a Cargo.lock for that reason.

You could generate one on the fly and use that as the hash, but the chances that nothing changed are very slim. So in the end you likely don’t ever get a cache hit and end up recompiling everything from scratch.

So what we ended up doing is scrapping the whole hashing mechanism and just using the number of the week in the year as the cache key, as that keeps a reasonably recent cache around and overwrites it with a new one every week (actually every second week). This is so far the best workaround I’ve seen for Rust projects (or similar) where you don’t really have a lockfile.

+1 for @rye’s idea.

Regarding @chrispat comment,

I think the core of this discussion really boils down to helping the user understand they have specified a pattern in hashFiles that does not match anything or not.

I vote for giving the developer the benefit of the doubt that they are intelligent and chose the correct file path, even if that file path might not yet exist in their repo but would eventually show up in their build process.

Regarding the argument that hashFiles should fail if it doesn’t find at least one file in the project I think could be argued against by saying it doesn’t guard against finding more files than the developer anticipated. If the hashFiles function allows for unbounded number of files to be found, zero matches seems reasonable to me too.

If the overall build workflow isn’t working as the user anticipated then the developer would go debug it anyways. A message in the log at runtime by the cache action to say it found a hit or not should suffice for the developer to figure out what’s going on to realize they have a typo or the file(s) they expected to be there never showed up, etc.

Thanks

@chrispat yeah, that first idea sounds like a good solution to me, if that works best.

I am still wondering if it is possible to just handle the case where hashed files don’t exist by falling back on restore-keys (and possibly proceeding to fail if those keys aren’t specified or can’t be restored)? I’d be happy to make those changes, but I don’t know where I could change that behavior.

What this would mean for the user is:

uses: actions/cache@v1
with:
  path: target
  key: cargo-deps-${{ hashFiles('target') }}
  restore-keys:
    - cargo-deps-

In the event that hashFiles cannot find its parameter, it tries to find something in restore-keys that does exist. Here it finds the most recent cache, cargo-deps-abcdef, and restores it, proceeding to continue through the build. If the path does not change from its restored value, cargo-deps-abcdef, (i.e. at the end of the build, if the key is recomputed as cargo-deps- abcdef) then nothing has changed and we do not need to store again. However, if something in the specified path changed, then a new cache is produced, cargo-deps-012345, which would then be restored later.

In short, I would still expect a failure of hashFiles to cause the cache restoration to try to fall back on whatever is specified in restore_keys—if that change is made, no additional parameters would be needed, and the existing API would remain exactly the same. Indeed, I don’t think anything relating to caching should stop a build from continuing by default, but that is configurable through a parameter.