buildx: Using gha cache with `mode=max` fails with 400 error

Hello, I have recently started using cache-to "type=gha,mode=max,scope=..." to cache all the layers, and the following error persists only for specific builds (it fails consistently on the same ones). After removing mode=max the issue went away, but obviously not everything is cached.

Failing builds with mode=max:

Passing build after removing mode=max

#130 exporting cache
#130 preparing build cache for export
#130 preparing build cache for export 121.0s done
#130 writing layer sha256:06be072867b08bb9aef2e469533d0d0d6a85f97c2aabcaf5103d78c217977918
#130 writing layer sha256:06be072867b08bb9aef2e469533d0d0d6a85f97c2aabcaf5103d78c217977918 0.1s done
#130 writing layer sha256:08980434c63c0751ea69086b9c8282a5b2784f78643934cf62036ae07e94b943
#130 writing layer sha256:08980434c63c0751ea69086b9c8282a5b2784f78643934cf62036ae07e94b943 0.1s done
...
redacted to keep it short
...
#130 writing layer sha256:d91065fb02477295b9823f300711ac850313d3b6a81a6ca1f8d214f8700b8b2e
#130 writing layer sha256:d91065fb02477295b9823f300711ac850313d3b6a81a6ca1f8d214f8700b8b2e 4.0s done
#130 ERROR: error writing layer blob: error committing cache 37887: failed to parse error response 400: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN""http://www.w3.org/TR/html4/strict.dtd">
<HTML><HEAD><TITLE>Bad Request</TITLE>
<META HTTP-EQUIV="Content-Type" Content="text/html; charset=us-ascii"></HEAD>
<BODY><h2>Bad Request</h2>
<hr><p>HTTP Error 400. The request is badly formed.</p>
</BODY></HTML>
: invalid character '<' looking for beginning of value
------
 > exporting cache:
------
error: failed to solve: error writing layer blob: error committing cache 37887: failed to parse error response 400: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN""http://www.w3.org/TR/html4/strict.dtd">
<HTML><HEAD><TITLE>Bad Request</TITLE>
<META HTTP-EQUIV="Content-Type" Content="text/html; charset=us-ascii"></HEAD>
<BODY><h2>Bad Request</h2>
<hr><p>HTTP Error 400. The request is badly formed.</p>
</BODY></HTML>
: invalid character '<' looking for beginning of value
Error: buildx failed with: : invalid character '<' looking for beginning of value

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 39 (12 by maintainers)

Most upvoted comments

If you’re using the cache toolkit module, then reads and writes are handled in separate calls (restoreCache vs saveCache). But restoreCache both checks whether the cache exists and downloads the content.

Inside restoreCache, it makes a call to cacheHttpClient.getCacheEntry that can be used to just check if the record exists:

  const cacheEntry = await cacheHttpClient.getCacheEntry(keys, paths, {
    compressionMethod
  })
  if (!cacheEntry?.archiveLocation) {
    // Cache not found
    return undefined
  }

Although this is getting into some of the internal implementation. Perhaps we need to add another top-level function alongside restoreCache and saveCache that just checks if the cache exists.
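
As a rough illustration of what such a top-level function could look like, here is a minimal sketch. The names `cacheExists`, `CacheLookup` and `CacheEntry` are hypothetical and not part of the toolkit. The lookup is injected as a parameter so the example runs on its own. In the toolkit it would presumably wrap the internal getCacheEntry call shown above.

```typescript
// Hypothetical sketch, not the toolkit's API.
// `CacheLookup` stands in for the internal cacheHttpClient.getCacheEntry call.
interface CacheEntry {
  cacheKey?: string;
  archiveLocation?: string;
}

type CacheLookup = (
  keys: string[],
  paths: string[]
) => Promise<CacheEntry | null>;

// Resolves to true when a cache record with a download location exists
// for the given keys. Nothing is downloaded and nothing is reserved.
async function cacheExists(
  lookup: CacheLookup,
  keys: string[],
  paths: string[]
): Promise<boolean> {
  const entry = await lookup(keys, paths);
  return Boolean(entry?.archiveLocation);
}
```

A caller would run `await cacheExists(lookup, keys, paths)` before deciding whether to upload, using the same "no archiveLocation means not found" rule as the snippet above.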

This GET operation also has a higher rate limit (900). Although now that the seal/reserve limit was increased (300 -> 700), it’s not much higher 😄 @t-dedah can we please look into increasing this limit?

@t-dedah @dhadka I’ll try to give some more background on where I think our requests come from.

Let’s say the user is doing a build that touches, for example, 50 blobs/layers in total (with the GitHub cache, users usually export all intermediate layers as well, not only the final result layers, so this can grow fast). If that build is mostly cached, for example only 1 layer was updated, we will create a new “manifest” blob with links to these 50 layers. We will push the manifest and the new layer, but most importantly we still need to make a request for each of the 49 old blobs just to check that they still exist. GitHub will answer that the record exists and we can continue. This needs to happen for all builds, and some repositories run a lot of builds in a single workflow or very complex (multi-platform) builds with lots of layers. Even if many builds mostly share the same layers, we need to check them all on each build.
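
To make the arithmetic above concrete, here is a small sketch that estimates the request count for one build. The names `BuildPlan`, `estimateRequests` and `buildsWithinLimit` are made up for illustration. The model is simplified: it counts one upload per changed blob plus one for the manifest, and ignores any separate reserve/commit calls. The limit is passed in as a plain number, since the thread states neither the window it applies to nor which limit governs these checks.

```typescript
// Illustrative arithmetic only; all names here are hypothetical.
interface BuildPlan {
  totalBlobs: number;   // blobs/layers referenced by the new manifest
  changedBlobs: number; // blobs that actually have to be uploaded
}

// One existence check per unchanged blob, one upload per changed blob,
// plus one upload for the manifest itself.
function estimateRequests(plan: BuildPlan): { checks: number; uploads: number } {
  const checks = plan.totalBlobs - plan.changedBlobs;
  const uploads = plan.changedBlobs + 1; // +1 for the manifest
  return { checks, uploads };
}

// How many builds of this shape fit under a given limit on existence checks.
function buildsWithinLimit(plan: BuildPlan, limit: number): number {
  const { checks } = estimateRequests(plan);
  return checks === 0 ? Infinity : Math.floor(limit / checks);
}
```

For the 50-blob build with 1 updated layer, this gives 49 existence checks and 2 uploads. Under an assumed limit of 900 checks, 18 such builds would fit before the limit is hit.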

If there were an endpoint we could use just to check that a cache key exists, one that is not rate-limited (or has a much higher limit), it would be much less likely to hit these limits. That endpoint does not need to provide a download link or reserve a key like the current requests do. It could also be a batch endpoint to check multiple keys together. Maybe even just an endpoint to list all current keys would be manageable, as keys should be small and not take much room even if there are a lot of them.

Another way would be for us to somehow remember that the cache key existed and not check it more often than some timeframe. But for that, we would need some kind of external storage/database where we could keep these records.
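
A minimal sketch of that idea follows. It assumes an in-memory map in place of the external storage and uses hypothetical names (`SeenKeys`, `keysToCheck`). A real version would need storage shared across builds. It would also have to accept that a key could be evicted on the server while it is still considered fresh locally.

```typescript
// Hypothetical sketch: an in-memory map stands in for external storage.
class SeenKeys {
  // key -> timestamp (ms) at which the backend last confirmed the key exists
  private seen = new Map<string, number>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Record that the backend just confirmed this key exists.
  markExists(key: string): void {
    this.seen.set(key, this.now());
  }

  // True when the key was confirmed less than ttlMs ago.
  isFresh(key: string): boolean {
    const at = this.seen.get(key);
    return at !== undefined && this.now() - at < this.ttlMs;
  }
}

// Keep only the keys that still need a remote existence check.
function keysToCheck(store: SeenKeys, keys: string[]): string[] {
  return keys.filter((k) => !store.isFresh(k));
}
```

With this in place, a build would send existence requests only for the keys returned by `keysToCheck` and call `markExists` for each one the backend confirms.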

The reasoning behind the new 700 limit

Looks like this is a private repository.

@hertzg apologies for the delay, but the good news is that we are rolling out a change to increase the rate limit threshold. This should allow the buildx action to cache layers with a higher level of parallelization. cc @t-dedah to keep you in the loop when the rollout is complete.

Would love to hear back on whether this helps in reducing the 429s.

@hertzg we will be looking into relaxing the rate limit. But we need to carefully evaluate the extra load it will bring on the system. I will update you with an ETA.

@hertzg We will probably not do another patch release just for this unless something else comes up as well. You can use the master branch image (or pin it to a digest for safety). cc @crazy-max

@dhadka That could be the issue indeed. Thanks for the pointer. I’ll update that logic.