backstage: ๐ Bug Report: `publish:github:pull-request` fails with `HttpError: You have exceeded a secondary rate limit.`
๐ Description
The scaffolder publish:github:pull-request action fails to make a PR with the error Pull request creation failed; caused by HttpError: You have exceeded a secondary rate limit. Please wait a few minutes before you try again.
We think this is because the underlying plugin used by backstage to create pull requests is inefficient in its queries in some way that scales with the number of files in the sourcePath.
Here is what I think is happening:
backstage is sending type: Files to createPullRequest, which calls createTree, which runs a Promise.all on Object.keys(changes.files) (it makes a new promise for each file passed to it, and fires them all at the same time, if I understand correctly). Each Promise calls valueToTreeObject. Annotated and condensed:
export async function valueToTreeObject(
...
value: string | File // It's a File!
) {
...
if (typeof value === "string") {
// Nope!
}
// This is what gets called!
const { data } = await octokit.request(
"POST /repos/{owner}/{repo}/git/blobs",
{
owner,
repo,
...value,
}
);
...
}
So this is POSTing to GitHub for every file (critically, every file in the sourcePath, NOT just every changed file). Ordinarily, this would just take a long time to complete, but backstage disables octokit throttling, so instead of retrying, it fails immediately (firing ~500 POST requests at the same time means some fail with the rate limit error, meaning Promise.all fails).
๐ Expected behavior
- PRs should take longer to complete instead of failing
- PRs should not take long to complete, where possible
For 1), I think throttling should be re-enabled (this PR reverted) โ it is important to respect GitHubโs rate limits. If they are ignored, then templates fail halfway through their run, potentially after some resources have already been created. It would be more robust if they took longer instead of failing. Still, if there are issues like a PR taking ten minutes, that is a sign that something is wrong, and the underlying usage of the API should be fixed.
For 2), because 10-minute PRs are not viable, one of the following should also be done:
- The
publish:github:pull-requestshould very clearly state in the docs that it will be slower with many files. They should explain the best practices for usage (making a PR with a sparse file tree with only changes). The step should log a warning if called with >100 (number picked randomly) files. octokit-plugin-create-pull-requestshould be updated to use a more efficient method of querying GitHub โ maybe the GraphQL API? Or better yet, it should not query an API at all to make a blob.fetch:plainand then a consecutivepublish:github:pull-requestshould be updated to only send the diff tooctokit-plugin-create-pull-requestโ this could maybe be done by pulling the.gitdirectory as well and doing an actualdiffto filter to only changed files.publish:github:pull-requestshould sendstrings instead ofFiles tooctokit-plugin-create-pull-request, working around its issue. I donโt know if this will work, because you do some custom things to themode.
I can make a PR for re-enabling throttling, if that is desired. I could make a PR for 2.1 if you let me know where it should go (the JSON Schema description? Here somewhere?) I donโt think I can get the time to make a PR for 2.2 or 2.3. I can possibly make a PR for 2.4.
๐ Actual Behavior with Screenshots
In development and production, PRs fail with:
12023-03-30T22:34:15.277Z Beginning step Onboarding PR: <MYREPO>
22023-03-30T22:34:17.739Z GithubResponseError: Pull request creation failed; caused by HttpError: You have exceeded a secondary rate limit. Please wait a few minutes before you try again.
3 at Object.handler (/app/node_modules/@backstage/plugin-scaffolder-backend/dist/index.cjs.js:3673:15)
4 at runMicrotasks (<anonymous>)
5 at runNextTicks (node:internal/process/task_queues:61:5)
6 at listOnTimeout (node:internal/timers:528:9)
7 at processTimers (node:internal/timers:502:7)
8 at async NunjucksWorkflowRunner.execute (/app/node_modules/@backstage/plugin-scaffolder-backend/dist/index.cjs.js:4831:11)
9 at async TaskWorker.runOneTask (/app/node_modules/@backstage/plugin-scaffolder-backend/dist/index.cjs.js:5042:26)
10 at async run (/app/node_modules/p-queue/dist/index.js:163:29)
๐ Reproduction steps
Set up GitHub integration with a personal access token (org application works too).
Create a scaffolder template that fetches a repo, manipulates a file, and PRs back to the repo:
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
name: test-template
title: Test template
spec:
owner: myTeam
type: service
parameters:
- title: Test
properties:
test:
title: Test
type: string
steps:
# Fetch a remote repo with a good number (>500) files
- id: fetch-remote
name: Fetch Remote Repo
action: fetch:plain
input:
url: https://github.com/myorg/myrepo.git
targetPath: myrepo
# Manipulate a file inside
- id: do-something
name: Update file
action: some:action
input:
path: myrepo/folder/somefile.yml
# PR back the change
- id: pr-repo
name: PR to repo
action: publish:github:pull-request
input:
repoUrl: github.com?repo=myrepo&owner=myorg
branchName: some-branch-name
gitCommitMessage: Message
title: PR Title
sourcePath: myrepo
targetPath: '.'
Execute the template. If youโre not getting rate limit errors, duplicate the PR step with a second branch. Sometimes we were only getting it after the second PR, with a total of ~1000 files between both repos.
To show that this is indeed caused by ignoring the rate limit headers, edit node_modules/@backstage/plugin-scaffolder-backend/dist/index.cjs.js and remove the line ...{ throttle: { enabled: false } }. Re-run the template. For me, this took ~3m to run the first PR, and ~11 to run the second, but they both succeeded.
To show that the issue is resolved by not running a PR from the full repo directory, amend the template to something more like:
Mostly duplicated template
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
name: test-template
title: Test template
spec:
owner: myTeam
type: service
parameters:
- title: Test
properties:
test:
title: Test
type: string
steps:
# Fetch a remote repo with a good number (>500) files
- id: fetch-remote
name: Fetch Remote Repo
action: fetch:plain
input:
url: https://github.com/myorg/myrepo.git
targetPath: myrepo
# Manipulate a file inside
- id: do-something
name: Update file
action: some:action
input:
path: myrepo/folder/somefile.yml
outputPath: myrepo-bare/folder/somefile.yml # amend syntax to action in use
# PR back the change
- id: pr-repo
name: PR to repo
action: publish:github:pull-request
input:
repoUrl: github.com?repo=myrepo&owner=myorg
branchName: some-branch-name
gitCommitMessage: Message
title: PR Title
sourcePath: myrepo-bare
targetPath: '.'
Refresh the template to make sure youโre using the updated one, and run it again. It should succeed quickly for each PR (<10s).
๐ Provide the context for the Bug.
The template we are making needs to make pull requests into two other repositories โ one to onboard it to our CICD platform, and one to onboard it to our Flux admin repository. What weโre actually doing is using roadiehq:utils:jsonata:yaml:transform to manipulate some yaml files to add the repo name, and some metadata about it.
Weโve worked around this for now by cloning to one directory, reading and manipulating the files, writing the changed files to a second, bare directory, and then running the pull request from the bare directory. This works, but is unintuitive because it differs greatly from a CLI git workflow. (Clone, edit, commit, push, PR โ not Clone, edit, copy edited file to a blank directory, commit, push, PR).
Until we figured out the cause, this was slowing down development for ~2 weeks.
๐ฅ๏ธ Your Environment
yarn backstage-cli info:
yarn run v1.22.19
$ /Users/rmartine/dev/ias-backstage/node_modules/.bin/backstage-cli info
OS: Darwin 22.4.0 - darwin/arm64
node: v16.18.1
yarn: 1.22.19
cli: 0.22.3 (installed)
backstage: 1.8.0
Dependencies:
@backstage/app-defaults 1.1.0
@backstage/backend-app-api 0.4.1
@backstage/backend-common 0.18.3
@backstage/backend-dev-utils 0.1.1
@backstage/backend-plugin-api 0.4.0, 0.5.0
@backstage/backend-tasks 0.4.3, 0.5.0
@backstage/catalog-client 1.4.0
@backstage/catalog-model 1.2.1
@backstage/cli-common 0.1.12
@backstage/cli 0.22.3
@backstage/config-loader 1.1.9
@backstage/config 1.0.7
@backstage/core-app-api 1.4.0
@backstage/core-components 0.12.4
@backstage/core-plugin-api 1.4.0
@backstage/errors 1.1.5
@backstage/eslint-plugin 0.1.1
@backstage/integration-aws-node 0.1.2
@backstage/integration-react 1.1.10
@backstage/integration 1.4.3
@backstage/plugin-api-docs 0.8.14
@backstage/plugin-app-backend 0.3.42
@backstage/plugin-auth-backend 0.18.0
@backstage/plugin-auth-node 0.2.12
@backstage/plugin-catalog-backend-module-github 0.2.5
@backstage/plugin-catalog-backend 1.8.0
@backstage/plugin-catalog-common 1.0.12
@backstage/plugin-catalog-graph 0.2.26
@backstage/plugin-catalog-import 0.9.4
@backstage/plugin-catalog-node 1.3.4
@backstage/plugin-catalog-react 1.3.0
@backstage/plugin-catalog 1.7.2
@backstage/plugin-events-node 0.2.3
@backstage/plugin-kubernetes-backend 0.9.3
@backstage/plugin-kubernetes-common 0.6.0
@backstage/plugin-kubernetes 0.7.8
@backstage/plugin-org 0.6.4
@backstage/plugin-permission-backend 0.5.17
@backstage/plugin-permission-common 0.7.4
@backstage/plugin-permission-node 0.7.6
@backstage/plugin-permission-react 0.4.10
@backstage/plugin-proxy-backend 0.2.35
@backstage/plugin-scaffolder-backend 1.12.0
@backstage/plugin-scaffolder-common 1.2.6
@backstage/plugin-scaffolder-node 0.1.1
@backstage/plugin-scaffolder-react 1.1.0
@backstage/plugin-scaffolder 1.11.0
@backstage/plugin-search-backend-module-pg 0.5.3
@backstage/plugin-search-backend-node 1.1.3
@backstage/plugin-search-backend 1.2.2
@backstage/plugin-search-common 1.2.2
@backstage/plugin-search-react 1.4.0
@backstage/plugin-search 1.0.7
@backstage/plugin-tech-radar 0.5.20
@backstage/plugin-techdocs-backend 1.5.2
@backstage/plugin-techdocs-module-addons-contrib 1.0.9
@backstage/plugin-techdocs-node 1.4.5
@backstage/plugin-techdocs-react 1.1.2
@backstage/plugin-techdocs 1.4.3
@backstage/plugin-user-settings 0.5.1
@backstage/release-manifests 0.0.8
@backstage/test-utils 1.2.4
@backstage/theme 0.2.17
@backstage/types 1.0.2
@backstage/version-bridge 1.0.3
Done in 0.77s.
๐ Have you spent some time to check if this bug has been raised before?
- I checked and didnโt find similar issue
๐ข Have you read the Code of Conduct?
- I have read the Code of Conduct
Are you willing to submit PR?
Yes I am willing to submit a PR!
About this issue
- Original URL
- State: open
- Created a year ago
- Reactions: 33
- Comments: 29 (24 by maintainers)
@rmartine-ias Thanks for the reply, it worked for my problem! Good job compiling all this informations, helped me a lot
Still an issue, still working on solution
@rmartine-ias sorry I seemed to have missed your last ping on this message.
Great work digging into this and coming up with some suggestions as to what we want to do here, really awesome job.
Weโve actually had a discussion a little bit about these pull requests actions, and are kind of leaning towards in the future just providing some more generic git actions to use instead of these specific ones for each provider. The thinking is that we should be able to re-use more of the generic actions to be able to create branches / forks and push to these sources, and then we have an action to create a PR or Merge Requests using the providers APIs with the base and target branches.
With that said however, thatโs not coming any time soon, and I think we want to explore that a little more before committing to it being something we want, so Iโm happy to proceed with some of your suggestions in the short term to get this working!
Letโs do this, seems sane!
I wonder if we can mask up this behaviour into opt in or something for now as you suggested, so that you can test it when it getโs released in anger without breaking anyones workflow and then we can look at options if we want to mimic
.gitattributes.Hope this is OK? ๐