backstage: ๐Ÿ› Bug Report: `publish:github:pull-request` fails with `HttpError: You have exceeded a secondary rate limit.`

๐Ÿ“œ Description

The scaffolder publish:github:pull-request action fails to make a PR with the error Pull request creation failed; caused by HttpError: You have exceeded a secondary rate limit. Please wait a few minutes before you try again.

We think this is because the underlying plugin used by backstage to create pull requests is inefficient in its queries in some way that scales with the number of files in the sourcePath.

Here is what I think is happening:

backstage is sending type: Files to createPullRequest, which calls createTree, which runs a Promise.all on Object.keys(changes.files) (it makes a new promise for each file passed to it, and fires them all at the same time, if I understand correctly). Each Promise calls valueToTreeObject. Annotated and condensed:

export async function valueToTreeObject(
  ...
  value: string | File // It's a File!
) {
  ...
  if (typeof value === "string") {
    // Nope!
  }
  
  // This is what gets called!
  const { data } = await octokit.request(
    "POST /repos/{owner}/{repo}/git/blobs",
    {
      owner,
      repo,
      ...value,
    }
  );
  ...
}

So this is POSTing to GitHub for every file (critically, every file in the sourcePath, NOT just every changed file). Ordinarily, this would just take a long time to complete, but backstage disables octokit throttling, so instead of retrying, it fails immediately (firing ~500 POST requests at the same time means some fail with the rate limit error, meaning Promise.all fails).

๐Ÿ‘ Expected behavior

  1. PRs should take longer to complete instead of failing
  2. PRs should not take long to complete, where possible

For 1), I think throttling should be re-enabled (this PR reverted) โ€“ it is important to respect GitHubโ€™s rate limits. If they are ignored, then templates fail halfway through their run, potentially after some resources have already been created. It would be more robust if they took longer instead of failing. Still, if there are issues like a PR taking ten minutes, that is a sign that something is wrong, and the underlying usage of the API should be fixed.

For 2), because 10-minute PRs are not viable, one of the following should also be done:

  1. The publish:github:pull-request should very clearly state in the docs that it will be slower with many files. They should explain the best practices for usage (making a PR with a sparse file tree with only changes). The step should log a warning if called with >100 (number picked randomly) files.
  2. octokit-plugin-create-pull-request should be updated to use a more efficient method of querying GitHub โ€“ maybe the GraphQL API? Or better yet, it should not query an API at all to make a blob.
  3. fetch:plain and then a consecutive publish:github:pull-request should be updated to only send the diff to octokit-plugin-create-pull-request โ€“ this could maybe be done by pulling the .git directory as well and doing an actual diff to filter to only changed files.
  4. publish:github:pull-request should send strings instead of Files to octokit-plugin-create-pull-request, working around its issue. I donโ€™t know if this will work, because you do some custom things to the mode.

I can make a PR for re-enabling throttling, if that is desired. I could make a PR for 2.1 if you let me know where it should go (the JSON Schema description? Here somewhere?) I donโ€™t think I can get the time to make a PR for 2.2 or 2.3. I can possibly make a PR for 2.4.

๐Ÿ‘Ž Actual Behavior with Screenshots

In development and production, PRs fail with:

12023-03-30T22:34:15.277Z Beginning step Onboarding PR: <MYREPO>
22023-03-30T22:34:17.739Z GithubResponseError: Pull request creation failed; caused by HttpError: You have exceeded a secondary rate limit. Please wait a few minutes before you try again.
3    at Object.handler (/app/node_modules/@backstage/plugin-scaffolder-backend/dist/index.cjs.js:3673:15)
4    at runMicrotasks (<anonymous>)
5    at runNextTicks (node:internal/process/task_queues:61:5)
6    at listOnTimeout (node:internal/timers:528:9)
7    at processTimers (node:internal/timers:502:7)
8    at async NunjucksWorkflowRunner.execute (/app/node_modules/@backstage/plugin-scaffolder-backend/dist/index.cjs.js:4831:11)
9    at async TaskWorker.runOneTask (/app/node_modules/@backstage/plugin-scaffolder-backend/dist/index.cjs.js:5042:26)
10    at async run (/app/node_modules/p-queue/dist/index.js:163:29)

๐Ÿ‘Ÿ Reproduction steps

Set up GitHub integration with a personal access token (org application works too).

Create a scaffolder template that fetches a repo, manipulates a file, and PRs back to the repo:

apiVersion: scaffolder.backstage.io/v1beta3
kind: Template

metadata:
  name: test-template
  title: Test template

spec:
  owner: myTeam
  type: service

  parameters:
    - title: Test
      properties:
        test:
          title: Test
          type: string

  steps:
    # Fetch a remote repo with a good number (>500) files
    - id: fetch-remote
      name: Fetch Remote Repo
      action: fetch:plain
      input:
        url: https://github.com/myorg/myrepo.git
        targetPath: myrepo

    # Manipulate a file inside
    - id: do-something
      name: Update file
      action: some:action
      input:
        path: myrepo/folder/somefile.yml

    # PR back the change
    - id: pr-repo
      name: PR to repo
      action: publish:github:pull-request
      input:
        repoUrl: github.com?repo=myrepo&owner=myorg
        branchName: some-branch-name
        gitCommitMessage: Message
        title: PR Title
        sourcePath: myrepo
        targetPath: '.'

Execute the template. If youโ€™re not getting rate limit errors, duplicate the PR step with a second branch. Sometimes we were only getting it after the second PR, with a total of ~1000 files between both repos.

To show that this is indeed caused by ignoring the rate limit headers, edit node_modules/@backstage/plugin-scaffolder-backend/dist/index.cjs.js and remove the line ...{ throttle: { enabled: false } }. Re-run the template. For me, this took ~3m to run the first PR, and ~11 to run the second, but they both succeeded.

To show that the issue is resolved by not running a PR from the full repo directory, amend the template to something more like:

Mostly duplicated template
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template

metadata:
  name: test-template
  title: Test template

spec:
  owner: myTeam
  type: service

  parameters:
    - title: Test
      properties:
        test:
          title: Test
          type: string

  steps:
    # Fetch a remote repo with a good number (>500) files
    - id: fetch-remote
      name: Fetch Remote Repo
      action: fetch:plain
      input:
        url: https://github.com/myorg/myrepo.git
        targetPath: myrepo

    # Manipulate a file inside
    - id: do-something
      name: Update file
      action: some:action
      input:
        path: myrepo/folder/somefile.yml
        outputPath: myrepo-bare/folder/somefile.yml # amend syntax to action in use

    # PR back the change
    - id: pr-repo
      name: PR to repo
      action: publish:github:pull-request
      input:
        repoUrl: github.com?repo=myrepo&owner=myorg
        branchName: some-branch-name
        gitCommitMessage: Message
        title: PR Title
        sourcePath: myrepo-bare
        targetPath: '.'

Refresh the template to make sure youโ€™re using the updated one, and run it again. It should succeed quickly for each PR (<10s).

๐Ÿ“ƒ Provide the context for the Bug.

The template we are making needs to make pull requests into two other repositories โ€“ one to onboard it to our CICD platform, and one to onboard it to our Flux admin repository. What weโ€™re actually doing is using roadiehq:utils:jsonata:yaml:transform to manipulate some yaml files to add the repo name, and some metadata about it.

Weโ€™ve worked around this for now by cloning to one directory, reading and manipulating the files, writing the changed files to a second, bare directory, and then running the pull request from the bare directory. This works, but is unintuitive because it differs greatly from a CLI git workflow. (Clone, edit, commit, push, PR โ€“ not Clone, edit, copy edited file to a blank directory, commit, push, PR).

Until we figured out the cause, this was slowing down development for ~2 weeks.

๐Ÿ–ฅ๏ธ Your Environment

yarn backstage-cli info:

yarn run v1.22.19
$ /Users/rmartine/dev/ias-backstage/node_modules/.bin/backstage-cli info
OS:   Darwin 22.4.0 - darwin/arm64
node: v16.18.1
yarn: 1.22.19
cli:  0.22.3 (installed)
backstage:  1.8.0

Dependencies:
  @backstage/app-defaults                          1.1.0
  @backstage/backend-app-api                       0.4.1
  @backstage/backend-common                        0.18.3
  @backstage/backend-dev-utils                     0.1.1
  @backstage/backend-plugin-api                    0.4.0, 0.5.0
  @backstage/backend-tasks                         0.4.3, 0.5.0
  @backstage/catalog-client                        1.4.0
  @backstage/catalog-model                         1.2.1
  @backstage/cli-common                            0.1.12
  @backstage/cli                                   0.22.3
  @backstage/config-loader                         1.1.9
  @backstage/config                                1.0.7
  @backstage/core-app-api                          1.4.0
  @backstage/core-components                       0.12.4
  @backstage/core-plugin-api                       1.4.0
  @backstage/errors                                1.1.5
  @backstage/eslint-plugin                         0.1.1
  @backstage/integration-aws-node                  0.1.2
  @backstage/integration-react                     1.1.10
  @backstage/integration                           1.4.3
  @backstage/plugin-api-docs                       0.8.14
  @backstage/plugin-app-backend                    0.3.42
  @backstage/plugin-auth-backend                   0.18.0
  @backstage/plugin-auth-node                      0.2.12
  @backstage/plugin-catalog-backend-module-github  0.2.5
  @backstage/plugin-catalog-backend                1.8.0
  @backstage/plugin-catalog-common                 1.0.12
  @backstage/plugin-catalog-graph                  0.2.26
  @backstage/plugin-catalog-import                 0.9.4
  @backstage/plugin-catalog-node                   1.3.4
  @backstage/plugin-catalog-react                  1.3.0
  @backstage/plugin-catalog                        1.7.2
  @backstage/plugin-events-node                    0.2.3
  @backstage/plugin-kubernetes-backend             0.9.3
  @backstage/plugin-kubernetes-common              0.6.0
  @backstage/plugin-kubernetes                     0.7.8
  @backstage/plugin-org                            0.6.4
  @backstage/plugin-permission-backend             0.5.17
  @backstage/plugin-permission-common              0.7.4
  @backstage/plugin-permission-node                0.7.6
  @backstage/plugin-permission-react               0.4.10
  @backstage/plugin-proxy-backend                  0.2.35
  @backstage/plugin-scaffolder-backend             1.12.0
  @backstage/plugin-scaffolder-common              1.2.6
  @backstage/plugin-scaffolder-node                0.1.1
  @backstage/plugin-scaffolder-react               1.1.0
  @backstage/plugin-scaffolder                     1.11.0
  @backstage/plugin-search-backend-module-pg       0.5.3
  @backstage/plugin-search-backend-node            1.1.3
  @backstage/plugin-search-backend                 1.2.2
  @backstage/plugin-search-common                  1.2.2
  @backstage/plugin-search-react                   1.4.0
  @backstage/plugin-search                         1.0.7
  @backstage/plugin-tech-radar                     0.5.20
  @backstage/plugin-techdocs-backend               1.5.2
  @backstage/plugin-techdocs-module-addons-contrib 1.0.9
  @backstage/plugin-techdocs-node                  1.4.5
  @backstage/plugin-techdocs-react                 1.1.2
  @backstage/plugin-techdocs                       1.4.3
  @backstage/plugin-user-settings                  0.5.1
  @backstage/release-manifests                     0.0.8
  @backstage/test-utils                            1.2.4
  @backstage/theme                                 0.2.17
  @backstage/types                                 1.0.2
  @backstage/version-bridge                        1.0.3
Done in 0.77s.

๐Ÿ‘€ Have you spent some time to check if this bug has been raised before?

  • I checked and didnโ€™t find similar issue

๐Ÿข Have you read the Code of Conduct?

Are you willing to submit PR?

Yes I am willing to submit a PR!

About this issue

  • Original URL
  • State: open
  • Created a year ago
  • Reactions: 33
  • Comments: 29 (24 by maintainers)

Most upvoted comments

@rmartine-ias Thanks for the reply, it worked for my problem! Good job compiling all this informations, helped me a lot

Still an issue, still working on solution

@rmartine-ias sorry I seemed to have missed your last ping on this message.

Great work digging into this and coming up with some suggestions as to what we want to do here, really awesome job.

Weโ€™ve actually had a discussion a little bit about these pull requests actions, and are kind of leaning towards in the future just providing some more generic git actions to use instead of these specific ones for each provider. The thinking is that we should be able to re-use more of the generic actions to be able to create branches / forks and push to these sources, and then we have an action to create a PR or Merge Requests using the providers APIs with the base and target branches.

With that said however, thatโ€™s not coming any time soon, and I think we want to explore that a little more before committing to it being something we want, so Iโ€™m happy to proceed with some of your suggestions in the short term to get this working!

Iโ€™d pick something like isbinaryfile for robustness

Letโ€™s do this, seems sane!

Ordinarily this would be handled by .gitattributes, so maybe support for that? This may be out of scope.

I wonder if we can mask up this behaviour into opt in or something for now as you suggested, so that you can test it when it getโ€™s released in anger without breaking anyones workflow and then we can look at options if we want to mimic .gitattributes.

Hope this is OK? ๐Ÿ™