App: [HOLD for payment 2024-02-19] [$500] Create Automation for when main fails in `/App`

If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!


Problem:

We have a GitHub Action defined in App/.github/workflows/preDeploy.yml that can fail without being noticed. It runs every time a PR is merged into our main branch. When it fails, the only notification is an email to the person who merged the PR, which can easily get lost or ignored; there is no process for alerting engineers. Previous examples of this happening and being handled manually are here and here.

Allowing this test suite to break and go unnoticed can make it difficult to diagnose what caused the issue in the first place, and can expose us to performance regressions that could negatively impact the user experience.

Solution:

Create an automation that opens a new GitHub issue when main fails. It should only create an issue if there isn’t already an open issue for the associated breakage. The issue should be labelled as daily and should automatically be assigned to both the author of the PR containing the commit that kicked off the failed workflow run and the person who merged that PR into main. The issue should also link to the PR associated with the failed run.
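As a rough sketch of the rules above, the TypeScript below builds the payload for a new failure issue, or returns `null` when an open issue already tracks the breakage. The data shapes, the title format used for deduplication, and the `Daily` label spelling are assumptions for illustration; the issue does not prescribe any of them.

```typescript
// Hypothetical shape: the issue does not specify a data model.
interface PullRequestInfo {
  number: number;
  htmlUrl: string;
  author: string; // login of the PR author
  mergedBy: string; // login of the person who merged the PR into main
}

interface NewIssue {
  title: string;
  body: string;
  labels: string[];
  assignees: string[];
}

// Assumed label name; the issue only says "labelled as daily".
const DAILY_LABEL = "Daily";

// Builds the payload for a new failure issue, or returns null when an open
// issue with the same title already tracks this breakage.
function buildFailureIssue(
  workflowName: string,
  failedJob: string,
  pr: PullRequestInfo,
  openIssueTitles: string[],
): NewIssue | null {
  // Hypothetical title scheme, also used as the deduplication key.
  const title = `Investigate workflow job failing on main: ${failedJob}`;
  if (openIssueTitles.includes(title)) {
    return null; // breakage already tracked; do not open a duplicate
  }
  // The author and the merger are often the same person, so deduplicate.
  const assignees = Array.from(new Set([pr.author, pr.mergedBy]));
  const body = [
    `The "${failedJob}" job in the "${workflowName}" workflow failed on main.`,
    `Associated PR: #${pr.number} (${pr.htmlUrl})`,
    `PR author: @${pr.author}`,
    `Merged by: @${pr.mergedBy}`,
  ].join("\n");
  return { title, body, labels: [DAILY_LABEL], assignees };
}
```

The returned fields line up with the `title`, `body`, `labels`, and `assignees` parameters of GitHub's REST "Create an issue" endpoint, so a workflow step (for example one using `actions/github-script`) could pass them through directly. Matching on the exact title is the simplest deduplication check; searching open issues by a dedicated label would be a reasonable alternative.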

Upwork Automation - Do Not Edit
  • Upwork Job URL: https://www.upwork.com/jobs/~018b564dd37aa88e42
  • Upwork Job ID: 1744445606630244352
  • Last Price Increase: 2024-01-08
  • Automatic offers:
    • jjcoffee | Reviewer | 28091182
    • rayane-djouah | Contributor | 28091183
Current Issue Owner: @muttmuure

About this issue

  • Original URL
  • State: closed
  • Created 6 months ago
  • Comments: 29 (19 by maintainers)

Most upvoted comments

Just wanted to say I saw this working for the first time in the wild today (link). Nice job!! 🙂

@blimpich, yes, it will create a single issue for the e2ePerformanceTests job failure because both of the failed jobs (Build apk from latest release as a baseline and Build apk from delta ref) are part of the e2ePerformanceTests job in the Process new code merged to main workflow. Do you think we should create separate issues for them?

Should a new issue be created for any job that’s part of the Process new code merged to main workflow failing?

@jjcoffee I think so, yes, though that Slack notification does add some extra noise. I didn’t realize we posted to Slack when that happened. Still, I’d like this to be a more general solution instead of an automation that only concerns the E2E performance tests, even though the issues I linked to are about the E2E tests. The spirit of the issue is that when main breaks, for whatever reason, we should be creating an issue for that breakage.

@muttmuure Friendly bump for payment 🙇

No I like the way it currently works, just confirming 🙂👍

Sorry to say, but the proposal seems incomplete, and I think the general guideline is to share a plan of action so the selection is fair. But that’s OK if you’ve already started working on a fix. Good luck!

I’ve already started working on it. The PR will be up soon.

I added the bug label so we can get someone from the Bug Zero team to handle payments later

📣 @rayane-djouah 🎉 An offer has been automatically sent to your Upwork account for the Contributor role 🎉 Thanks for contributing to the Expensify app!

Offer link | Upwork job

Please accept the offer and leave a comment on the GitHub issue letting us know when we can expect a PR to be ready for review 🧑‍💻

Keep in mind: Code of Conduct | Contributing 📖

The solution is pretty vague, but I’m hoping you’ll work out the details in the PR. Assigning @rayane-djouah.

@rayane-djouah’s proposal LGTM! I’m guessing we’ll dig into the details on the PR.

🎀👀🎀 C+ reviewed

Proposal

Please re-state the problem that we are trying to solve in this issue.

The current Process new code merged to main GitHub Actions workflow has no notification system for failures, so breakages can go unnoticed and degrade the user experience.

What is the root cause of that problem?

There are no automated alerts. The only notification is an email to the person who merged the PR, which is often overlooked or lost, so failures are detected late.

What changes do you think we should make in order to solve the problem?

Implement a new workflow, failureNotifier.yml, that is triggered whenever the Process new code merged to main workflow completes. If the run failed and there is no open failure-notifier issue for the failed workflow, create a GitHub issue labeled ‘daily’ and assign it to the PR author and the person who merged the PR.
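One possible shape for the filtering step inside failureNotifier.yml is sketched below. It assumes the workflow subscribes to the `workflow_run` event (with `types: [completed]`) for Process new code merged to main, and that the job list comes from GitHub's REST endpoint for listing the jobs of a workflow run; only the `name` and `conclusion` fields are modeled. The function names and the one-issue-per-job title scheme are hypothetical.

```typescript
// Subset of a job object from the "list jobs for a workflow run" REST API.
interface Job {
  name: string;
  conclusion: string | null; // e.g. "success", "failure", "cancelled"; null while running
}

// Hypothetical title scheme: one issue per failed job name.
function issueTitleForJob(jobName: string): string {
  return `Investigate workflow job failing on main: ${jobName}`;
}

// Returns the names of failed jobs that do not yet have an open issue.
function jobsNeedingIssues(
  runConclusion: string | null, // workflow_run.conclusion from the event payload
  jobs: Job[],
  openIssueTitles: string[],
): string[] {
  // The notifier fires on every completed run; ignore anything but failures.
  if (runConclusion !== "failure") {
    return [];
  }
  const alreadyTracked = new Set(openIssueTitles);
  const failedNames = jobs
    .filter((job) => job.conclusion === "failure")
    .map((job) => job.name);
  // Deduplicate names, then drop jobs whose breakage is already tracked.
  return Array.from(new Set(failedNames)).filter(
    (name) => !alreadyTracked.has(issueTitleForJob(name)),
  );
}
```

Keying issues on the job name means repeated failures of the same job reuse the open issue instead of opening a new one on every merge, while failures in different jobs each get their own issue.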

What alternative solutions did you explore? (Optional)