runner: Please support something like "allow-failure" for a given job

Edit from @vanZeben:

As this is a GitHub Actions platform feature request, discussion for this feature has been moved to https://github.com/orgs/community/discussions/15452. Please put your support behind that community discussion instead of this issue.


We use github actions in our “snapd” project and we love them.

One small feature we would love to see is a way to mark a test job as “allow-failure” (or a term along these lines) [0]. This would simply mean that the overall /pulls overview page would show the PR with the little green tick-mark (and maybe in the tooltip 5/6 OK, 1 ignored). It would still show as a failure in the details view (maybe with a different icon?).

Our use-case is that we have some CI environments that fail frequently because of external factors, such as repository mirrors that are out of sync. We still want to run the CI on these systems but not get distracted too much by these out-of-our-control issues.

Hope this makes sense.

Thanks! Michael

[0] E.g.

jobs:
  spread:
    runs-on: ubuntu-latest
    allow-failure: true
    steps:
    - run: do-some-flaky-stuff

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 1338
  • Comments: 157 (4 by maintainers)

Most upvoted comments

I don’t think it is that specific. People have requested this before: https://github.community/t5/GitHub-Actions/continue-on-error-allow-failure-UI-indication/td-p/37033

I just came here via Google as I was surprised I couldn’t find anything like this in the documentation. It’s standard with e.g. Travis CI

Hey @thboop. Thanks for your quick reply!

Yeah, it’s really just about the little green tick at the pull-request overview page. AFAICT when one job (even if it’s not required) fails, the PR overview list will show this PR as failed. Having a way to mark certain jobs as not required would still show the pulls as green (or yellow?) instead of red.

But I do understand this is a bit of a specific request, so feel free to close it if you think it’s a bit too odd. We had it with our old CI system and I liked it.

Hey @mvo5 Thanks for the feature request!

We do support marking a step to allow failure via continue-on-error.

We also support marking specific checks as required. That will allow you to merge PRs without a particular job succeeding.

I think the latter should solve your issue at the job level. Is there a reason that doesn’t work well for you?

I guess one way to handle this would be for jobs with an allow-failure flag set to generate a neutral conclusion on failure instead of failure. As I understand it, this would let the overall check suite conclusion be success if such a job fails.

This is one solution

      - name: Install dependencies
        id: composer-run
        continue-on-error: true
        run: composer update --${{ matrix.dependency-version }} --prefer-dist --no-interaction --no-suggest

      - name: Execute tests
        if: steps.composer-run.outcome == 'success' && steps.composer-run.conclusion == 'success'
        run: vendor/bin/phpunit

continue-on-error causes a difference between outcome and conclusion!

I thought that jobs.<job_id>.continue-on-error sounded exactly like this feature, but it doesn’t seem to work like that. For me, at least, setting a job to “continue-on-error” still causes a red x for the workflow; it doesn’t seem to have any effect at all.

I tried continue-on-error recently,

  • if used at the step level, then everything appears green and the only way to notice the failure would be to manually open the step. There is also an annotation “Check failure on line 1 in .github” but that doesn’t link to the proper place. That’s not enough notification, and there seems to be no reasonable way to see if that job actually passed or failed. Example: https://github.com/eregon/sequel/runs/1476292451?check_suite_focus=true#step:7:27 I think it would be far more useful if continue-on-error at the step level would behave more like at the job level, and mark the CI job as failed or failed-but-expected. The failed step itself should have a red or yellow mark, so it can be noticed without having to guess which step failed.
  • if used at the job level, then the job is marked as failed, which at least can be noticed. And the whole workflow is still marked as successful when looking in the Actions tab. But on the PR it’s rather confusing, and it looks like the CI broke when it did not. Example: https://github.com/jeremyevans/sequel/pull/1737/checks?check_run_id=1479678900 The big red “Some checks were not successful” sounds like something broke, but no, an error/failure there is expected so it shouldn’t be marked as “CI failed”. And on the wheel there is a bit of red. The whole area is encircled in yellow (I’m not sure why), which seems OK. So there I would propose to mark the failed job in yellow, and possibly use another symbol. And then to use the same color in the wheel for jobs which are continue-on-error: true. And also change the text so it would be something like “All expected checks have passed” in green (the wording when all checks are green is “All checks have passed”). The whole area should then be encircled in green, because it’s safe to merge without looking at logs.

it is quite funny that after hundreds of mentions of this issue in other projects, probably hundreds more in private ones, there’s still no progress. But on a serious note there is nothing more to say than what was said before - this feature is crucial.

Hi @thboop, @kotewar, @pdotl, @rentziass and @luketomlinson,

This issue recently passed 1000 thumbs up. It’s by an order of magnitude the most liked issue on this repository.

Therefore I would ask you to consider this issue once more. It would provide significant value to a huge number of developers. It will improve the testing ecosystem significantly. It could accelerate software adoption.

If you can’t deliver this feature, that would be unfortunate, but please communicate it clearly. Then this is settled and we can look for other alternatives.

I would love one more reply on this thread - even if it’s a final one.

Folks, this is an issue tracker not a discussion board. The request is formulated at the top. Piling on does not help. But each time you comment, hundreds of people get a notification.

I’m also an Actions user who would love to see this feature. My team’s project has a long build and a few jobs that are flaky and frequently take a while to fix. We do just ignore failures on those for merging PRs, but being able to make the green tick agree with that convention would be really nice.

Hi @Phantsure, @nickfyson, @vsvipul, @joshmgross, @dhadka, @kotewar, @TingluoHuang, @ericsciple and @maxim-lobanov,

I’ve mentioned you, as GitHub employees working on GitHub Actions, in the most upvoted issue on actions/toolkit by an order of magnitude.

I did this because in this issue over one thousand users have asked for a feature that allows certain jobs to fail, without the whole CI run failing. Many great ideas for implementations have been suggested.

Unfortunately, at least on this issue, the toolkit maintainers are completely unresponsive. I know most of you don’t work on the toolkit directly, but I would like to ask you all to bring this to the attention of the GitHub teams and/or management that could help out with this, if you know any of them.

Like the upvotes and response show, this is the single most wanted thing among developers for GitHub Actions. It would be a great addition to the GitHub Actions ecosystem.

Yeah… we don’t want a dirty workaround, we want an officially supported way of doing this from Github like other CI platforms have.

Just one more person trying to migrate from Travis, who thought I could use continue-on-error to make certain jobs allowed failures, only to find that continue-on-error didn’t work like I thought… please add this! As others have mentioned, if an allowed failure fails, it means that that job should get a red X, but the overall workflow should not.

By default, when any job in a matrix fails, all the other in-progress and queued jobs in that matrix are cancelled (unless ‘fail-fast’ is false). continue-on-error specifies that a certain job failing should not trigger cancelling other jobs. (I would definitely love some way to keep a job from signalling a failing check.)
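
To make the interaction between fail-fast and job-level continue-on-error concrete, here is a minimal sketch of a matrix job. It is an illustration under assumptions, not an official recommendation: the job name `test`, the matrix keys `toolchain` and `experimental`, and the script `./run-tests.sh` are all hypothetical. As reported elsewhere in this thread, the failed job can still show up as a red X on the PR, so this does not amount to a real allow-failure.

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    # A failure in a matrix entry marked experimental should not fail the workflow run.
    continue-on-error: ${{ matrix.experimental }}
    strategy:
      # Keep the remaining matrix jobs running when one of them fails.
      fail-fast: false
      matrix:
        toolchain: [stable]
        experimental: [false]
        include:
          # Hypothetical entry that is expected to break from time to time.
          - toolchain: nightly
            experimental: true
    steps:
      - uses: actions/checkout@v4
      # ./run-tests.sh is an assumed script in the repository, not a real tool.
      - run: ./run-tests.sh ${{ matrix.toolchain }}
```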

Hey @EwoutH and others, As you’ve noted, this is a very requested feature and nearly 3 years is a really long time to be waiting for some feedback, beyond the original post, so I completely understand the frustration expressed by some here. Your voices aren’t going unheard and I apologize that it feels that way and I, on a personal level, can completely empathize. We need to do better here because 3 years (after our first response) really is unacceptable.

To address this feature request specifically, the long and short of it is that the appropriate people need to see this feature request and see that it matters to the community, and the best place for that interaction to happen is within our community discussions. I personally won’t speak to what our actual decision is on whether or not this is something we intend on doing (because I don’t know), but I have brought it up internally to get some response from the appropriate people.

Furthermore, I want to say that this specific repository is designed to be sort of a series of convenience functions that make writing other individual actions easier, rather than a forum to suggest general GitHub Actions changes. I’ll make a point to update some of our contributing guidelines/documentation on this repo to reflect that in the new year. This issue should have been redirected to the community discussions a long time ago so that it could be talked about and addressed appropriately. Fortunately there is a post there from earlier this year (Thank you to @eregon for making that 🙏🏻 ) that could use all the insight and push from the community that we have in this thread. Please go over there and put your name behind the request as that’s really the best place for this discussion to happen.

In the meantime, I will close this issue and encourage further discussion in the appropriate forum. Thank you for your patience and understanding.

Link for the community feature request: https://github.com/orgs/community/discussions/15452

@ktomk We are now over 1000 likes on this issue… it says here in the Necronomicon that the next step is to make a sacrifice to Cthulhu (the Octocat of R’lyeh :octocat:) and then a rift will open in GitHub Actions from whence this feature shall emerge. Am I reading this correctly?

We’re just 5 short of one thousand thumbs up on this request! 🚀

Here’s another use case: In about 2 weeks Python 3.11 will be released, and simultaneously the first Python 3.12 alpha. With an allow-failure tag, a project can immediately add Python 3.12 to their job matrices, and catch errors as they get introduced, without failing the whole CI. This prevents them from piling up during Python 3.12 development.

On a failed job with allow-failure=true, a yellow or orange warning triangle could show up, instead of the regular red cross. The total CI results should still be green.

                     Fail   Pass
Normal               ❌     ✔️
With allow-failure   ⚠️     ✔️

Just chiming in with another +1 to this; I have a number of workflows where I test on stable + nightly, or stable + latest, and I want to test latest/nightly, but don’t want the job to fail and still be marked as a pass in the UI if that test fails.

I still want to be able to get into the test results to see what fails, but it would be nice if the UI would show a grey dot, or an exclamation point, or anything besides a green check for the failed Job.

As someone who is migrating a number of open source projects over from Travis CI, this is probably the number one painful omission.

Where is this feature, over 1.5 years after filing the issue? 😞

In our case we added a workflow from important downstream users to our CI to get notified if we break their setup. We are by far not the most likely source of breakage there though and quite often all our commits show red CI results for unrelated reasons.

Github directly supports the option to select whether or not a job has to pass to allow maintainers to merge. But if these maintainers only see red crosses in the PR overview they are likely to miss the merge. I would be equally happy with a solution that indicates “merge allowed” or “critical CI failure” in the PR overview.

@thboop any update? “continue on error” isn’t “allow failure”, unfortunately - as has been explained upthread, “allow failure” means, mark the overarching workflow as successful even if this job or step fails.

Thanks @vanZeben.

I’m not personally frustrated, but I can understand those that are. I would say, it’s more like, disappointment. I don’t know who speaks for GitHub Actions, nor have I had any traction reaching out to support. It actually feels like the entire team behind GitHub Actions, at least the ones which used to communicate, have simply vanished.

It would go a long way to restore community faith in the process to establish better process and communication channels. This means that developers on the team for GitHub Actions actually need either a community manager or need time allocated for triaging community requests. GitHub has a whole host of tools for doing this (e.g. projects) but it’s not clear you are using these in a way that encourages community involvement. Is there actually a publicly facing repository where such an issue would be better placed? I imagine if this code was open source you’d have some very nice PRs to implement the required feature.

Finally, I know you are being diplomatic but I wouldn’t say this is an optional feature. I wouldn’t be surprised if this was literally the number 1 requested feature from the community. It speaks volumes that such an issue is basically ignored for several years, and I think you’d find that a lot of the engineers on this thread go back to their respective teams and companies and recommend alternatives like BuildKite - and not even because this feature is not implemented, but because it shows a lack of commitment and engagement to the needs of the engineers - which I truly feel is not the case - but that’s the public image portrayed by this issue and how it’s been handled.

Anyway, thanks for doing your best to triage this issue - it sounds like you are at best working orthogonally to the GitHub Actions team. I would personally like to see them pulled into this issue and/or community discussion.

We definitely need a way to ignore failures, rather than using continue-on-error and making everything look green. We should be able to visually see the errors, but not have it fail the build.

Simple use-case here, I want to monitor Python release-candidates and forward compatibility. Just to provide a warning for the roadmap if anything fails. The red failure is a little depressing after a period of time, and not true to the intention of the workflow.

+1 here for the feature.

Wouldn’t https://github.com/actions/runner/issues be a great place to move this issue and re-open it? Could you transfer this issue there @vanZeben? After all I would expect most changes needed for this feature would be in the runner. An issue has many advantages, as others have said, including backlinks, etc.

It feels very unlikely to me and others that any GitHub engineer working in the area would ever reply (with an actual answer discussing the details or a decision whether it will be implemented) to https://github.com/orgs/community/discussions/15452, but I would be delighted to be proven wrong.

As I mentioned earlier, this is a feature request that spans a few different repos, both private and public, hence why this should be in the community discussions. You are right that parts of the code change would be in actions/runner, so that repo would definitely be more appropriate than actions/toolkit being the only open source repo for this sort of thing, but since it’s a platform feature request, keeping the main discussion in community discussions is ideal.

I do understand that there is some bad rep and whatnot, so I’ll move this issue to the actions/runner repo, keep it closed and put a backlink to the discussion in the original issue, so people can link directly to this issue if they need to, but the discussion thread should be maintained for the actual discussion. Does that work as a good middle ground?


I also want to make a big note of saying that the status quo for how things have (or rather have not) been used doesn’t mean that it’s the ideal. We should as engineers & product in the GitHub Actions system be responding and communicating with the community better, and as I’ve said a few times, are taking steps towards that goal. Step one being triaging some of our very old stale issues that desperately need it. Hence why I am responding to this entire issue at all because personally the reputation that we have “abandoned” our OSS actions doesn’t sit well with me.

As has been said before, this is a big ticket item that the community wants, and while I won’t be able to get much feedback on this feature presently, as it’s holidays, I, whilst being a simple IC here myself, am taking charge of getting some appropriate feedback on this. So, it might not be an engineer who responds, might be product or myself, but it’s imperative to me that we change things and prove you wrong 😛

Hey all, just passing by to say that this is the second or third time that I found this issue while searching for an ‘allow-failure-like option’. I would really love to have this around!

I’d like to +1 this. At Discourse, we’re moving all our repos to Github Actions, and one of our gems (rails_multisite) is tested against multiple Rails versions and Rails master. Tests against Rails master are informative and are allowed to fail. So it’d be really nice if they didn’t cause the overall workflow to display a scary ❌ if they failed.

I recently revisited this when I noticed the jobs.<job_id>.continue-on-error option in the docs. Couldn’t recall if I’d tried it before. I believe at some point in time, it may have not been available? Anyway, it is helpful, but still lacking. Forgive me, but below I’ve described why I think a proper ‘allow-failure’ is important.

There is one ‘top level’ pass/fail indicator that is important, and that is the indicator associated with the push/pull-request event that triggered the CI run. For a maintainer/owner reviewing PR’s, this indicates whether they need to look at the CI logs. For a contributor, who may not understand all the Action workflow syntax, it shows whether their PR passed CI. Currently, the option does not affect this indicator.

It does affect the workflow’s indicator in the list of workflow runs shown on the repo’s ‘Actions’ page/tab. It also can affect badges, since they can be tied to a specific workflow file.

So, the above option does not affect the most important indicator. It seems to be based on jobs, and ignores the containership of jobs within the workflow file, which is what the option affects.

Finally, it’s been mentioned in an above comment that ‘allow-failure’ functionality is desired for testing against ‘master/main’ branches of major dependencies, which may include the language used. This can be an important feedback loop, and the current functionality of jobs.<job_id>.continue-on-error works well for that, as clicking the ‘top level’ pass/fail indicator in the web UI will show what jobs have failed, even if they are set to ‘allow-failure’ by using jobs.<job_id>.continue-on-error.

In the interim, what if there was a step that simply posted the “allowed failures” in a comment (that same comment updated each time the workflow ran)? Similar to Codecov.

according to the docs, this is a detectable condition:

https://docs.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions#steps-context

steps.<step id>.outcome string The result of a completed step before continue-on-error is applied. Possible values are success, failure, cancelled, or skipped. When a continue-on-error step fails, the outcome is failure, but the final conclusion is success.
steps.<step id>.conclusion string The result of a completed step after continue-on-error is applied. Possible values are success, failure, cancelled, or skipped. When a continue-on-error step fails, the outcome is failure, but the final conclusion is success.
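
As a rough illustration of how that condition could be detected, here is a sketch of a job's steps that checks `outcome` after a `continue-on-error` step and surfaces the ignored failure as a warning annotation and a job summary line. The step id `flaky` and the script `./do-some-flaky-stuff.sh` are assumptions made up for this example. Posting or updating a PR comment, as suggested above, would need an additional action or API call that is not shown here.

```yaml
steps:
  - name: Run flaky tests
    id: flaky
    continue-on-error: true
    # Hypothetical script; replace with the real command.
    run: ./do-some-flaky-stuff.sh

  - name: Surface the ignored failure
    # outcome is the result before continue-on-error is applied,
    # so it is 'failure' here even though conclusion is 'success'.
    if: steps.flaky.outcome == 'failure'
    run: |
      echo "::warning::Flaky tests failed, but the failure was ignored"
      echo "Allowed failure in step 'flaky'" >> "$GITHUB_STEP_SUMMARY"
```

This keeps the overall result green while leaving a visible trace, which is a workaround rather than the requested feature.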

Another use case is to be able to test future versions of Python to ensure compatibility for our project. This gives us a heads up of what to expect and deal with issues as they come up, and keeps a history.

@vanZeben closing a highly rated issue to point to a newer (also highly rated) discussion before the holiday season just comes off as you trying to juice ticket metrics, rather than actually responding to a long running community request. It would be better for GitHub to leave this open and continue to ignore it rather than just arbitrarily close the issue.

// cc @martinwoodward

Please use the “thumbs up” reaction at the top of the issue. Adding more comments to just say “I also need this” does not help, and notifies everyone watching this thread for updates, so please don’t.

@glensc No. The job status should be “failed” if it has errors, but the status for the whole workflow should ignore the status of the said job. I do want to know if my project works with latest development dependencies or not, especially when the new upstream version is about to release. Manually reading the logs works but takes more effort than Travis CI’s.

Well, if all you want is to not block progress in your (non-incremental) build, it’s easy to do with Github Actions, don’t see where your issue is. @ljharb

allow-failure is really necessary for nightly builds. Please support this feature…

+1👍 for adding an allow-failure option. Semantically this is different than continue-on-error. In practice allow-failure is most useful for testing against experimental versions of languages/frameworks (e.g. ruby-head) where you’d like to be able to see failures in the CI run as red (perhaps marked with an “ignored” sub-icon), but at the same time such failures should not cause the overall CI status to become red.

You can see an example of the pointless debates this missing feature causes here: https://github.com/collectiveidea/delayed_job/pull/1162

It’s not a failure you want to hide, it’s a failure you don’t want to block progress, which is a perfectly reasonable thing to want that remains entirely unrelated to incremental builds.

Folks, if you’re looking for incremental builds, well then just do incremental builds. Github Actions is on a clean revision from your git(1) repository and if you can’t use command-line, well, I feel for you, but my suggestion is that you shift left a bit more. Never give up! (Even you can make it [incremental or at once]!)

Travis CI has this as their allow_failures feature:

https://docs.travis-ci.com/user/customizing-the-build/#jobs-that-are-allowed-to-fail

This can be useful to do testing on unreleased versions of upstream products (beta-testing against their unreleased code) so you can get an idea of when stuff starts failing upstream.

We’ve used this for many years to test code against ruby-head. None of these tests are required for “production” development and deployment but it can be highly useful to have the canary running all the time. The results should be allowed to go red, but be completely ignored for the overall result of the tests.

The documentation on the GH Actions continue-on-error option even looks like it is trying to do exactly this, but the result is very much not what we want. It almost looks like product and/or development tried to copy the feature but misunderstood what its use case was or just misunderstood the semantics.

All mature CI-s

Of which GHA is by far the youngest. It’s also built on a simple, composable design instead of hard-coding things like workflows for a specific tool that may change (looking at you, Travis CI!).

I think we are getting one thing at a time. allow-failure, as well as single/failing job restarting, skip-ci, and probably other things may come in the future. I don’t think eloquent arguments are needed, just time to prioritize, design, test, and deploy. We have already been getting some great features, like manual workflow triggering.

not allowing failed builds fails to integrate with next versions which are normally allowed to fail but constantly checked. was easy on travis-ci, is not possible with github actions. the docs about migrating from travis-ci do not cover this as well. just my 2 cents.

Hey folks 👋

I’ve been wanting this feature for a very long time 😫 in the Ember ecosystem we tend to have test “scenarios” automatically generated that from time to time are expected to fail so we have really been feeling the pinch with this missing feature.

So I invented something to try to fix it 🎉 https://github.com/mainmatter/continue-on-error-comment

It’s very new but we’re already using it in a client project. I know it’s not perfect but I think it’s a reasonable stop-gap while we wait for this to be fixed officially by GitHub 👍

@ktomk no, it’s not. allow failure can apply to all sorts of use cases, including “i no longer support version X of $platform but i still want to learn when it started failing”. It has nothing to do with dependencies, runtime or otherwise, nor incremental builds.

There’s a reason every single CI system in existence, except for GHA, supports “allow failure”, and it’s not “incremental builds”.

I updated the feedback with links to AppVeyor and Travis CI documentation which includes concrete use cases, and proves other CI systems do this for a reason.

@thboop can you comment if Github Actions is planning to join the league of sane CI systems and add this feature, or if this is an intentional perverse design decision to omit this feature?

My solution to this is that the line to run my tests is:

run: bundle exec rspec || ${{ matrix.experimental }}

so the result is true if the tests pass and if they fail it will be true or false depending on the value of matrix.experimental.

As a work-around this is OK, but I do not consider this to be an acceptable solution because:

  • I am running this with current development version of Ruby because I want to get advanced warning of any issues but any failures are now hidden behind a green tick.
  • I should not need to hack how I run my tests to get round a deficiency of the framework.

A dirty workaround for bash commands which fail sometimes is just to use the bash ‘or’ with another command that will succeed. eg myFlakeyScript.sh || true

Do something like || echo "::warning::Job failure ignored!" or you risk breaking it even further without knowing. That is why so many people ask for an orange build status: they know very well that green/red have bad consequences in the long term.
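
Spelled out as a workflow step, that suggestion might look like the sketch below. The step name is made up, and `./myFlakeyScript.sh` stands in for whatever command is flaky. The step always succeeds, and a failure is only visible as a warning annotation on the run.

```yaml
steps:
  - name: Run flaky script without failing the step
    run: |
      # If the script exits non-zero, emit a warning annotation instead of failing.
      ./myFlakeyScript.sh || echo "::warning::Job failure ignored!"
```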

Actual allow-failure is still desirable.

Well, we all know how helpful examples are…

I created a commit in a fork that has two workflow files, one of which I added jobs.<job_id>.continue-on-error to a single job.

Let’s look at the ‘top level’ indicators, see the commit ‘Actions - allow failure - non_mri.yml’:

https://github.com/MSP-Greg/puma/tree/allow-failure https://github.com/MSP-Greg/puma/commits/allow-failure

Both ‘top level’ indicators show failed, which is what’s wrong with the current implementation.

When either are clicked, they show one job failing, which is what I’d prefer. If one has prior knowledge of what jobs run ‘allow-failure’, it’s very easy to check.

Now, let’s look at the ‘Actions’ page run list.

https://github.com/MSP-Greg/puma/actions?query=branch%3Aallow-failure

The two workflow indicators (‘MRI’ & ‘non_MRI’) for the ‘Actions - allow failure - non_mri.yml’ commit both show passed. This is inconsistent with the indicator mentioned above for the commit.

@ktomk i’ve already done so; but your unwarranted, incorrect, and frankly hostile comment needed addressing. Please be more respectful in the future when sealioning into threads, thanks.

(and yes, clearly you haven’t read the actual discussion or you’d have understood that it wasn’t related to incremental builds in the first place)

That’s a fair point, but let me suggest you assume already it will become a failure you want to hide which makes no sense for CD. So much for the increment in a build. In any case the discussion about it is here: https://github.com/orgs/community/discussions/15452 (yes I also miss this feature as a previous Travis userertte, but the results are much better compressed there, this is just tracking the issue - and Microsoft) @ljharb

When you compare your build result against the interchanges of the interwebs of your dependencies (GitHub action or otherwise runtime at least) where you allow failure, this is an increment where you plan ahead. You put it on failure because you know your pipeline is not yet green/red but you wait for it. This is totally related to incremental builds or the build of an increment, it’s just that you have a runtime dependency within the GitHub action @ljharb

@henryiii given that every CI system but actions seems to have “allow failure” but does not have “continue on error”, it’s at least a bug in the minds of those who shipped the latter in response to requests for the former.

@ktomk

I’m a maintainer of a repo using a common technique for showing a failed job in the summary, but is still not as friendly as seeing it in the job list shown in a PR or the repo’s Actions tab.

So, given multiple issues in multiple repos, and also probably a few in the Support Community forum, this is a feature many people have asked for. Or, demand for it shows that people aren’t interested in the available workarounds…

For about the 20th time in the past year, as I continue migrating my Travis CI jobs over to GitHub Actions, I’ve been bitten by this weird behavior again. I really wish there was something like Travis’ ‘allow-failure’, where it could still show up as failed in the UI, but would not fail the entire build.

Something like this would be extremely useful for testing packages that rely on other packages to be updated first. For example, if one were to check if their Python package is Python 3.10 compatible (and all of its dependencies as well), one could test their package for Python 3.10 and mark it as allowed-to-fail until it finally stops failing. This way, every time the test suite is run, it basically checks if all dependencies are Python 3.10 compatible, but it won’t fail the job if that has not happened yet.

@geerlingguy

Maybe we’ve got a terminology issue, but in Travis, a job that’s marked as ‘allow-failure’ is shown as failed. Conversely, the ‘top level’ indicator is shown as passed.

That’s part of the reason for my post, as I wanted to describe exactly what I thought was needed. Also, Actions has another level of containership in part of the UI, the workflow file. In Travis and AppVeyor, that didn’t exist.

And, in Actions, that containership is only shown in the repo ‘Actions’ page, it doesn’t exist when looking at commits or pull requests that show a ‘top level’ indicator…

EDIT: Rephrasing, I would like the correct status of all jobs to be reported anywhere a job list exists in the UI. Any pass/fail indicators summarizing that list, in whatever way, should be aware of the ‘allow-failure’ settings of jobs, and ignore their result. Hence, those indicators will show pass/fail based only on jobs not marked as ‘allow-failure’…

@ramsey your issue doesn’t seem related to the thread. Your job is failing because your include doesn’t have a value for operating-system

Wouldn’t https://github.com/actions/runner/issues be a great place to move this issue and re-open it? Could you transfer this issue there @vanZeben? After all I would expect most changes needed for this feature would be in the runner. An issue has many advantages as others have said, including backlinks, etc.

It feels very unlikely to me and others that any GitHub engineer working in the area would ever reply (with an actual answer discussing the details or a decision whether it will be implemented) to https://github.com/orgs/community/discussions/15452, but I would be delighted to be proven wrong.

I do not know how to communicate with this platform and don’t want to know!

“Allow failures” as an enabler of incremental improvement

I have a one-person project [1], so PR checks are not an issue for me. However, as with most in this ticket, I am in need of “allow failures”. Let me add my use case to the discussion as I think it is slightly different from the ones described so far, but probably a very common one.

At present in my project we do not support clang-cl (a C++ compiler for Windows). The clang-cl build fails to link due to some weird link error, and I haven’t had time to figure it out. If I add clang-cl to the build matrix, all my commits get marked with a red X, i.e. failed. This happens even though the build was marked as experimental [2]. As a consequence, I need to check each and every commit in detail, in case some other build is broken. I could of course remove the build altogether (which is my workaround at present, see commit above). This is less than ideal because I’d rather know if a particular commit makes that build worse, or, in case I get lucky, better. So it would be nice to continue to build clang-cl in my build matrix, without impacting the overall red/green status of each commit.

In fact, by not having this feature, it makes incremental improvements to the build much harder. I now need to enable the build, deal with any rotting that may have happened since the last time it was enabled, make some changes, run the build and then disable it again. The drive-by approach I used to take - whenever I got 20 minutes here and there, do a quick experiment to see if it improved things - has now been rendered unworkable because most of those twenty minutes would be spent on overhead.

All of this to say that, for me, this is an important feature.

Links

[1] https://github.com/MASD-Project/dogen

[2] https://github.com/MASD-Project/dogen/blob/66022e278892f2fedfa44df2b56cb2741e01a28c/.github/workflows/continuous-windows.yml#L30

I recently implemented nightly builds for PHP 8.1 in Laravel and used continue-on-error like @thboop suggested. It works pretty well when you combine it with a matrix build. I think the only thing that’s missing is indeed something like a different color indication for builds that continued on error. Something like a yellow status and a different icon.

I’ve been using continue-on-error and it mostly provides the functionality that I’ve used travis’ allow_failures for.

But there are some UI inconsistencies: the /:user/:repo/actions page gives a nice ✔️ and the badge correctly says “passing”, but the /:user/:repo/commits/:branch page shows ❌.

The problem is that we - CI users - have to justify why allow-failure is a vital option.

@ljharb Then please take my impatient reply earlier as impatient and I honor your patience to equally remain silent on this issue here. Thanks.

I also need this…

I’m currently using this:

      - name: Something
        run: something-that-might-fail
        continue-on-error: true
      - name: Allow failures
        run: true

Not good as a warning (yellow) sign but better than having it as a failure (red).

@vanZeben is there any progress internally on this issue? You’ve said for people to go and put their name on the discussion to make progress. What’s the magic number of people who have to put their name behind this issue in order to make progress on it internally? There are 1000+ people on this issue who have expressed desire for this feature… Is that not enough?

Hey @ioquatix, thanks for the question!

I definitely understand the frustration behind these sentiments and objectively it’s very clear that this is a very requested feature from the community who deserves, at the very least, a response of “we will/will not support this and this is why”. It’s my personal opinion, and I’m being very candid here, that on the Actions side of GitHub, our standards for being good stewards of open source are lacking and need to improve and we are making moves in this area.

Part of the problem is that we have been very committed to improving GitHub Actions and put a lot of resources behind that initiative, and a lot of the systems that make up that platform are closed sourced so the open source side of things that the community has visibility into, has taken a bit of a backseat in order for us to improve GitHub Actions to a level we want it to be.


As for the stance on community issues and your concerns around that:

For the sake of being completely transparent, I want to make an explicit point of saying that community support for a feature doesn’t directly mean that that feature makes sense/GitHub wants to support that feature. We have long term Product plans for the Actions ecosystem, and some community features might directly conflict, or make our systems more difficult to meld together. The community definitely has a stake in that conversation, but ultimately it’s up to whether the community’s vision and our vision can exist harmoniously.

That being said, I can’t, and am not trying to, speak to whether or not this feature will be picked up and supported by GitHub Actions, nor what would quantify some “magic number” to show that the community really wants this feature. My former message is solely that this specific Open Source utility based repository (actions/toolkit) is not the home for GitHub Actions platform decisions, so in order to have the appropriate visibility of the stakeholders responsible for those decisions, the community support needs to be placed where it will be seen by those parties.

Issues/feature requests in this repository should be primarily for improving/bugfixing the toolkit suite of packages.

@ktomk I suggest reviewing the original post, which describes the issue.

@ktomk this feature has nothing inherently to do with incremental builds, so your comment is confusing.

Much-needed feature requirement

+1 would like to see the option to enable Warnings only for a failed step or ignore failure.

@Danon continue-on-error can do that, but this issue is about “allow-failure” which would show the overarching status as green even if 8.0 failed.

It is in large part an issue of how continue-on-error jobs and steps are presented. No indication at all for a continue-on-error step means one cannot even find out if a step failed without expanding each step in the log of every build and analyzing the output, which of course nobody does or has time for. A green job should mean “every step succeeded”. So the step-level continue-on-error seems no better than doing || true which is of course terrible for logging and finding issues.

The job-level continue-on-error has kind of the opposite issue where the PR looks like it breaks CI (shown as red with failing builds) even though the only job failing is marked as continue-on-error in the workflow and so it is an expected failure.
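
The two placements contrasted here can be sketched as follows. This is a minimal illustration with hypothetical job names and stand-in commands; the comments restate what the workflow syntax documentation says about each key plus the UI behaviour reported in this thread, not a guarantee about how any given page renders it.

```yaml
jobs:
  step-level:
    runs-on: ubuntu-latest
    steps:
      # The step fails, but the job carries on and ends up green;
      # the failure is only visible inside the step log.
      - run: exit 1   # stand-in for a flaky command
        continue-on-error: true
      - run: echo "still runs"

  job-level:
    runs-on: ubuntu-latest
    # The job itself is shown as failed, but the workflow run
    # is not failed because of it.
    continue-on-error: true
    steps:
      - run: exit 1   # stand-in for a flaky command
```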

I think my comment https://github.com/actions/runner/issues/2347 captures clearly what needs to happen here to make this feature actually usable beyond just || true.

Now the question is how do we get the GitHub Actions dev team to look at this? cc @chrispat @thboop

(probably this should be an actions/runner issue but that is details)

@glensc right, the point is that the experimental platform when failing should be marked as failed, but the overarching pipeline should be marked as successful. The last part is what Github Actions’ “continue on error” fails to do (travis-ci’s allow_failure does both)

Not only does continue-on-error not work as expected (as mentioned by everyone on this thread), but the example in the documentation appears to result in a syntax error that I cannot figure out how to get around. As a result, I’ve been trying all sorts of permutations to figure out if this syntax error is causing continue-on-error not to work as expected.

Maybe it is?

Here’s the documentation example verbatim:

runs-on: ${{ matrix.os }}
continue-on-error: ${{ matrix.experimental }}
strategy:
  fail-fast: false
  matrix:
    node: [11, 12]
    os: [macos-latest, ubuntu-18.04]
    experimental: [false]
    include:
      - node: 13
        os: ubuntu-18.04
        experimental: true

My version looks similar:

unit-tests:
  name: "Unit Tests"

  runs-on: ${{ matrix.operating-system }}
  continue-on-error: ${{ matrix.experimental }}

  strategy:
    fail-fast: false
    matrix:
      dependencies:
        - "lowest"
        - "highest"
      php-version:
        - "7.4"
        - "8.0"
      operating-system:
        - "ubuntu-latest"
      experimental: [false]
      include:
        - php: "8.0"
          composer-options: "--ignore-platform-reqs"
          experimental: true

When this runs, it fails at the unit-tests job, due to the following error. (See the failure here.)

Error when evaluating 'runs-on' for job 'unit-tests'. (Line: 121, Col: 14): Unexpected value ''

Somehow, it’s getting an empty string for runs-on, or at least, that’s what this error seems to indicate to me.

When I remove the line that defines experimental: [false] (and only this line), the build runs successfully (see here).

So, maybe the successful build is not actually taking into account the continue-on-error property? Does this syntax error help point to some kind of culprit that could resolve this issue for everyone on this thread?
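
One reading of this error, consistent with the reply in this thread that the include entry has no operating-system value: because experimental: true conflicts with the matrix value experimental: [false], the include entry cannot be merged into an existing combination and becomes a job of its own, and that job only has the keys listed in the entry, so matrix.operating-system is empty. Without experimental: [false], the entry appears to be merged into every combination instead, which would make all jobs experimental. A sketch of a version that avoids both effects, assuming the intent is that only PHP 8.0 is experimental (the choice of "highest" for the extra job is an assumption):

```yaml
unit-tests:
  name: "Unit Tests"

  runs-on: ${{ matrix.operating-system }}
  continue-on-error: ${{ matrix.experimental }}

  strategy:
    fail-fast: false
    matrix:
      dependencies: ["lowest", "highest"]
      php-version: ["7.4"]
      operating-system: ["ubuntu-latest"]
      experimental: [false]
      include:
        # This entry becomes its own job, so it has to carry
        # every matrix key that the job reads.
        - php-version: "8.0"
          dependencies: "highest"
          operating-system: "ubuntu-latest"
          composer-options: "--ignore-platform-reqs"
          experimental: true
```

Note that the entry uses php-version rather than php, so that it matches the key used by the rest of the matrix.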

Hey @ioquatix, Thank you for the really well thought out sentiment! I completely agree with everything you’ve said.

In regards to the vanishing, I mentioned that above, most of our work the last little while has been in our closed sourced systems. We do have a list of public initiatives so that the community can see, at a high level, what we are working on. So it completely makes sense that the community feels abandoned given that most of our work has happened outside of their purview. It’s not due to a lack of willingness or desire from the teams, just a change in priority while we improve other areas of the GitHub Actions platform. Also, to be clear, that’s not me attempting to make an excuse for this, just attempting to give some context as to what’s happened.

I also really agree with your points about better process and communication channels between GitHub Actions and the community. There are lots of ways that we can improve this here and help improve involvement and we are presently planning out how we can improve that, but the first step is to make some actual progress with all the issues that have gone stale so it’s not just empty words/promises from us. Hence why I am here talking to you now. And for transparency, there is support from our product and engineering management to improve this as well.

In regards to the code being open source, that’s a hard ask. The closed source parts of our systems own various different other systems that power GitHub that we don’t want in the wild. As for a public facing repository for such issues it really depends on what you have a request about. If it pertains to something in our public repos, then that’s certainly good to just put an issue directly on that repo. But for things that are overall features for the GitHub Actions platform, right now, those should go to the community discussions.

As for the optional feature comment. I completely agree with you about this entirely. Thank you for writing that up and attempting to let us know how it seems. It’s definitely something that we have noticed ourselves and, as I mentioned earlier, are planning to fix because you’re entirely right. Without the community engagement and involvement, the other services that are further separated from your code will be suggested for use. It’s the first thing you do when you put together a proposal for a new product; take a look at its support and what the SLA from the owner is if there is any.

Anyways, I really appreciate you reaching out and being very thoughtful and thorough with your comments. Happy Holidays 😃

@jdrusso I suggest you join the discussion and drop commenting on this issue /cc @ljharb, @ross-spencer, @EwoutH, @johnnyshields, @christianfrstorm & @osamagudangada

I created a separate feedback discussion about this, which really focuses on the heart of this issue, and what needs to be done to fix it: https://github.com/github/feedback/discussions/15452 (https://github.com/github/feedback/discussions/9875 seems rather unspecific)

@pboling @ktomk i think i’ve got this nailed now. Below is a brief example. I have a job with continue-on-error: true that also defines an output called “outcome”. This gets populated (via an environment variable) by an if: failure() step.

A “conclusion” job running if: always() has a single step checking the test job output for ‘failure’, and if true runs exit 1;

I tried to bypass the env, but couldn’t find a neat way to do so. I think env must be the only thing that persists across matrix runs within a job.

I hope this makes sense. I’m afraid my full example is within a private repo so I cannot share it.

jobs:
  test:
    runs-on: ubuntu-latest
    continue-on-error: true
    outputs:
      outcome: ${{ env.outcome }}
    strategy:
      fail-fast: false
      max-parallel: 1
      matrix:
        version: [ 'v1' ]
        browser: [ 'bs_mac_chrome98', 'bs_mac_firefox96', 'bs_mac_safari15', 'bs_win_ie11', 'bs_iphoneX' ]
    steps:
      - id: test
        run: echo "your test here" && exit 1;
      - id: conclusion
        if: failure()
        run: echo "outcome=failure" >> "$GITHUB_ENV"

  conclusion:
    runs-on: ubuntu-latest
    needs: test
    if: always()

    steps:
      - if: ${{ needs.test.outputs.outcome == 'failure' }}
        run: exit 1

@JCMais: Yes, that can be easy to miss, especially in a thread as long as this one.

A better reference might be jobs.<job_id>.steps[*].continue-on-error or jobs.<job_id>.continue-on-error that shows continue-on-error is within workflow syntax (not action metadata syntax or otherwise related to composite actions syntax).

In summary, continue-on-error is not allowed inside composite actions as there is no such syntax. continue-on-error applies to all actions, including composite actions, and is to be added to any workflow jobs or job steps (e.g. that one with the composite action).

As with the run .travis.yml action, it should be relatively straightforward to create some allow-failure (or similar) property for it with action inputs. All that needs to be done is setting the exit code of the action appropriately. Further noteworthy are perhaps the conclusion and outcome outputs, these relate to steps.<step id>.conclusion and steps.<step id>.outcome (see Contexts) and might help to better mitigate a missing allow-failure on workflow steps and jobs.

@Suor: You can write a .travis.yml and use the GitHub Action for it, it has support for the allow-failure feature. It basically works with variables and conditions, it should be possible to encode it as well within the ghaction yaml files.

@eregon: please see steps context and Setting an Error Message, these are visible in the summary, no need to check each log.
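
A minimal sketch of that suggestion, assuming a step with the hypothetical id flaky and a stand-in command: the step is allowed to fail, and a follow-up step turns the failure into a warning annotation and a line in the job summary, so it is visible without opening the log.

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - id: flaky
        continue-on-error: true
        run: exit 1   # stand-in for a command that may fail

      # outcome is the result before continue-on-error is applied;
      # conclusion is the result after it (success in this case).
      - if: steps.flaky.outcome == 'failure'
        run: |
          echo "::warning title=Allowed failure::step 'flaky' failed but was allowed to"
          echo "Step flaky failed (allowed)." >> "$GITHUB_STEP_SUMMARY"
```

The job still ends up green; the sketch only adds the indication that is otherwise missing.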

FWIW I filed a GitHub Support issue about this, if you also want this to progress it might be a good idea to tell them as well.

Thank you for contacting GitHub Support and providing detailed feedback. I’ve documented and passed on your specific request to the relevant team; as such, your feedback is in the right hands. While I don’t have a timeline to share for possible changes, we really appreciate you writing in with this detailed feedback helping us make GitHub work better for you! Please let us know if you need any further help and we will be happy to assist.

Concrete suggestions how to fix this: https://github.com/actions/runner/issues/2347

if this branch protection setting could be an exclude list instead of having to specify all dozens of checks to be required and keeping it up to date, it would make things much easier.

but being able to control it via the .yml file would obviously be much easier to maintain long term.
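
One workaround for the maintenance burden, sketched here under the assumption that a single aggregate check is marked as required in branch protection instead of every individual check: a final job depends on the jobs that must pass and fails if any of them did. The job names are hypothetical, and this does not change the red mark that a failing non-required job still puts on the commit.

```yaml
jobs:
  unit:
    runs-on: ubuntu-latest
    steps:
      - run: echo "tests that must pass"

  flaky:
    runs-on: ubuntu-latest
    steps:
      - run: echo "tests that may fail for external reasons"

  required-checks:
    runs-on: ubuntu-latest
    # List only the jobs that must pass; flaky is deliberately left out.
    needs: [unit]
    if: always()
    steps:
      - if: contains(needs.*.result, 'failure') || contains(needs.*.result, 'cancelled')
        run: exit 1
```

With this layout the list of required jobs lives in the .yml file, and only required-checks has to be named in the branch protection settings.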

Ah, great, I did not know about that public board. Thanks for the link. Maybe you can add a github.com/actions/.github/readme.md file with the links and details.

Regarding open source, I can appreciate it’s a hard ask. But it’s the only way the community can help solve their own problems without direct effort from GitHub engineers.

@pboling https://github.com/orgs/community/discussions/15452 is a different feature request than this issue.

  • That one: Make continue-on-error more visible in UI
  • This one: Allow a job to fail, without continuing, but do not mark the overall run as failed.

@ross-spencer We are attempting to consolidate this type of feedback at https://github.com/github/feedback/discussions/15452 to make it more likely that Github will see and act on it. They are obviously ignoring this issue (and probably ignoring the discussion as well, but at least we can incriminate them by posting in both places, leaving them no excuse).

Would you be able to report your use case there?

Something I’d like to see is a way to mark a workflow as failed when the build succeeded but the built binaries still have fatal errors.

@jrmhaig: yes, it’s not acceptable, only as a workaround. just a hint: you’ll see || true or similar in logs and at that time it might not be clear what this is about. So if I may suggest to make it self-documenting in the logs:

run: bundle exec rspec || (: allow-failure: ${{ matrix.experimental }} && ${{ matrix.experimental }})

or similar.

if you need inline verbosity (for debug or other reasons), a variant is w/ set -x;, which might not be necessary but sometimes can be useful in logs to appear after the test command output as well.

run: bundle exec rspec || (set -x; : allow-failure: ${{ matrix.experimental }} && ${{ matrix.experimental }})

just FYI.

@albertomercurio Replace incude with include?

    incude:
      - version: 'nightly'
        os: ubuntu-latest
        arch: x64
        experimental: true

Yes it worked!

It’s continue-on-error, not pass-on-error or similar. I admit the latter would be very useful, and is what this issue is asking for, but claiming it’s a bug that continue-on-error doesn’t do exactly what it says it does (continue until the job is complete instead of instantly stopping it when an error is encountered) isn’t helpful. It does not change the fail to a pass, it just continues instead of instantly stopping.

@ktomk the problem is that you both want to have it reported (so you know when it stops failing, and you can promote that new version to “fully supported” status) and have no impact on the PR CI passing.

No, the problem is that those who like a different service better don’t want to pay for it and jump over to GHA and then complain that it’s not the same.

Take ktomk/run-travis-yml as an example: It does highlight the allow-failure jobs as errors/warnings (and with the reason) and lets the CI pass green as those failures ain’t any. It’s not that it wouldn’t be possible to achieve the same (CI passing even with build job failures), you still have to check for errors though. Add some “report to the PR” or send out an Email, this all looks possible to me.

However if the decoration would be available as an option, then it would be more integrated. That’s all I wanted to express with my earlier comment. With the best intentions.

The main problem for continue-on-error at job level is a PR will be shown as having “failed builds” even though the builds that failed are all continue-on-error jobs: https://github.com/actions/runner/issues/2347

For continue-on-error on a step there is 0 indication the step or the job failed, which makes it a sure way to never see any issue…

Here’s my simple version of experimental: true behaving like Travis’ allow-failure: true. If experimental is true, the failing job status is ignored and the job is marked as “passed”.

...
    strategy:
      matrix:
        experimental:
          - false
        php:
          - "5.3"
          - "5.4"
...
        include:
          - php: "8.0"
            experimental: true
...
    steps:
...

      - name: "Run PHPUnit tests (Experimental: ${{ matrix.experimental }})"
        env:
          FAILURE_ACTION: "${{ matrix.experimental == true }}"
        run: vendor/bin/phpunit --verbose || $FAILURE_ACTION

@dersimn Instead of making check-job fail when no update is needed, you can add an output to report the result of the check without failing the job:

  check-job:
    outputs:
      needs-updating: ${{ steps.check.outputs.needs-updating }}

Then remove the Check result step and use if: needs.check-job.outputs.needs-updating != 'false' in the trigger-build job; this way you won’t have failures when no updating is needed (the trigger-build job would just be marked as skipped).
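
Put together, the suggestion could look roughly like this. The job, step, and output names (check-job, check, needs-updating, trigger-build) are taken from the comment; the commands are placeholders, since the original workflow is not shown here.

```yaml
jobs:
  check-job:
    runs-on: ubuntu-latest
    outputs:
      needs-updating: ${{ steps.check.outputs.needs-updating }}
    steps:
      - id: check
        # Placeholder: a real check would compute true or false here.
        run: echo "needs-updating=false" >> "$GITHUB_OUTPUT"

  trigger-build:
    runs-on: ubuntu-latest
    needs: check-job
    # Skipped, not failed, when the check reports that no update is needed.
    if: needs.check-job.outputs.needs-updating != 'false'
    steps:
      - run: echo "trigger the build here"
```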

I have somewhat a working version… using the example previously mentioned on this thread with matrix.experimental

I am explicitly specifying experimental: false for any build I want to be green, and true for those I can ignore.

https://github.com/rubysherpas/paranoia/pull/537

the whole build reports as success, but the sub-build that fails still causes a red ❌ on the PR

Whatever religion or else you apply to @johnnyshields it is in any case totally acceptable to me under the circumstances that this is a much wanted feature, regardless which ones own magic numbers. For me the reasoning is in the past, less in present and much likely not in the future (lets double cross fingers that at least backwards compatibility won’t get broken in progressing towards resulting in the need to maintain too many files across oh so many countless repositories which all allow failure on it - just imagine if!).

@EwoutH I like that use case! 😉 https://github.com/actions/runner/issues/2347

I like the graphics more though, something close to anything like this would be massively useful! I feel that’s pretty clear for new contributors too where it might not be clear where warnings come from (like currently).

Ah sorry i make too many typos. Yes, the two are not mutually exclusive.

@pboling I commented on your last link with use cases for both allow-failures and continue-on-error; the two are not mutually exclusive and are in fact complementary.

But we’re missing the forest for the trees here. Both of these features have precedent in Travis, Semaphore CI, etc. etc. and we/Github should stop overthinking and just go with the herd on this one.

– All the best Christian Fr stormyhr Friday, 07 October 2022, 09:58pm +02:00 from Peter Boling @.*** :

@.*** You could see it that way. I don’t. They both have long discussions of ways to fix Github’s very broken CI system, which currently is causing bugs in software releases. @.*** who has been very active on this thread (see here , here , here , here ), created the other discussion because this issue isn’t being taken seriously, or is being misunderstood, by Github.

True to form, they (Github) continue to fail to address the issue in the Discussion forum. The issue is big, and multi-faceted, but they both intend to garner attention to this large issue. Since we aren’t inside Github all we can do is describe how this problem is making our work difficult, and that can make it seem like separate issues (or we can leave the Github platform, as I am). If Github ever does address this I expect it will turn into dozens of stories on their issue tracker.

Hello,

I followed the documentation, but it doesn’t work. I need to repeat the process for three different versions (1.6, 1.7, and nightly), with the continue-on-error condition for the nightly version. But only the first two appear on my repository! Do you know why?

This is a part of my yaml file:

jobs:
  test:
    name: Julia ${{ matrix.version }} - ${{ matrix.os }} - ${{ matrix.arch }} - ${{ github.event_name }}
    runs-on: ${{ matrix.os }}
    continue-on-error: ${{ matrix.experimental }}
    strategy:
      fail-fast: false
      matrix:
        version: ['1.6', '1.7']
        os: [ubuntu-latest]
        arch: [x64]
        experimental: [false]
        incude:
          - version: 'nightly'
            os: ubuntu-latest
            arch: x64
            experimental: true
    steps:
      - uses: actions/checkout@v2
      - uses: julia-actions/setup-julia@v1
        with:
          version: ${{ matrix.version }}
          arch: ${{ matrix.arch }}
      - uses: julia-actions/cache@v1
      - uses: julia-actions/julia-buildpkg@v1
      - uses: julia-actions/julia-runtest@v1
      - uses: julia-actions/julia-processcoverage@v1
      - uses: codecov/codecov-action@v2
        with:
          files: ./lcov.info
      - uses: coverallsapp/github-action@1.1.3
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          path-to-lcov: "./lcov.info"

@Danon you can map that in the matrix, and if you have an action that supports failure, this is straightforward.

Example excerpt (from):

jobs:
  ci:
    name: .travis.yml / PHP ${{ matrix.php-version }} / ${{ matrix.machine }}
    runs-on: ${{ matrix.machine }}
    continue-on-error: ${{ matrix.experimental }}   # <<-- mapping on continue-on-error from matrix

    strategy:
      fail-fast: false
      matrix:
        machine: ['ubuntu-18.04']
        php-version: ['8.1', '8.0', '7.4', '7.3', '7.1', '7.0', '5.6']
        experimental: [false]
        include:
          - machine: 'ubuntu-18.04'
            php-version: '7.2'
            experimental: true

    steps:
      - uses: actions/checkout@v2

      - uses: shivammathur/setup-php@v2
        with:
          php-version: ${{ matrix.php-version }}

      - uses: ktomk/run-travis-yml@v1
        with:
          allow_failure: ${{ matrix.experimental }}   # <<-- mapping for the action
        env:
          TRAVIS_PHP_VERSION: ${{ matrix.php-version }}

Should work with any action that supports allow-failure behaviour and should be easy to adopt for own workflow steps.

For continue-on-error on a step there is 0 indication the step or the job failed, which makes it a sure way to never see any issue…

Well, that’s the command: continue-on-error. You set it if you want to get the error out of the way. It does not exempt you from reading the logs… It is not something similar to allow-failure as known from Travis-CI where you don’t need to read the logs and get the message in the summary. But the concept of actions on GHA and the build matrix on TCI is entirely different.

As a hack if you have a simple check, maybe it’s reasonable, but I feel this issue goes further than just adding a visual indicator for “neutral” builds or allowing some failed builds to not completely fail the whole workflow.

Yes it goes further, however if you implement allowing a failure on a build job (which you can already do on GHA today), what you would still miss is the visual indicator.

continue-on-error is indeed a different concept. It can help when you come from the Travis-CI allow-failure mindset, but it does not match well, often not at all.

@ktomk the problem is that you both want to have it reported (so you know when it stops failing, and you can promote that new version to “fully supported” status) and have no impact on the PR CI passing.

The problem is that we - CI users - have to justify why allow-failure is a vital option.

@szepeviktor Do you think the above arguments don’t sufficiently cover that? (@thboop?)