cucumber: Replay failed tests
I was wondering if it’s something that we can add to the framework. We already have a function to repeat the output of failed tests, but we don’t have a function to run failed tests again. It’s common to have flaky tests that can sometimes be fixed by running them again.
For example:
#[tokio::main]
async fn main() {
    AnimalWorld::cucumber()
        .replay_failed(2)
        .run_and_exit("tests/features/book/output/terminal_repeat_failed.feature")
        .await;
}
Where 2 is the maximum number of times the tests should be run again in case of failures. Let’s say we have the tests A, B, C and D:
- During the first run only `A` passes
- Then `B`, `C` and `D` are run again (1 replay left if we have new test failures)
- Only `B` passes, so we run `C` and `D` again (0 replays left)
- `D` fails again; this one is certainly more than just unstable
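A minimal sketch of that replay loop in plain Rust (the name `run_with_replays` and its signature are hypothetical, not part of the crate’s API):

/// Runs every scenario once, then reruns only the failures, at most
/// `max_replays` extra times. Returns the scenarios that still fail.
fn run_with_replays<F>(mut run: F, scenarios: Vec<&'static str>, max_replays: usize) -> Vec<&'static str>
where
    F: FnMut(&str) -> bool, // `true` means the scenario passed
{
    let mut pending = scenarios;
    // One initial run plus at most `max_replays` replays.
    for _ in 0..=max_replays {
        pending.retain(|s| !run(s)); // keep only the failures for the next round
        if pending.is_empty() {
            break;
        }
    }
    pending
}

With the walkthrough above and `max_replays = 2`: the first round drops `A`, the second drops `B`, the third drops `C`, and `["D"]` comes back as a persistent failure.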
Regarding the output we have two choices:
- Print all test executions: it’s transparent, but can be repetitive when the tests fail multiple times (like `D` in this example)
- Or just print the result once the last replay is done (which can be the maximum, here 2, or an earlier run if all tests are passing)
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 23 (23 by maintainers)
Thank you very much for the hard work @ilslv @tyranron. That’s a fantastic implementation!
I was a bit busy with other projects; I’ll start to work on it 😃
Discussed with @tyranron:
- Drop `.after(all)` entirely; only `.after(3s)` is allowed.
- Retry tags are inherited from the `Feature`/`Rule` branch.
- `--retry`, `--retry-after` and `--retry-tag-filter` CLI options
- Detailed explanation of interactions between CLI options and tags
Let’s explore how different sets of CLI options would interact with the following `Feature`:
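The feature file itself isn’t reproduced here; a plausible sketch, assuming four `Scenario`s tagged in the forms the outcomes below imply (`@retry`, `@retry(3)`, `@retry(4).after(5s)`, and one untagged):

Feature: Animal feature
  @retry
  Scenario: retried once by default
    Given a hungry cat

  @retry(3)
  Scenario: retried up to 3 times
    Given a hungry cat

  @retry(4).after(5s)
  Scenario: retried up to 4 times with a 5 second delay
    Given a hungry cat

  Scenario: not retried unless the CLI says so
    Given a hungry cat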
No CLI options at all:
- `Scenario` retried once without a delay
- `Scenario` retried 3 times without a delay
- `Scenario` retried 4 times with 5 seconds delay in between
- `Scenario` isn’t retried

`--retry=5`:
- `Scenario` retried 5 times without a delay
- `Scenario` retried 3 times without a delay
- `Scenario` retried 4 times with 5 seconds delay in between
- `Scenario` retried 5 times

`--retry-tag-filter='@flacky'`:
- `Scenario` retried once without a delay
- `Scenario` retried 3 times without a delay
- `Scenario` retried 4 times with 5 seconds delay in between
- `Scenario` isn’t retried

`--retry-after=10s`:
- `Scenario` retried once with 10 seconds delay in between
- `Scenario` retried 3 times with 10 seconds delay in between
- `Scenario` retried 4 times with 5 seconds delay in between
- `Scenario` isn’t retried

`--retry=5 --retry-after=10s`:
- `Scenario` retried 5 times with 10 seconds delay in between
- `Scenario` retried 3 times with 10 seconds delay in between
- `Scenario` retried 4 times with 5 seconds delay in between
- `Scenario` retried 5 times with 10 seconds delay in between

`--retry=5 --retry-tag-filter='@flacky'`:
- `Scenario` retried once without a delay
- `Scenario` retried 3 times without a delay
- `Scenario` retried 4 times with 5 seconds delay in between
- `Scenario` isn’t retried

`--retry=5 --retry-after=10s --retry-tag-filter='@flacky'`:
- `Scenario` retried 5 times with 10 seconds delay in between
- `Scenario` retried 3 times without a delay
- `Scenario` retried 4 times with 5 seconds delay in between
@ilslv

> I think that just printing the error “as is” and later having the `| Retry #<num>` label is more than enough. Like here, but with an error.

I’m not against this. I’ve thought about it too.
That’s OK, but the concern I’ve raised is not about power, but rather about ergonomics and the CLI. I could easily imagine the situation when someone wants to retry the test suite without populating `@retry` tags here and there. Like `--retry=3 --retry-tag-filter='@webrtc or @http'` and then `--retry=2 --retry-tag-filter='@webrtc or @http or @animal'`. It’s OK if it will be built on top of `Cucumber::which_scenario`, but I’d vote to have this in the CLI, as the use cases and ergonomics benefits are quite clear.
@theredfish Actually, `Scenario`s are generally run in parallel, so there is no need for the additional complexities you’ve described. We can just rerun failed `Scenario`s on their own. I’ll be happy to help you with the development of this feature!
Thank you for the feedback! Indeed, the idea isn’t to encourage ignoring flaky tests, but to have a way to handle them while waiting for a fix.
The tag is a good idea: it offers a different granularity and an explicit way to opt in.
My bad, it wasn’t clear enough, but my example was about scenarios, not steps.
I can try to implement this feature if you want, if you’re available to guide me during the development?
@ilslv yup.
It’s also worth marking the retried scenarios explicitly in the output, like the following:
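Something along these lines (a purely hypothetical rendering, reusing the `| Retry #<num>` label discussed above):

Feature: Animal feature
  Scenario: If we feed a hungry cat it will not be hungry | Retry #1
    ✔  Given a hungry cat
    ✘  When I feed the cat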
@theredfish I do think that adding retries for flaky tests is a great feature to have, but I have a couple of concerns about the proposed implementation.
In addition to specifying the number of times a test should be retried, I think that we should retry only tests tagged as `@flaky` or something like that, as being explicit is better than implicit here. Maybe even allow overriding this value with something like `@flaky(retries = 3)`. I want this library to be a tool that is hard to misuse, with defaults that follow best practices. So adding `@flaky` should be a good point of friction for the user to think twice about why this `Scenario` is flaky.

I think that this may lead to unexpected problems: on `panic`, changes from step `B` may be partially applied, and fully retrying it may cause some unexpected changes in the `World` state. Consider the following `Step` implementation:
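(A minimal sketch of such a `Step`, assuming a hypothetical `AnimalWorld` with a `counter` field.)

use cucumber::{given, World};

#[derive(Debug, Default, World)]
pub struct AnimalWorld {
    counter: usize,
}

// The increment lands before the assertion, so the `World` is already
// mutated when the step panics; retrying just this step would increment
// `counter` again on the dirty state.
#[given("a counted step")]
fn counted_step(w: &mut AnimalWorld) {
    w.counter += 1;
    assert_eq!(w.counter, 1); // panics on every rerun of this step
}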
Retrying this `Step` after the `assert_eq!` will always increment `w.counter` again. So, as we don’t impose a `Clone` bound on our `World` (otherwise we would be able to `.clone()` the `World` before every `Step` and roll back to it, if needed), the only option left is to retry the entire `Scenario`.

I’m not sure this can be achieved with streaming output like ours. And even if it could, I think that we should be transparent about failed flaky tests and also include stats about them in the `Summarized` `Writer`.
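For contrast, a per-`Step` rollback would require exactly that `Clone` bound; a hypothetical sketch, with failure modeled as a `Result` instead of a panic for simplicity:

/// Retries a single step once, rolling the `World` back to a snapshot
/// taken before the first attempt.
fn retry_step_with_rollback<W, E>(
    world: &mut W,
    mut step: impl FnMut(&mut W) -> Result<(), E>,
) -> Result<(), E>
where
    W: Clone, // the bound cucumber doesn't impose on `World`
{
    let snapshot = world.clone(); // save state before the step runs
    if step(world).is_ok() {
        return Ok(());
    }
    *world = snapshot; // discard partially applied changes
    step(world) // retry once on the restored state
}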
@tyranron I saw a couple of conference talks where people from huge FAANG-like companies argued that at this scale flaky tests are inevitable. I’m not sure I agree with them, but such opinions are at least floating around. Also, other test runners provide this feature out of the box.