syrupy: Incompatible with pytest-xdist / thread safety
Describe the bug
I’m using pytest-xdist to run my tests in parallel (mostly a single test parameterized for a vast amount of different test data). Since I wanted to try snapshot-based testing, I’ve added syrupy to the mix. After initial creation of __snapshots__/mytest.ambr (using the --snapshot-update), it disappears from disk at some point during the test run. Without pytest-xdist (i.e. without using -n auto option) this doesn’t happen.
To reproduce
I don’t have a clean reproduction, only an observation. My setup is a single test for a bit of processing, parameterized with a couple of hundreds different input files. While the .ambr file appear initially, after about 80% of tests done, the file disappears.
Expected behavior
Snapshot files get updated with correct data without disappearing, even when used with pytest-xdist or pytest-parallel.
Environment:
- OS: Ubuntu 20.04
- Syrupy Version: 1.4.0
- Python Version: 3.8
Additional context
I assume the problem happens because each test execution writes to the file individually and thus a race condition happens. Not sure what can be done about it, maybe buffer the snapshot file somewhere and write it only at the end of the test run? There are surely better ideas. I assume that e.g. the JUnit reporter works correctly with pytest-xdist, maybe check what it does?
Either way, I wanted to bring this to your attention, would be happy if there was a short- to middle-term solution. If not, I’ll run the suite without parallelization, it’s not that big of a deal at the moment, at least for me 😃
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 3
- Comments: 18 (2 by maintainers)
Commits related to this issue
- test: fix performance issues when all tests run When running all the tests the performance degrades quickly due to the number of threads growing with every test. Use pytest-xdist plugin to run the t... — committed to pi-top/pi-top-4-Miniscreen by olivierwilkinson 2 years ago
- test: fix performance issues when all tests run Use pytest-xdist plugin to run the tests concurrently in different processes. Splitting the execution across multiple workers significantly reduces the... — committed to pi-top/pi-top-4-Miniscreen by olivierwilkinson 2 years ago
The JSON extension works like this but I suspect it also doesn’t work too well with pytest-xdist. There’s a use case for 1 file per test case, but I can see good arguments against it, for example consider the use of parametrization. You probably don’t want 1000s of snapshot files per test case. Especially since these are expected to be committed to version control.
I’m trying my best to see if syrupy can be refactored/improved to the point where minimal work is required for pytest-xdist compatibility. Using this as an opportunity to also cleanup the architecture a bit.
pytest-xdist already has the concept of a controller process. I’m hoping to piggyback on this. We’ll see. Whatever we do to ensure pytest-xdist support, it shouldn’t negatively impact our non-xdist users.
Syrupy itself doesn’t leverage any “hacks” to do its job. It 100% respects the pytest plugin API. It’s pytest-xdist which is doing some odd things to add parallelization to pytest. This breaks some assumptions when working with the pytest API. This is why we can’t just treat pytest-xdist as a black box, because it does in fact change the way tests are run.
@mcataford you’ll want to work off of the next branch. I was running into performance issues which will need to be tackled first. We’re running benchmarks against the main branch, so you can use those benchmarks as a reference.
Will try take a deeper look this weekend but what needs to be clarified:
If I can’t solve it immediately, I’ll do some refactoring in syrupy to hopefully make it easier to reason about. I’ve been meaning to do some serious refactoring to the syrupy internals for a while and have been planning it for a v2 release since it’ll affect plugin development.
I started digging into the xdist implementation, and this won’t be as straightforward as I hoped. Definitely open to help/contributions here. It looks like each worker manages its own session. So where/when should we write snapshots? Can two workers get nodes from the same test file? In which case, we might have 2 workers trying to write to the same file?
Even if we overcame that obstacle, we run into a problem with unused snapshot detection. We need to somehow defer our snapshot analysis until after all the workers have been cleaned up.
I haven’t had a chance to work on this just yet. I previously refactored some code to boost performance. Buffering the writes should still be a feasible fix here.
If you’re interested in taking this on, I think we’d want to batch and then defer this operation https://github.com/tophat/syrupy/blob/ec4cbac3fa71eb7dad276005f9433ce26c2f1f9a/src/syrupy/assertion.py#L197 (the write_snapshot call), collect them for after the tests finish running, and flush them out to disk at the same time in some coordinated fashion (still need to look into pytest-xdist to see how that may be possible). The hooks are located in the
__init__.pyfile. – If you are interested, feel free to join our discord and I’ll be available as a resource. Otherwise, I will get to this at some point. I ended up prioritizing the JSON extension over this issue, but as that’s out of the way, this is definitely the next most pressing issue.Might actually take a stab at it now.
I’m also using pytest-xdist and want to use syrupy in my test suite. I run into a different problem.
Just having syrupy installed will break pytest when using the pytest-xdist option
--looponfail.If I uninstall syrupy the error goes away. I have not added any syrupy-specific code to any of my tests.
I’m re-examining where we do the reads/writes in an effort to solve https://github.com/tophat/syrupy/issues/539. I’m hoping that work will also solve the concurrency issue with pytest-xdist.