rails: Rails cache prefix is not properly configured for isolated caching during parallel tests.
I’ve been investigating why I am seeing a lot of transient test failures due to cache collisions with parallelized tests. I thought it was because Rails doesn’t update the cache prefix to protect from collisions but after looking at it; it’s looking like Rails tries to do that (by appending Process.pid
and Thread.current.object_id
to the namespace), but that it isn’t working properly.
For the moment I’ve worked around it by adding namespace delineation myself in the parallelize_setup
block of the test helper, but it’d be better if Rails handled this automatically so that no-one has to worry about it.
Steps to reproduce
Replace test/test_helper.rb
contents with the following:
module ActiveSupport
class TestCase
# Run tests in parallel with specified workers
parallelize(workers: :number_of_processors)
# Setup all fixtures in test/fixtures/*.yml for all tests in alphabetical order.
fixtures :all
parallelize_setup do |worker|
puts "#{worker.inspect}: #{Rails.cache.options[:namespace]}"
end
# Add more helper methods to be used by all tests here...
end
end
Expected Behavior
The namespaces listed should be unique per worker to isolate the test runners so that they do not collide during parallelization. It should look something like this (assuming that the cache prefix configured is test:20230518:
, which is what mine is configured to be).
6: test:20230518:63512:3580:
5: test:20230518:63513:3580:
0: test:20230518:63514:3580:
3: test:20230518:63515:3580:
7: test:20230518:63516:3580:
4: test:20230518:63517:3580:
1: test:20230518:63518:3580:
2: test:20230518:63519:3580:
Actual Behavior
Instead, the namespaces listed all share identical process identifiers and thread identifiers.
6: test:20230518:63512:3580:
5: test:20230518:63512:3580:
0: test:20230518:63512:3580:
3: test:20230518:63512:3580:
7: test:20230518:63512:3580:
4: test:20230518:63512:3580:
1: test:20230518:63512:3580:
2: test:20230518:63512:3580:
System configuration
Rails version:
% fgrep rails Gemfile.lock
rails-dom-testing (~> 2.0)
rails-dom-testing (~> 2.0)
rails-html-sanitizer (~> 1.0, >= 1.2.0)
rails-dom-testing (~> 2.0)
rails-html-sanitizer (~> 1.1, >= 1.2.0)
importmap-rails (1.1.6)
rails (7.0.5)
rails-dom-testing (2.0.3)
rails-html-sanitizer (1.6.0)
sentry-rails (5.9.0)
sprockets-rails (3.4.2)
tailwindcss-rails (2.0.29-arm64-darwin)
tailwindcss-rails (2.0.29-x86_64-darwin)
tailwindcss-rails (2.0.29-x86_64-linux)
turbo-rails (1.4.0)
importmap-rails (~> 1.1)
rails (~> 7.0.5)
sentry-rails
sprockets-rails
tailwindcss-rails (~> 2.0)
turbo-rails
Ruby version:
% ruby --version
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [arm64-darwin21]
About this issue
- Original URL
- State: open
- Created a year ago
- Comments: 21 (9 by maintainers)
Absolutely not, I think there is a big misunderstanding here.
What Dima and I mean, is isolation between two subsequent tests, that’s an isolation you want even without parallelization.
The idea that each test should start with an empty cache, just like how each test should start with an empty database (or reset to a specific state). And Rails should probably do that by default for you to propagate this best practice.
At no point we suggested that calling
cache.clear
before or after each test would solve the parallel test isolation issue.Now, why I fear a very imperfect fix is worse that no fix, is that for most cache store the only way to have proper isolation is to have one instance of that store (I mean one memcached server, etc) per parallel test worker. That’s something Rails can’t do for you, so I fear implicit namespacing, while it may work for some people, is absolutely not robust enough to be the default and might lead users to experience weird test failures that would be even harder to debug that what you experienced.
Additionally, test parallelization and cache store are entirely decoupled, so it’s really unclear how we could do what you suggest without making a mess in the codebase.
However I want to re-iterate that I absolutely agree with you that the current situation is totally undesirable, and that we should look at improving this, I’m not closing this issue, I’m not saying there is no problem.
We could indeed automatically namespace, but like Dima said, that’s would be a fairly leaky abstraction, as if you can’t fully clear the cache between each tests, you are leaking state which is a recipe for flaky tests. So I’d prefer to find a better solution.
For Redis, a better solution is to use distinct database per test worker, but by default that limits you to 16 workers (which generally should be plenty), but for Memcached, I don’t think we can.
All this to say, we agree this is imperfect and it would be good to improve the situation, but what you propose isn’t necessarily without flaws.
I’d have to some research in old parallel tests related PRs as well as the documentation what the reasoning was back in the day, and if some solutions were explored before.
Because sometimes a solution with big flaws is much worse than no solution. And I fear that’s the case here. But again, this isn’t to say that nothing should be done.