rails: Rails cache prefix is not properly configured for isolated caching during parallel tests.

I’ve been investigating why I am seeing a lot of transient test failures due to cache collisions with parallelized tests. I thought it was because Rails doesn’t update the cache prefix to protect from collisions but after looking at it; it’s looking like Rails tries to do that (by appending Process.pid and Thread.current.object_id to the namespace), but that it isn’t working properly.

For the moment I’ve worked around it by adding namespace delineation myself in the parallelize_setup block of the test helper, but it’d be better if Rails handled this automatically so that no-one has to worry about it.

Steps to reproduce

Replace test/test_helper.rb contents with the following:

module ActiveSupport
  class TestCase
    # Run tests in parallel with specified workers
    parallelize(workers: :number_of_processors)

    # Setup all fixtures in test/fixtures/*.yml for all tests in alphabetical order.
    fixtures :all

    parallelize_setup do |worker|
      puts "#{worker.inspect}: #{Rails.cache.options[:namespace]}"
    end

    # Add more helper methods to be used by all tests here...
  end
end

Expected Behavior

The namespaces listed should be unique per worker to isolate the test runners so that they do not collide during parallelization. It should look something like this (assuming that the cache prefix configured is test:20230518:, which is what mine is configured to be).

6: test:20230518:63512:3580:
5: test:20230518:63513:3580:
0: test:20230518:63514:3580:
3: test:20230518:63515:3580:
7: test:20230518:63516:3580:
4: test:20230518:63517:3580:
1: test:20230518:63518:3580:
2: test:20230518:63519:3580:

Actual Behavior

Instead, the namespaces listed all share identical process identifiers and thread identifiers.

6: test:20230518:63512:3580:
5: test:20230518:63512:3580:
0: test:20230518:63512:3580:
3: test:20230518:63512:3580:
7: test:20230518:63512:3580:
4: test:20230518:63512:3580:
1: test:20230518:63512:3580:
2: test:20230518:63512:3580:

System configuration

Rails version:

% fgrep rails Gemfile.lock
      rails-dom-testing (~> 2.0)
      rails-dom-testing (~> 2.0)
      rails-html-sanitizer (~> 1.0, >= 1.2.0)
      rails-dom-testing (~> 2.0)
      rails-html-sanitizer (~> 1.1, >= 1.2.0)
    importmap-rails (1.1.6)
    rails (7.0.5)
    rails-dom-testing (2.0.3)
    rails-html-sanitizer (1.6.0)
    sentry-rails (5.9.0)
    sprockets-rails (3.4.2)
    tailwindcss-rails (2.0.29-arm64-darwin)
    tailwindcss-rails (2.0.29-x86_64-darwin)
    tailwindcss-rails (2.0.29-x86_64-linux)
    turbo-rails (1.4.0)
  importmap-rails (~> 1.1)
  rails (~> 7.0.5)
  sentry-rails
  sprockets-rails
  tailwindcss-rails (~> 2.0)
  turbo-rails

Ruby version:

% ruby --version
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [arm64-darwin21]

About this issue

  • Original URL
  • State: open
  • Created a year ago
  • Comments: 21 (9 by maintainers)

Most upvoted comments

The assertion seems to be that running clear for the cache at the start of each test is a “better way” of achieving isolation.

Absolutely not, I think there is a big misunderstanding here.

What Dima and I mean, is isolation between two subsequent tests, that’s an isolation you want even without parallelization.

The idea that each test should start with an empty cache, just like how each test should start with an empty database (or reset to a specific state). And Rails should probably do that by default for you to propagate this best practice.

At no point we suggested that calling cache.clear before or after each test would solve the parallel test isolation issue.

Now, why I fear a very imperfect fix is worse that no fix, is that for most cache store the only way to have proper isolation is to have one instance of that store (I mean one memcached server, etc) per parallel test worker. That’s something Rails can’t do for you, so I fear implicit namespacing, while it may work for some people, is absolutely not robust enough to be the default and might lead users to experience weird test failures that would be even harder to debug that what you experienced.

Additionally, test parallelization and cache store are entirely decoupled, so it’s really unclear how we could do what you suggest without making a mess in the codebase.

However I want to re-iterate that I absolutely agree with you that the current situation is totally undesirable, and that we should look at improving this, I’m not closing this issue, I’m not saying there is no problem.

We could indeed automatically namespace, but like Dima said, that’s would be a fairly leaky abstraction, as if you can’t fully clear the cache between each tests, you are leaking state which is a recipe for flaky tests. So I’d prefer to find a better solution.

For Redis, a better solution is to use distinct database per test worker, but by default that limits you to 16 workers (which generally should be plenty), but for Memcached, I don’t think we can.

All this to say, we agree this is imperfect and it would be good to improve the situation, but what you propose isn’t necessarily without flaws.

I’d have to some research in old parallel tests related PRs as well as the documentation what the reasoning was back in the day, and if some solutions were explored before.

why wouldn’t Rails implement this to help the developers using it out?

Because sometimes a solution with big flaws is much worse than no solution. And I fear that’s the case here. But again, this isn’t to say that nothing should be done.