rails: LoadInterlockAwareMonitor deadlock when clearing cache (multiple databases, test)

With multiple databases, we observe hangs within system tests.

See also https://gitlab.com/gitlab-org/gitlab/-/issues/371468 (GitLab is at Rails 6.1, Ruby 2.7.5)

Steps to reproduce

Here’s a minimal reproduction with Rails 7.0.3.1:

  1. Check out https://gitlab.com/tkuah/multi_database_system_test
  2. bundle install
  3. bundle exec rails db:migrate
  4. bundle exec rails test test/system/home_pages_test.rb

The conditions are:

  1. Multiple databases
  2. System test where there’s an async request which overlaps with the main thread

Expected behavior

It does not hang, and the test completes

Actual behavior

It hangs. It looks like it’s hanging when clearing query cache (see https://gitlab.com/gitlab-org/gitlab/-/issues/371468#note_1087203883, and https://gitlab.com/gitlab-org/gitlab/-/issues/371468#note_1088779499)

A backtrace dump taken while the test was hanging: sigdump-68442.log

My suspicion is that:

  1. lock_thread is true for system tests
  2. Therefore both the main thread, and the Puma thread attempt to clear query cache for the same set of connections

So we have two connections:

  1. Puma thread takes a LoadInterlockAwareMonitor lock for the primary connection via Post.transaction
  2. Main thread takes a LoadInterlockAwareMonitor lock for the animal connection via the implicit transaction that create performs
  3. Due to 2, the main thread attempts to clear the query cache for the primary connection. It waits for 1’s lock
  4. Due to 1, and because of lock_thread, the Puma thread clears the query cache for the primary connection. It takes a LoadInterlockAwareMonitor lock for the primary connection, which succeeds
  5. Due to 1, the Puma thread now attempts to clear the query cache for the animal connection. It waits for 2’s lock

See clear_query_caches_for_current_thread method
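
The suspected lock ordering can be illustrated outside Rails. This is only a sketch of the general pattern (two threads taking two locks in opposite order). It uses plain Ruby Monitor objects rather than LoadInterlockAwareMonitor, and the names primary_lock, animal_lock and lock_in_order are hypothetical stand-ins, not Rails APIs. It uses try_enter instead of a blocking enter so the script reports the conflict instead of hanging:

```ruby
require "monitor"

# Hypothetical stand-ins for the per-connection locks; this is not Rails code.
primary_lock = Monitor.new
animal_lock  = Monitor.new

# Takes `first`, waits until the peer thread holds its own first lock,
# then makes a non-blocking attempt on `second`.
# Returns true if `second` could be acquired, false otherwise.
def lock_in_order(first, second, my_ready, peer_ready, my_done, peer_done)
  first.synchronize do
    my_ready << true            # announce: I hold my first lock
    peer_ready.pop              # wait until the peer holds its first lock
    acquired = second.try_enter # a blocking `enter` here would wait forever
    second.exit if acquired
    my_done << true             # announce: I have made my attempt
    peer_done.pop               # keep holding `first` until the peer has tried too
    acquired
  end
end

puma_ready = Queue.new
test_ready = Queue.new
puma_done  = Queue.new
test_done  = Queue.new

# "Puma" thread: primary first, then animal.
puma_thread = Thread.new do
  lock_in_order(primary_lock, animal_lock, puma_ready, test_ready, puma_done, test_done)
end

# "Test" thread: animal first, then primary (the opposite order).
test_thread = Thread.new do
  lock_in_order(animal_lock, primary_lock, test_ready, puma_ready, test_done, puma_done)
end

results = { puma: puma_thread.value, main: test_thread.value }
puts "puma thread got animal lock:  #{results[:puma]}"
puts "test thread got primary lock: #{results[:main]}"
```

Both values come back false: each thread holds the lock the other one needs. With a blocking enter in place of try_enter, neither thread could ever proceed, which matches the shape of the hang described above.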

System configuration

Rails version: 7.0.3.1

Ruby version: ruby 2.7.5p203 (2021-11-24 revision f69aeb8314) [arm64-darwin21]

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 50 (40 by maintainers)

Most upvoted comments

I think I have a fix for your issue: https://github.com/rails/rails/pull/46661

maybe even disable transactional fixtures for system tests

We wouldn’t be able to do this without building in functionality like Database Cleaner. It was an intentional design decision to not add that complexity. I’m fine revisiting this decision but removing the locking threads is not viable without another solution in place.

the app is using sqlite3 not mysql. Does that change anything? We’re using latest mysql2, 0.5.4

Yeah I changed that on a later commit, same results.

This might be an option, though I’m not sure how I feel about it: https://github.com/rails/rails/commit/e5a815df3bf6249367eb09de89be8773e764caa9