rails: LoadInterlockAwareMonitor deadlock when clearing cache (multiple databases, test)
With multiple databases, we observe hangs within system tests.
See also https://gitlab.com/gitlab-org/gitlab/-/issues/371468 (GitLab is at Rails 6.1, Ruby 2.7.5)
Steps to reproduce
Here’s a minimal reproduction with Rails 7.0.3.1
- Checkout https://gitlab.com/tkuah/multi_database_system_test
bundle install
bundle exec rails db:migrate
bundle exec rails test test/system/home_pages_test.rb
The conditions are:
- Multiple databases
- System test where there’s an async request which overlaps with the main thread
Expected behavior
It does not hang, and the test completes
Actual behavior
It hangs. It looks like it’s hanging when clearing query cache (see https://gitlab.com/gitlab-org/gitlab/-/issues/371468#note_1087203883, and https://gitlab.com/gitlab-org/gitlab/-/issues/371468#note_1088779499)
A backtrace dump taken while the test is hanging sigdump-68442.log
My suspicion is that:
lock_thread
is true for system tests- Therefore both the main thread, and the Puma thread attempt to clear query cache for the same set of connections
So we have two connections:
- Puma thread takes a
LoadInterlockAwareMonitor
lock for primary connection viaPost.transaction
- Main thread takes a
LoadInterlockAwareMonitor
lock for animal connection via implicit transaction thatcreate
does - Due to 2, main thread attempts to clear query cache for primary connection. Waits for 1’s lock
- Due to 1, and because of
lock_thread
, puma thread clears query cache for primary connection. It takes aLoadInterlockAwareMonitor
lock for primary connection which succeds - Due to 1, puma thread now attempts clears query cache for animal connection. Waits for 2’s lock
See clear_query_caches_for_current_thread
method
System configuration
Rails version: 7.0.3.1
Ruby version: ruby 2.7.5p203 (2021-11-24 revision f69aeb8314) [arm64-darwin21]
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 50 (40 by maintainers)
Commits related to this issue
- Make AbstractAdapter#lock thread local by default Fix: https://github.com/rails/rails/issues/45994 A semi-common issue since Ruby 3.0.2 is that using a fiber inside a transaction cause a deadlock: ... — committed to Shopify/rails by byroot 2 years ago
- Use a single lock for all connections Followup: https://github.com/rails/rails/pull/46519 Followup: https://github.com/rails/rails/pull/46553 Fix: https://github.com/rails/rails/issues/45994 When us... — committed to Shopify/rails by byroot 2 years ago
I think I have a fix for your issue https://github.com/rails/rails/pull/46661
We wouldn’t be able to do this without building in functionality like Database Cleaner. It was an intentional design decision to not add that complexity. I’m fine revisiting this decision but removing the locking threads is not viable without another solution in place.
Yeah I changed that on a later commit, same results.
This might be an option, though I’m not sure how I feel about it: https://github.com/rails/rails/commit/e5a815df3bf6249367eb09de89be8773e764caa9