closure_tree: Trying to add 2 nodes simultaneously to closure_tree table results in a deadlock
When two nodes are added simultaneously to the closure tree table, this results in a deadlock and the query to acquire lock runs infinitely:
(0.7ms) SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS t46e4b4acfd42770bad34d3e2a13a458a
(0.4ms) SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS t9eaf21f51a8c5a1f2e39f6e8c4971b27
(0.4ms) SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS tae827dc66e71e92f4c9f6edecae1cb54
(0.3ms) SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS t7f7d47e378ffde7f311e56fde6613ab5
(0.3ms) SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS ta7af1b261b90cba9e4d7399c96f17e26
(0.3ms) SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS t3ed658e2e0d3008375e7356cc8e1c4a2
(0.4ms) SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS t080acfd6e5536c50c4917ce271671be6
(0.3ms) SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS t850056c6b58f75d8171bdcb69def558a
(0.3ms) SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS t1ece95b52215d7c5e420ea9c0057aa72
You can reproduce the problem easily by using the code below (where List is the closure_tree model):
Thread.new do
parent = List.find(86)
new_node = List.new(:name => "test deadlock 1")
parent.add_child(new_node)
end
Thread.new do
parent = List.find(82)
new_node = List.new(:name => "test deadlock 1")
parent.add_child(new_node)
end
We faced this problem on production, and the infinite locking loop results in mysql being unresponsive, which in turn results in unresponsive app.
closure_tree version: 6.2.0 ruby: 2.3.0 rails: 5.0.0.1
About this issue
- Original URL
- State: open
- Created 8 years ago
- Reactions: 8
- Comments: 43 (1 by maintainers)
@mceachen I have started to take a look. I have two PRs open on
with_advisory_lockto get the testing infrastructure ready to replicate this issue.Sorry, its not a deadlock in the traditional sense, its actually worse since a deadlock will be detected and aborted. In this case, the first process has obtained the lock and is proceeding on its way when the second process comes in and attempts to obtain a lock. It fails since writer one already has the a lock (which is the correct behavior IMO). What I am having trouble understanding is why writer 1 hangs until writer 2 gives up attempting to obtain its lock.
If I run something like:
on two separate machines upon starting the second both will shortly arrive in a semi-deadlocked situation. The script that I started the call on will then eventually expire and the first writer will continue. I’ve tried moving around the transaction/without it. Below is one potential outcome.
Writer 1 :
Writer 2
I may have found a workaround that works for me – though at present I’m not sure of what other consequences may arise from my solution.
I’d long since upgraded the database to MariaDB 10.x from some older MySQL without apparent incident. Until one day when I needed to reparent something, and then…
At the time, i wrote an awful script that monkey-patched the part of
closure_treethat dealt with locking (IIRC), and managed to get my reparenting done.Now, however, I’m going to need to reparent more often, so I took some time today to do a little more research with a simple test script…
Which, when run… Failed:
…but provided some clues as to what was going on.
The (currently) third solution to https://stackoverflow.com/questions/6000336/how-to-debug-lock-wait-timeout-exceeded-on-mysql mentions that InnoDB’s default isolation level is REPEATABLE_READ, whereas other DBs use READ_COMMITTED.
Setting my database’s default isolation level to READ_COMMITTED…
…did the trick and allowed the reparenting script (and another more complicated script) to work as anticipated. Not sure I’d want to leave this on globally, might be best to set it as tightly around the code as possible (before that first transaction). Maybe something like…
I’m not confident this solves the other two-thread deadlock issue, as that appears to be happening with non-MySQL/MariaDB databases, also.
@tomaadland I have started work over here: https://github.com/ClosureTree/with_advisory_lock/issues/17
@mceachen It appears the issue I am running into is specifically (and I am guessing amitsaxena as well) around the numeric reordering (https://github.com/mceachen/closure_tree/blob/master/lib/closure_tree/hierarchy_maintenance.rb#L76) when calling add_child. Writer 1 will hang on grabbing https://github.com/mceachen/closure_tree/blob/master/lib/closure_tree/hierarchy_maintenance.rb#L63
while writer 2 will attempt to write into: https://github.com/mceachen/closure_tree/blob/master/lib/closure_tree/numeric_order_support.rb#L39
If you call directly children.create() it will avoid this issue, though I would guess any concurrent movement of leafs with numeric ordering enabled could result in this.
If I replace the with_advisory_lock from support.rb with just a model_class.lock(true) the deadlock appears to go away.
I did also note that if you attempt a destroy_all on a large tag set > 10000 (with our without numeric ordering) the destroy will fail with fatal: exception reentered.
Hope this is helpful.