vitess: Bug Report: vttablet hanging when running PlannedReparentShard
Overview of the Issue
During a PlannedReparentShard (PRS), we sometimes see the tablet hang at this point in the logs, apparently waiting on something:
I0317 04:56:50.637023 9434 rpc_replication.go:388] DemotePrimary
I0317 04:56:50.638689 9434 rpc_replication.go:438] DemotePrimary disabling query service
I0317 04:56:50.638703 9434 state_manager.go:214] Starting transition to PRIMARY Not Serving, timestamp: 2022-03-17 04:25:38.205679188 +0000 UTC
I0317 04:56:50.638773 9434 tablegc.go:212] TableGC: closing
...
Here’s a sample debug blocking profile covering this: https://gist.github.com/derekperkins/dd6d54809a98b582c03909061e639766
Reproduction Steps
We suspect that it involves:
- Setting up many active vstreams: MoveTables, Reshard, OnlineDDL, Messaging
- Doing PRS while those are active
Binary Version
v13.0.0
Operating System and Environment details
N/A
Log Fragments
No response
About this issue
- State: closed
- Created 2 years ago
- Comments: 17 (17 by maintainers)
@derekperkins if it’s possible to also grab the output of
mysql> show full processlist;
in the primary tablet’s mysqld instance when you see the messages stop flowing from the shard, that would be very helpful. Thanks!
I’m rebuilding now on the same branch, with #9942 cherry-picked on top. I’ll deploy as soon as the build completes: https://hub.docker.com/repository/registry-1.docker.io/vitess/base/builds/c1a58d39-4848-4237-bc1f-f16512ec7948
Update: it’s deployed now. Most of our heavy usage starts at midnight UTC, so hopefully I’ll have some logs by tomorrow.