vitess: Failed backup puts tablet into unhealthy state
I have a shard where the backup fails about 50% of the time. The shard in question has 225 GB of data. Once it fails, it never recovers until I delete the k8s pod and it restarts.
While the error itself is different, the outcome sounds the same as what @rnavarro reported in #2238.
vttablet logs
I0121 13:35:09.379272 1 mysqld.go:329] Mysqld.Shutdown
I0121 13:35:09.379276 1 mysqld.go:333] executing Mysqld.Shutdown() remotely via mysqlctld server: /vtdataroot/mysqlctl.sock
I0121 13:35:14.420687 1 backup.go:413] found 132 files to backup
E0121 13:58:54.605921 1 engine.go:185] periodic schema reload failed: could not get table list for reload: net.Dial(/vtdataroot/tabletdata/mysql.sock) to local server failed: dial unix /vtdataroot/tabletdata/mysql.sock: connect: no such file or directory (errno 2002) (sqlstate HY000)
I0121 13:58:54.606008 1 tabletserver.go:320] TabletServer state: NOT_SERVING (Not Serving) -> NOT_SERVING (Transitioning)
I0121 13:58:54.606070 1 tabletserver.go:320] TabletServer state: NOT_SERVING (Transitioning) -> NOT_SERVING (Not Serving)
I0121 13:58:54.606086 1 tabletserver.go:682] Check MySQL failed. Shutting down query service
I0121 13:58:54.606105 1 tabletserver.go:320] TabletServer state: NOT_SERVING (Not Serving) -> SHUTTING_DOWN (Shutting Down)
I0121 13:58:54.606119 1 tabletserver.go:583] Executing complete shutdown.
I0121 13:58:54.606385 1 tabletserver.go:590] Shutdown complete.
I0121 13:58:54.606409 1 tabletserver.go:320] TabletServer state: SHUTTING_DOWN (Shutting Down) -> NOT_SERVING (Not Connected)
E0121 14:35:09.220352 1 backup.go:496] failed to close file [20 context deadline exceeded]: %!v(MISSING)
I0121 14:35:09.220485 1 mysqld.go:177] executing Mysqld.Start() remotely via mysqlctld server: /vtdataroot/mysqlctl.sock
E0121 14:35:09.220545 1 backup.go:287] backup is not usable, aborting it: [can't restart mysqld: context deadline exceeded: <nil>]
E0121 14:35:09.220819 1 backup.go:295] failed to finish backup: [context deadline exceeded]: %!v(MISSING)
E0121 14:35:09.220935 1 rpc_backup.go:85] mysql backup command returned error: can't restart mysqld: context deadline exceeded: <nil>
I0121 14:35:09.220982 1 state_change.go:130] Executing post-action state refresh: after backup
W0121 14:35:09.221009 1 state_change.go:141] Failed rereading tablet after after backup - services may be inconsistent: deadline exceeded: tablets/aks-1588972600/Tablet
W0121 14:35:09.221070 1 rpc_server.go:91] TabletManager.Backup(concurrency:4 )(on aks-1588972600 from ) error: refreshTablet failed rereading tablet after after backup: deadline exceeded: tablets/aks-1588972600/Tablet
mysqlctld logs
I0121 13:35:09.393314 1 mysqld.go:329] Mysqld.Shutdown
I0121 13:35:09.394243 1 mysqld.go:367] No mysqld_shutdown hook, running mysqladmin directly
I0121 13:35:09.394887 1 mysqld.go:433] execCmd: /usr/bin/mysqladmin /usr/bin/mysqladmin [--defaults-extra-file=/tmp/example225411938 --shutdown-timeout=300 --connect-timeout=30 --wait=10 shutdown]
I0121 13:35:14.023813 1 mysqld.go:260] Mysqld.Start(1548059306) exit: <nil>
I0121 13:35:14.417685 1 mysqld.go:447] execCmd: /usr/bin/mysqladmin output:
I0121 13:35:14.417728 1 mysqld.go:408] Mysqld.Shutdown: waiting for socket file (/vtdataroot/tabletdata/mysql.sock) and pid file (/vtdataroot/tabletdata/mysql.pid) to disappear
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 1
- Comments: 20 (20 by maintainers)
@derekperkins You’re correct, it was a flag to the
vtctlclient
call:vtctlclient -action_timeout 2h0m0s