vitess: VStream server-side error during gh-ost online schema migration
Overview of the Issue
Our Debezium Vitess Connector (CDC) uses the VStream gRPC API to stream change events from a sharded keyspace called test_sharded_keyspace, which has two shards: -80 and 80-.
When running the following gh-ost online schema migration:
vtctlclient -server vtctld-host:15999 ApplySchema -sql "ALTER WITH 'gh-ost' TABLE bar_entry add column status int" test_sharded_keyspace
the VStream gRPC stream fails with the following server-side error:
io.grpc.StatusRuntimeException: UNKNOWN: target: test_sharded_keyspace.80-.replica,
used tablet: zoneA-301 (prelive-ib-tablet-301.vt): vttablet:
rpc error: code = Unknown desc = stream (at source tablet)
error @ fa7c9236-2c16-11eb-8077-024d038b20ae:1,fac1535e-2c16-11eb-88b0-063538010254:1-1131111:
cannot determine table columns for bar_entry:
event has [8 254 17 17 8 8 8 15 246 254 246 1 2 246 3 3],
schema as [
name:"id" type:UINT64 table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"id" column_length:20 charset:63 flags:49699
... (14 more columns, 15 in total, which is one fewer than the 16 column types in the event above)
]
Reproduction Steps
Steps to reproduce this issue:
- Deploy the following vschema:
  {
    "sharded": true,
    "vindexes": {
      "hash": { "type": "hash" }
    },
    "tables": {
      "bar_entry": {
        "columnVindexes": [
          { "column": "c", "name": "hash" }
        ]
      }
    }
  }
- Deploy the following schema:
  CREATE TABLE `bar_entry` (
    `id` bigint unsigned NOT NULL AUTO_INCREMENT,
    `a` enum('fooEntry') NOT NULL DEFAULT 'fooEntry',
    `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
    `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    `b` bigint NOT NULL,
    `c` bigint unsigned NOT NULL,
    `d` bigint NOT NULL,
    `e` varchar(255) NOT NULL,
    `f` decimal(14, 4) NOT NULL,
    `g` char(3) NOT NULL,
    `h` decimal(14, 4) NOT NULL DEFAULT '1.0000',
    `i` tinyint DEFAULT '1',
    `j` smallint NOT NULL DEFAULT '0',
    `k` decimal(14, 4) DEFAULT NULL,
    `l` int unsigned DEFAULT NULL,
    PRIMARY KEY (`id`),
    UNIQUE KEY `idx_foo_unique` (`b`, `d`, `e`, `j`, `i`),
    KEY `idx_foo` (`d`, `e`, `l`)
  ) ENGINE = InnoDB DEFAULT CHARSET = utf8;
- Run a VStream gRPC client that continuously streams from the sharded keyspace test_sharded_keyspace, where the table resides.
- The table has 30 million rows.
- Run
  vtctlclient -server vtctld-host:15999 ApplySchema -sql "ALTER WITH 'gh-ost' TABLE bar_entry add column status int" test_sharded_keyspace
  to start the gh-ost online schema migration.
- Run
  vtctlclient -server vtctld-host:15999 OnlineDDL test_sharded_keyspace show recent
  to check the gh-ost job status, which changes from queued to running to complete on each shard (-80 and 80-).
- Run
  show create table bar_entry\G
  and confirm that the new column status is present.
- The VStream gRPC client receives the following server-side error:
io.grpc.StatusRuntimeException: UNKNOWN: target: test_sharded_keyspace.80-.replica,
used tablet: zoneA-301 (prelive-ib-tablet-301.vt): vttablet:
rpc error: code = Unknown desc = stream (at source tablet)
error @ fa7c9236-2c16-11eb-8077-024d038b20ae:1,fac1535e-2c16-11eb-88b0-063538010254:1-1131111:
cannot determine table columns for bar_entry:
event has [8 254 17 17 8 8 8 15 246 254 246 1 2 246 3 3],
schema as [
name:"id" type:UINT64 table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"id" column_length:20 charset:63 flags:49699
name:"a" type:ENUM table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"a" column_length:36 charset:33 flags:257
name:"created" type:TIMESTAMP table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"created" column_length:19 charset:63 flags:1153
name:"last_update" type:TIMESTAMP table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"last_update" column_length:19 charset:63 flags:9345
name:"b" type:INT64 table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"b" column_length:20 charset:63 flags:53257
name:"c" type:UINT64 table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"c" column_length:20 charset:63 flags:36897
name:"d" type:INT64 table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"d" column_length:20 charset:63 flags:53257
name:"e" type:VARCHAR table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"e" column_length:765 charset:33 flags:20481
name:"f" type:DECIMAL table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"f" column_length:16 charset:63 decimals:4 flags:36865
name:"g" type:CHAR table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"g" column_length:9 charset:33 flags:4097
name:"h" type:DECIMAL table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"h" column_length:16 charset:63 decimals:4 flags:32769
name:"i" type:INT8 table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"i" column_length:4 charset:63 flags:49152
name:"j" type:INT16 table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"j" column_length:6 charset:63 flags:49153
name:"k" type:DECIMAL table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"k" column_length:16 charset:63 decimals:4 flags:32768
name:"l" type:UINT32 table:"bar_entry" org_table:"bar_entry" database:"vt_test_sharded_keyspace" org_name:"l" column_length:10 charset:63 flags:49184
]
Binary version
v8.0.0
7e09d0c
About this issue
- State: closed
- Created 4 years ago
- Comments: 21 (21 by maintainers)
@keweishang, thanks for the great test repo. I was able to reproduce the “cannot determine table columns” issue, even with the latest code. The separate issue with the internal tables created by gh-ost was resolved in #7159, so it no longer appears.
The cause is:
By default the tracker does not run, so #1 doesn’t apply. When #2 does not apply either, i.e. when we call the VStream API only after the migration is complete, we depend on #3, vttablet’s automatic reload. #4 is impractical for production use.
In our case the VStream API is called, with gtid set to “current”, before the periodic reload, so the schema is not yet in sync. This results in the schema-mismatch error.
We discussed reloading the schema once Online DDL completes a migration. However, we need to resolve a couple of things before we can do that, so this requires more thought and will not happen in the short term.
The recommended approach, for now, is to enable the tracker in vttablet with the flag
-track_schema_versions=true
This adds the overhead of an additional vstreamer, which downloads the binlogs and does the minimal parsing required. Since it only deals with DDLs, that overhead is lower than that of a regular vstream.
Whether the overhead is noticeable depends on the server configuration and the write QPS, which is precisely why it is disabled by default. It was originally enabled by default, but a few customers in production were affected by it: (iirc) those with many small servers and high QPS saw CPU spikes when they upgraded to that version.
The solution is to make the tracker lightweight. I did a quick POC that pares the vstreamer functionality down to a minimum and got a reduction of over 60% in CPU usage. Productionising it, however, would need a lot of testing, since vstreamer would then follow different code paths depending on whether it is a “lite” or a regular version, and vstreamers are at the core of VReplication. So it is not high on our priority list at the moment. I will create an issue for this soon, and if we find more support for it we can take it up earlier.
Hi @rohit-nayak-ps, sorry for the delay. Using the GA 8.0.0 Docker image, I can reproduce the errors repeatedly. I’ve created a public repo with a README that lists the steps to reproduce them: https://github.com/keweishang/schema_reload_error_test
Let me know if you manage to reproduce the error with the above repo setup. Thanks.
The workarounds I had discussed (while we wait for an automatic schema reload after a migration) are:
- Run
  vtctl ReloadSchemaKeyspace <keyspace>
  manually on the command line, which forces all tablets in that keyspace to reload their schema.
- Run a tracker, i.e. a vstream used for schema tracking (which, as a side effect, reloads the schema when it encounters a DDL). Since you are already running vstreams, this does not apply.
In any case, as I mentioned in a previous comment, there seems to be a bug where vstreams are NOT reloading the schema when a gh-ost rename occurs. I hope to make progress on this tomorrow.
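The ReloadSchemaKeyspace workaround can be scripted, for example to run right after OnlineDDL test_sharded_keyspace show recent reports complete on every shard and before (re)starting the VStream client. A minimal Go sketch, assuming vtctlclient is on the PATH and reusing the vtctld address and keyspace from this issue; reloadSchemaArgs is a hypothetical helper name. It simply shells out to the same vtctl command rather than calling any Vitess Go API.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// reloadSchemaArgs is a hypothetical helper that builds the vtctlclient
// arguments for the ReloadSchemaKeyspace workaround.
func reloadSchemaArgs(server, keyspace string) []string {
	return []string{"-server", server, "ReloadSchemaKeyspace", keyspace}
}

func main() {
	// Address and keyspace taken from this issue; adjust for your deployment.
	args := reloadSchemaArgs("vtctld-host:15999", "test_sharded_keyspace")

	// Exit quietly when vtctlclient is not installed on this machine.
	path, err := exec.LookPath("vtctlclient")
	if err != nil {
		fmt.Println("vtctlclient not found; would run: vtctlclient", args)
		return
	}

	cmd := exec.Command(path, args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "ReloadSchemaKeyspace failed:", err)
		os.Exit(1)
	}
}
```

Because ReloadSchemaKeyspace reloads every tablet in the keyspace, it also covers the replica the vstream is reading from.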
Yes, assuming I understand correctly; specifically, we need to reload on the replica where the vstream runs.
@rohit-nayak-ps has a workaround in the meantime; I’ll update soon.