vitess: Bug Report: query error Table doesn't exist during DDL

Overview of the Issue

During online migration, while adding an index to a table, vttablet reported that the table was missing.

Reproduction Steps

  1. deploy a shared keyspace (commerce)
  2. deploy the following schema
create table users
(
    id char(36),
    team_id char(36),
    primary key (id)
) ENGINE=InnoDB;

create table users_keyspace_id
(
    `id`          char(36),
    `keyspace_id` varbinary(128) DEFAULT NULL,
    PRIMARY KEY (`id`)
) ENGINE=InnoDB;
  1. deploy the following vschema
{
    "sharded": true,
    "vindexes": {
        "unicode_loose_md5": {
            "type": "unicode_loose_md5"
        },
        "users_keyspace_id": {
            "type": "consistent_lookup_unique",
            "params": {
                "batch_lookup": "true",
                "from": "id",
                "table": "commerce.users_keyspace_id",
                "to": "keyspace_id"
            },
            "owner": "users"
        }
    },
    "tables": {
        "users": {
            "column_vindexes": [
                {
                    "column": "team_id",
                    "name": "unicode_loose_md5"
                },
                {
                    "column": "id",
                    "name": "users_keyspace_id"
                }
            ]
        },
        "users_keyspace_id": {
            "column_vindexes": [
                {
                    "column": "id",
                    "name": "unicode_loose_md5"
                }
            ]
        }
    }
}
  1. Populate some records
while [ true ];
do
    mysql -h localhost -u user --port=15306 commerce -e 'INSERT INTO `users` (`id`, `team_id`) VALUES ("'$(uuidgen)'", "'$(uuidgen)'");'
done;
  1. Run select queries during an online DDL
set -e

while [ true ];
do
    mysql -h localhost -u user --port=15306 commerce -e "SELECT id, team_id from users LIMIT 1;"
done;
  1. while select queries are running run an online DDL on the table
SET @@ddl_strategy='vitess';
ALTER TABLE `users` ADD INDEX `team_id` (`team_id`);
  1. Check the terminal where you are running select queries and see the error
+--------------------------------------+--------------------------------------+
| id                                   | team_id                              |
+--------------------------------------+--------------------------------------+
| 0014C6BD-1211-4FD3-9334-6B3A93C1764B | DCC962EB-B28D-426B-82CE-A05749CDB08E |
+--------------------------------------+--------------------------------------+
ERROR 1146 (42S02) at line 1: target: commerce.-80.primary: vttablet: rpc error: code = NotFound desc = Table 'vt_commerce.users' doesn't exist (errno 1146) (sqlstate 42S02) (CallerID: userData1): Sql: "select id, team_id from users limit :__upper_limit", BindVars: {__upper_limit: "type:INT64 value:\"1\""vtg1: "type:INT64 value:\"1\""}
target: commerce.80-.primary: vttablet: rpc error: code = NotFound desc = Table 'vt_commerce.users' doesn't exist (errno 1146) (sqlstate 42S02) (CallerID: userData1): Sql: "select id, team_id from users limit :__upp

Binary Version

Version: 14.0.2 (Git revision 4d94e3f0beb3e88be6cb3d13d71f0d4b4e8155fb branch 'heads/v14.0.2') built on Thu Sep  1 16:36:09 UTC 2022 by vitess@buildkitsandbox using go1.18.5 linux/amd64

Operating System and Environment details

kubernetes / docker

Log Fragments

vtgate:


vtgate-55fdbd7b9b-7gxpg vtgate E0914 08:58:29.272856       1 vtgate.go:566] Execute: table _0ddcaaac_340b_11ed_9e79_1696691dad9f_20220914085601_vrepl not found, request: map[BindVariables:map[] Session:shard_sessions:{target:{keyspace:"commerce" shard:"-80" tablet_type:PRIMARY} tablet_alias:{cell:"local" uid:100012} reserved_id:1662978500765776829} autocommit:true options:{included_fields:ALL workload:OLTP} found_rows:1 system_variables:{key:"information_schema_stats_expiry" value:"0"} row_count:-1 in_reserved_conn:true DDLStrategy:"direct" SessionUUID:"66ccc6d6-340b-11ed-abb9-6e5e99b1868f" enable_system_settings:true Sql:SHOW FULL COLUMNS FROM `_0ddcaaac_340b_11ed_9e79_1696691dad9f_20220914085601_vrepl` FROM `commerce`]
vtgate-55fdbd7b9b-7gxpg vtgate E0914 08:59:11.510938       1 vtgate.go:566] Execute: table _0ddcaaac_340b_11ed_9e79_1696691dad9f_20220914085601_vrepl not found, request: map[BindVariables:map[] Session:shard_sessions:{target:{keyspace:"commerce" shard:"-80" tablet_type:PRIMARY} tablet_alias:{cell:"local" uid:100012} reserved_id:1662978500765776829} shard_sessions:{target:{keyspace:"commerce" shard:"80-" tablet_type:PRIMARY} tablet_alias:{cell:"local" uid:100022} reserved_id:1662978678874051671} autocommit:true options:{included_fields:ALL workload:OLTP} system_variables:{key:"information_schema_stats_expiry" value:"0"} row_count:-1 in_reserved_conn:true DDLStrategy:"direct" SessionUUID:"66ccc6d6-340b-11ed-abb9-6e5e99b1868f" enable_system_settings:true Sql:SHOW FULL COLUMNS FROM `_0ddcaaac_340b_11ed_9e79_1696691dad9f_20220914085601_vrepl` FROM `commerce`]

On one of the two vttablet replicas has error:

I0914 08:56:01.467224       1 engine.go:247] Streaming rows for query select /*vt+ ukColumns="id" */ id, created_at from users, lastpk: []
I0914 08:56:01.479515       1 rowstreamer.go:291] Streaming query: select id, created_at, updated_at, deleted_at, team_user_id, channel_id, root_id, original_id, message, type, convert(props using utf8mb4) as props, hashtags, filenames, file_ids, has_reactions, edited_at, is_pinned, remote_id from posts order by id
I0914 08:56:01.480336       1 snapshot_conn.go:79] Locking table users for copying
I0914 08:56:01.481668       1 snapshot_conn.go:72] Tables unlocked: users
I0914 08:56:24.259850       1 uvstreamer.go:367] Stream() called
I0914 08:56:24.260857       1 vstreamer.go:156] Starting Stream() with startPos MySQL56/07db6eff-131e-11ed-978d-9228ca51fa5b:1-36225,168a8edd-0e54-11ed-99fd-6a15d5d2c798:1-8695,1cb0b897-0e54-11ed-af8f-5624e4f96426:1-1050,3105e281-0e7b-11ed-aa86-42a21dda5028:1-16,3be0687d-141c-11ed-8164-3a96010de933:1-8379,3cb29991-141c-11ed-ac24-52a544a46af4:1-20142,3dbe2762-141c-11ed-9170-a60f748ff6ee:1-677058,68e3ea8f-0d93-11ed-a356-9ea073c9abdf:1-323,951ffdf2-0e7b-11ed-a932-6ebf3fa5bac5:1-67,a8a617a2-0e77-11ed-a4d0-7a76818acb77:1-60148,c0ed5c7c-08d0-11ed-95fb-22d1d9eca7a4:1-108681,c365feb1-141b-11ed-adb7-e62fce1d056a:1-14,cf7a1256-0121-11ed-b7b3-82f757afa399:1-916,e6771c39-0113-11ed-866a-3eae6ad16b36:1-4562,f26d6891-0d81-11ed-8033-5667702063a4:1-25343,f668268e-0125-11ed-bd3a-2ef8c6ae67db:1-214360
I0914 08:56:24.262871       1 binlog_connection.go:79] new binlog connection: serverID=974411462
I0914 08:56:24.262897       1 binlog_connection.go:125] sending binlog dump command: startPos=07db6eff-131e-11ed-978d-9228ca51fa5b:1-36225,168a8edd-0e54-11ed-99fd-6a15d5d2c798:1-8695,1cb0b897-0e54-11ed-af8f-5624e4f96426:1-1050,3105e281-0e7b-11ed-aa86-42a21dda5028:1-16,3be0687d-141c-11ed-8164-3a96010de933:1-8379,3cb29991-141c-11ed-ac24-52a544a46af4:1-20142,3dbe2762-141c-11ed-9170-a60f748ff6ee:1-677058,68e3ea8f-0d93-11ed-a356-9ea073c9abdf:1-323,951ffdf2-0e7b-11ed-a932-6ebf3fa5bac5:1-67,a8a617a2-0e77-11ed-a4d0-7a76818acb77:1-60148,c0ed5c7c-08d0-11ed-95fb-22d1d9eca7a4:1-108681,c365feb1-141b-11ed-adb7-e62fce1d056a:1-14,cf7a1256-0121-11ed-b7b3-82f757afa399:1-916,e6771c39-0113-11ed-866a-3eae6ad16b36:1-4562,f26d6891-0d81-11ed-8033-5667702063a4:1-25343,f668268e-0125-11ed-bd3a-2ef8c6ae67db:1-214360, serverID=974411462
I0914 08:57:21.233468       1 engine.go:403] schema engine created [_vt_HOLD_3e2a4bf2340b11ed98902e555bc4e1d9_20220915085721 _0ddcaaac_340b_11ed_9e79_1696691dad9f_20220914085601_vrepl], altered [], dropped [users]
I0914 08:57:22.189670       1 vstreamer.go:954] stream (at source tablet) ended @ 07db6eff-131e-11ed-978d-9228ca51fa5b:1-36225,168a8edd-0e54-11ed-99fd-6a15d5d2c798:1-8695,1cb0b897-0e54-11ed-af8f-5624e4f96426:1-1050,3105e281-0e7b-11ed-aa86-42a21dda5028:1-16,3be0687d-141c-11ed-8164-3a96010de933:1-8379,3cb29991-141c-11ed-ac24-52a544a46af4:1-20142,3dbe2762-141c-11ed-9170-a60f748ff6ee:1-677596,68e3ea8f-0d93-11ed-a356-9ea073c9abdf:1-323,951ffdf2-0e7b-11ed-a932-6ebf3fa5bac5:1-67,a8a617a2-0e77-11ed-a4d0-7a76818acb77:1-60148,c0ed5c7c-08d0-11ed-95fb-22d1d9eca7a4:1-108681,c365feb1-141b-11ed-adb7-e62fce1d056a:1-14,cf7a1256-0121-11ed-b7b3-82f757afa399:1-916,e6771c39-0113-11ed-866a-3eae6ad16b36:1-4562,f26d6891-0d81-11ed-8033-5667702063a4:1-25343,f668268e-0125-11ed-bd3a-2ef8c6ae67db:1-214360
I0914 08:57:22.189726       1 binlog_connection.go:279] closing binlog socket to unblock reads
I0914 08:57:22.189835       1 binlog_connection.go:153] connection closed during binlog stream (possibly intentional): read unix @->/vt/socket/mysql.sock: use of closed network connection
io.ReadFull(header size) failed (errno 2013) (sqlstate HY000)
I0914 08:57:22.189861       1 binlog_connection.go:287] waiting for binlog dump thread to end
I0914 08:57:22.189880       1 binlog_connection.go:293] closing binlog MySQL client with serverID 974411462. Will recycle ID.
I0914 08:58:30.478767       1 engine.go:247] Streaming rows for query select /*vt+ ukColumns="id" */ id, created_at from users, lastpk: []
I0914 08:58:30.479099       1 engine.go:490] table users not found in vttablet schema, current tables: map[_0ddcaaac_340b_11ed_9e79_1696691dad9f_20220914085601_vrepl:0xc0025a81c0 _vt_HOLD_3e2a4bf2340b11ed98902e555bc4e1d9_20220915085721:0xc0025a8070 channel_team_user:0xc0001d2d90 channel_team_user_keyspace_id:0xc0001d27e0 channel_team_user_keyspace_team_user_id:0xc001376af0 channels:0xc001376c40 conferences:0xc0001d2fc0 conferences_keyspace_id:0xc0001d24d0 dual:0xc001376930 emoji:0xc0001d3f80 emoji_keyspace_id:0xc0001d2690 emoji_name_keyspace_idx:0xc0001d2930 file_infos:0xc001376d90 file_infos_keyspace_id:0xc0001d3b20 group_team_user:0xc0002b6380 groups:0xc0002b6150 incoming_webhooks:0xc001376ee0 post_reactions:0xc001377030 posts_keyspace_id:0xc001377180 preferences:0xc0013772d0 preferences_keyspace_id:0xc0001d3dc0 role_team_user:0xc001377420 roles:0xc0001d33b0 sessions:0xc0001d3960 sidebar_categories:0xc001377570 sidebar_channels:0xc0013776c0 slack_tokens:0xc001377810 statuses:0xc001377960 storages:0xc001377ab0 team_name_keyspace_idx:0xc0001d2a80 team_user:0xc0001d35e0 team_user_keyspace_id:0xc001377c00 teams:0xc0001d2bd0 thread_team_user:0xc001377d50 thread_team_user_keyspace_team_user_id:0xc001377f10 threads:0xc0001d3730]
I0914 08:58:30.519712       1 engine.go:403] schema engine created [users _66be19d9_340b_11ed_9e79_1696691dad9f_20220914085830_vrepl], altered [_0ddcaaac_340b_11ed_9e79_1696691dad9f_20220914085601_vrepl], dropped [_vt_HOLD_3e2a4bf2340b11ed98902e555bc4e1d9_20220915085721]
I0914 08:58:30.533167       1 rowstreamer.go:291] Streaming query: select id, created_at, updated_at, deleted_at, team_user_id, channel_id, root_id, original_id, message, type, convert(props using utf8mb4) as props, hashtags, filenames, file_ids, has_reactions, edited_at, is_pinned, remote_id from posts order by id
I0914 08:58:30.533804       1 snapshot_conn.go:79] Locking table users for copying
I0914 08:58:30.534938       1 snapshot_conn.go:72] Tables unlocked: users
I0914 08:58:55.018416       1 uvstreamer.go:367] Stream() called
I0914 08:58:55.019536       1 vstreamer.go:156] Starting Stream() with startPos MySQL56/07db6eff-131e-11ed-978d-9228ca51fa5b:1-36225,168a8edd-0e54-11ed-99fd-6a15d5d2c798:1-8695,1cb0b897-0e54-11ed-af8f-5624e4f96426:1-1050,3105e281-0e7b-11ed-aa86-42a21dda5028:1-16,3be0687d-141c-11ed-8164-3a96010de933:1-8379,3cb29991-141c-11ed-ac24-52a544a46af4:1-20142,3dbe2762-141c-11ed-9170-a60f748ff6ee:1-677726,68e3ea8f-0d93-11ed-a356-9ea073c9abdf:1-323,951ffdf2-0e7b-11ed-a932-6ebf3fa5bac5:1-67,a8a617a2-0e77-11ed-a4d0-7a76818acb77:1-60148,c0ed5c7c-08d0-11ed-95fb-22d1d9eca7a4:1-108681,c365feb1-141b-11ed-adb7-e62fce1d056a:1-14,cf7a1256-0121-11ed-b7b3-82f757afa399:1-916,e6771c39-0113-11ed-866a-3eae6ad16b36:1-4562,f26d6891-0d81-11ed-8033-5667702063a4:1-25343,f668268e-0125-11ed-bd3a-2ef8c6ae67db:1-214360
I0914 08:58:55.020439       1 binlog_connection.go:79] new binlog connection: serverID=974411462
I0914 08:58:55.020471       1 binlog_connection.go:125] sending binlog dump command: startPos=07db6eff-131e-11ed-978d-9228ca51fa5b:1-36225,168a8edd-0e54-11ed-99fd-6a15d5d2c798:1-8695,1cb0b897-0e54-11ed-af8f-5624e4f96426:1-1050,3105e281-0e7b-11ed-aa86-42a21dda5028:1-16,3be0687d-141c-11ed-8164-3a96010de933:1-8379,3cb29991-141c-11ed-ac24-52a544a46af4:1-20142,3dbe2762-141c-11ed-9170-a60f748ff6ee:1-677726,68e3ea8f-0d93-11ed-a356-9ea073c9abdf:1-323,951ffdf2-0e7b-11ed-a932-6ebf3fa5bac5:1-67,a8a617a2-0e77-11ed-a4d0-7a76818acb77:1-60148,c0ed5c7c-08d0-11ed-95fb-22d1d9eca7a4:1-108681,c365feb1-141b-11ed-adb7-e62fce1d056a:1-14,cf7a1256-0121-11ed-b7b3-82f757afa399:1-916,e6771c39-0113-11ed-866a-3eae6ad16b36:1-4562,f26d6891-0d81-11ed-8033-5667702063a4:1-25343,f668268e-0125-11ed-bd3a-2ef8c6ae67db:1-214360, serverID=974411462
I0914 08:59:12.639627       1 engine.go:403] schema engine created [_vt_HOLD_80935282340b11ed98902e555bc4e1d9_20220915085912], altered [_66be19d9_340b_11ed_9e79_1696691dad9f_20220914085830_vrepl], dropped [users]
E0914 08:59:13.233390       1 tabletserver.go:1412] Table 'vt_commerce.users' doesn't exist (errno 1146) (sqlstate 42S02) (CallerID: user): Sql: "select id from users where id = :vtg1 order by created_at desc limit :vtg3", BindVars: {vtg1: "type:VARCHAR value:\"2187d1b6-8ccd-4062-b28e-6b13dc11305d\""vtg2: "type:VARCHAR value:\"2022-09-14 08:59:13.038\""vtg3: "type:INT64 value:\"1\""}
I0914 08:59:13.603334       1 vstreamer.go:954] stream (at source tablet) ended @ 07db6eff-131e-11ed-978d-9228ca51fa5b:1-36225,168a8edd-0e54-11ed-99fd-6a15d5d2c798:1-8695,1cb0b897-0e54-11ed-af8f-5624e4f96426:1-1050,3105e281-0e7b-11ed-aa86-42a21dda5028:1-16,3be0687d-141c-11ed-8164-3a96010de933:1-8379,3cb29991-141c-11ed-ac24-52a544a46af4:1-20142,3dbe2762-141c-11ed-9170-a60f748ff6ee:1-678132,68e3ea8f-0d93-11ed-a356-9ea073c9abdf:1-323,951ffdf2-0e7b-11ed-a932-6ebf3fa5bac5:1-67,a8a617a2-0e77-11ed-a4d0-7a76818acb77:1-60148,c0ed5c7c-08d0-11ed-95fb-22d1d9eca7a4:1-108681,c365feb1-141b-11ed-adb7-e62fce1d056a:1-14,cf7a1256-0121-11ed-b7b3-82f757afa399:1-916,e6771c39-0113-11ed-866a-3eae6ad16b36:1-4562,f26d6891-0d81-11ed-8033-5667702063a4:1-25343,f668268e-0125-11ed-bd3a-2ef8c6ae67db:1-214360
I0914 08:59:13.603440       1 binlog_connection.go:279] closing binlog socket to unblock reads
I0914 08:59:13.603573       1 binlog_connection.go:287] waiting for binlog dump thread to end
I0914 08:59:13.603598       1 binlog_connection.go:153] connection closed during binlog stream (possibly intentional): read unix @->/vt/socket/mysql.sock: use of closed network connection
io.ReadFull(header size) failed (errno 2013) (sqlstate HY000)
I0914 08:59:13.603649       1 binlog_connection.go:293] closing binlog MySQL client with serverID 974411462. Will recycle ID.
I0914 08:59:28.034976       1 engine.go:403] schema engine created [users], altered [_66be19d9_340b_11ed_9e79_1696691dad9f_20220914085830_vrepl], dropped [_vt_HOLD_80935282340b11ed98902e555bc4e1d9_20220915085912]

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 31 (19 by maintainers)

Most upvoted comments

Update from a discussion with @harshit-gangal: we’re exploring two different solutions:

  1. Looks like the error comes from vttablet and related to query buffering, and doesn’t have to do with vschema@harshit-gangal 's reproduction is without vschema at all. So we should investigate why the queries are routed t othe server inspite of buffering.
  2. Working on an atomic cut-over. Current WIP is https://github.com/vitessio/vitess/pull/11460 with more writeup coming. With https://github.com/vitessio/vitess/pull/11460 I’m unable to reproduce the issue, which is trivial because the table never goes away.

More updates to come.

I have not yet figured out the issue. But, the issue is unrelated to vindexes.

Another test that report query errors.

  1. Take 2 unrelated tables.
  2. Insert data into both tables.
  3. Follow step 6 to 8 as reported in the issue.

What I noticed is that there should be at least one more table that should contain the data. In the test case provided above that was fulfilled by the lookup vindex table.

Test changes: https://github.com/planetscale/vitess/pull/new/onlineddl-qs

I didn’t try with replicas in my local testing, but we reproduced the error in a preproduction environment both with replicas and primary. And I agree, the error is only a few seconds and is gone, but can result in many failed query in this short period.

I was able to reproduce on Primary, with vschema, and I suspect you are right, this only reproduces with vschema.

The other observation from my experiment is that the errors show up for a short period of time (a couple seconds?) and then don’t show up anymore – but that happens per vtgate connection-ish, I’m not sure yet whether necessarily every new vtgate connection suffers from this, or just ones that happened to be created at the “wrong” time.

Oh that is interesting. On the primary it would take a different scenario for the error to manifest. I should only happen if Vitess/vtgate run out of query buffering capacity. Per your reproduction steps, I don’t actually think we should hit that case. So this is something to investigate.

Nice work, @L3o-pold! Thank you for that. I’ll start a discussion on this and continue the investigation. I will let you know when I have any updates.

Unfortunately no but I will try to reproduce with a simple local example.