tidb: analyze table failed for table with charset latin1
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
mysql> create table t (v1 varchar(30)) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_bin;
Query OK, 0 rows affected (0.09 sec)
$ python2.7
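>>> # \xe4 and \xe6 are the raw latin1 bytes for 'ä' and 'æ'; neither forms a valid UTF-8 sequence here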
>>> f = open("1.sql", "w")
>>> f.write('INSERT INTO `t` VALUES ("\xe4NKNO\xe6");\n')
>>> f.flush()
$ mysql -h 172.16.4.18 -uroot -P4000 -D t < 1.sql
mysql> select * from t;
+--------+
| v1     |
+--------+
| �NKNO� |
+--------+
1 row in set (0.00 sec)
mysql> analyze table t;
ERROR 1105 (HY000): other error: encoding failed
If the table is created without the charset and collate options, the failure instead happens at the insert phase:
mysql> create table t (v1 varchar(30));
Query OK, 0 rows affected (0.09 sec)
[root@172.16.4.92 ontime2]# mysql -h 172.16.4.18 -uroot -P4000 -D t < 1.sql
ERROR 1366 (HY000) at line 1: incorrect utf8 value e44e4b4e4fe6(�NKNO�) for column v1
This issue was originally found by @nullnotnil when running tidb-lightning; related issue: https://github.com/pingcap/tidb-lightning/issues/351. While trying to reproduce that issue, I found it is actually a TiDB issue.
2. What did you expect to see? (Required)
analyze table t
should succeed.
3. What did you see instead (Required)
ERROR 1105 (HY000): other error: encoding failed
4. Affected version (Required)
$ ./tikv-server -V
TiKV
Release Version: 4.1.0-alpha
Edition: Community
Git Commit Hash: 8b1fc4fc67f6d74a46a86d731eb5c152cbf0dfa8
Git Commit Branch: master
UTC Build Time: 2020-07-14 01:06:28
Rust Version: rustc 1.46.0-nightly (16957bd4d 2020-06-30)
Enable Features: jemalloc portable sse protobuf-codec
Profile: dist_release
mysql> select tidb_version()\G
*************************** 1. row ***************************
tidb_version(): Release Version: v4.0.0-beta.2-771-gca41972fb
Edition: Community
Git Commit Hash: ca41972fbac068c8a5de107d9075f09ac68842ac
Git Branch: master
UTC Build Time: 2020-07-14 02:41:21
GoVersion: go1.13
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false
1 row in set (0.00 sec)
5. Root Cause Analysis
I added some tracing to the Analyze code and found that TiDB pushes down the wrong collation for the string column: it should be latin1, but TiKV receives Utf8Mb4BinNoPadding. @wjhuang2016 PTAL.

Using python2 to generate the SQL, it can be reproduced on the master branch.

I'm not sure if that is necessary. In fact, we shouldn't allow writing non-UTF8 bytes to TiKV at all.

Seems this is an issue on the TiKV side.
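As a standalone illustration of the mismatch described above (plain Python, not TiDB or TiKV code): the value written by the repro decodes cleanly as latin1 but is rejected as UTF-8, so any component told the column is Utf8Mb4BinNoPadding will choke on it.

raw = b'\xe4NKNO\xe6'          # the exact bytes the python2 repro writes into 1.sql
print(raw.decode('latin1'))    # 'äNKNOæ' -- valid latin1
try:
    raw.decode('utf-8')
except UnicodeDecodeError as err:
    print('not valid UTF-8: %s' % err)   # 0xe4 starts a 3-byte sequence, but 'N' is not a continuation byte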
I deployed a cluster with v4.0.2 via ansible, and the issue cannot be reproduced there. The TiKV revision is:

After I manually updated TiKV to the latest commit on master, the issue appears. TiKV version:
BTW, using the SQL

INSERT INTO t VALUES (UNHEX('C3A44E4B4E4FC3A6'));

cannot reproduce it either. So far, the only way to reproduce this issue is to generate a SQL file and load it with mysql < xxx.sql.
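That is consistent with the bytes involved: C3A4 4E 4B 4E 4F C3A6 is already the UTF-8 encoding of äNKNOæ, so the UNHEX variant never writes invalid UTF-8 in the first place. A quick check in plain Python (illustration only, not TiDB code):

import binascii

utf8_bytes = binascii.unhexlify('C3A44E4B4E4FC3A6')
print(utf8_bytes.decode('utf-8'))        # 'äNKNOæ' -- already valid UTF-8
print(utf8_bytes == b'\xe4NKNO\xe6')     # False -- not the same bytes the 1.sql repro inserts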
I can reproduce this on a fresh server with a shell script, but there seems to be some sort of flakiness to it. If I mysqldump, restore, and then analyze that, it doesn't reproduce, and this script also stops reproducing it.
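The script itself isn't attached here; as a rough sketch only (host 127.0.0.1, port 4000, and database test are hypothetical stand-ins, Python 3, mysql CLI on PATH), the repro boils down to piping raw bytes through the client so they are not re-encoded:

import subprocess

def run_sql(sql_bytes):
    # Pipe raw SQL bytes through the mysql CLI so the latin1 bytes reach the server untouched.
    subprocess.run(['mysql', '-h', '127.0.0.1', '-P', '4000', '-uroot', '-D', 'test'],
                   input=sql_bytes, check=True)

run_sql(b'DROP TABLE IF EXISTS t;\n'
        b'CREATE TABLE t (v1 varchar(30)) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_bin;\n')
run_sql(b'INSERT INTO `t` VALUES ("\xe4NKNO\xe6");\n')   # raw latin1 bytes, not valid UTF-8
run_sql(b'ANALYZE TABLE t;\n')                           # expected to fail with "encoding failed" on affected versions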