tidb: Incorrect row count estimation in TiDB 2.1.1

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?

    Create a table with a composite primary key.

CREATE TABLE `table1` (
`column_1` bigint(20) NOT NULL,
`column_2` int(11) NOT NULL,
`column_3` varchar(10) DEFAULT NULL,
`column_4` bigint(20) DEFAULT NULL,
`column_5` varchar(20) DEFAULT NULL,
`column_6` varchar(10) DEFAULT NULL,
`column_7` int(11) DEFAULT NULL,
`column_8` varchar(4) DEFAULT NULL,
`column_9` int(11) DEFAULT NULL,
`column_10` varchar(50) DEFAULT NULL,
`column_11` smallint(6) DEFAULT NULL,
`column_12` varchar(4000) DEFAULT NULL,
`column_13` decimal(10,2) DEFAULT NULL,
`column_14` decimal(10,2) DEFAULT NULL,
`column_15` decimal(10,2) DEFAULT NULL,
`column_16` varchar(1) DEFAULT NULL,
`column_17` datetime NOT NULL,
`column_18` varchar(200) DEFAULT NULL,
`column_19` varchar(2) DEFAULT NULL,
`column_20` text DEFAULT NULL,
`column_21` varchar(255) DEFAULT NULL,
`column_22` text DEFAULT NULL,
`column_23` varchar(50) DEFAULT NULL,
`column_24` varchar(20) DEFAULT NULL,
`column_25` bigint(20) DEFAULT NULL,
`column_26` varchar(1500) DEFAULT NULL,
`column_27` int(11) DEFAULT NULL,
`column_28` varchar(1000) DEFAULT NULL,
`column_29` text DEFAULT NULL,
`column_30` text DEFAULT NULL,
`column_31` decimal(10,2) DEFAULT NULL,
`column_32` decimal(10,2) DEFAULT NULL,
`column_33` decimal(10,2) DEFAULT NULL,
`column_34` decimal(10,2) DEFAULT NULL,
`column_35` datetime DEFAULT NULL,
`column_36` text DEFAULT NULL,
`column_37` varchar(5) DEFAULT NULL,
`column_38` int(11) DEFAULT NULL,
`column_39` varchar(1) DEFAULT NULL,
`column_40` varchar(1) DEFAULT NULL,
`column_41` varchar(5) DEFAULT NULL,
`column_42` int(11) DEFAULT NULL,
`column_43` varchar(5) DEFAULT NULL,
`column_44` decimal(10,2) DEFAULT NULL,
`column_45` decimal(10,2) DEFAULT NULL,
`column_46` int(11) DEFAULT NULL,
`column_47` decimal(10,2) DEFAULT NULL,
`column_48` varchar(5) DEFAULT NULL,
`column_49` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`column_50` decimal(10,2) DEFAULT NULL,
`column_51` decimal(10,2) DEFAULT NULL,
`column_52` decimal(10,2) DEFAULT NULL,
`column_53` decimal(10,2) DEFAULT NULL,
`column_54` varchar(2) DEFAULT NULL,
`column_55` varchar(50) DEFAULT NULL,
`column_56` datetime DEFAULT NULL,
`column_57` varchar(500) DEFAULT NULL,
`column_58` varchar(5) DEFAULT NULL,
`column_59` varchar(100) DEFAULT NULL,
`column_60` varchar(10) DEFAULT NULL,
`column_61` datetime DEFAULT NULL,
`column_62` varchar(100) DEFAULT NULL,
`column_63` varchar(20) DEFAULT NULL,
`column_64` varchar(5) DEFAULT NULL,
`column_65` varchar(10) DEFAULT NULL,
`column_66` varchar(10) DEFAULT NULL,
`column_67` decimal(10,2) DEFAULT NULL,
`column_68` decimal(10,2) DEFAULT NULL,
`column_69` decimal(10,2) DEFAULT NULL,
`column_70` decimal(10,2) DEFAULT NULL,
`column_71` decimal(10,0) DEFAULT NULL,
`column_72` decimal(10,0) DEFAULT NULL,
`column_73` decimal(10,0) DEFAULT NULL,
`column_74` varchar(10) DEFAULT NULL,
`column_75` int(11) DEFAULT NULL,
`column_76` varchar(30) DEFAULT NULL,
`column_77` int(11) DEFAULT NULL,
`column_78` decimal(10,2) DEFAULT NULL,
  PRIMARY KEY (`column_1`,`column_2`),
  KEY `column_17` (`column_17`),
  KEY `column_49` (`column_49`),
  KEY `column_1` (`column_1`),
  KEY `column_40` (`column_40`),
  KEY `column_9` (`column_9`),
  KEY `column_43` (`column_43`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

Run query explain select column1, column2 from `table1` where (column1=01092213 AND column2=1) OR (column1=01091904 AND column2=2) OR (column1=01091970 AND column2=2) OR (column1=01092706 AND column2=1) OR (column1=01092355 AND column2=1) OR (column1=01092578 AND column2=2) OR (column1=01092623 AND column2=4) OR (column1=01092651 AND column2=1) OR (column1=01092369 AND column2=2) OR (column1=01092365 AND column2=1) OR (column1=01092238 AND column2=2) OR (column1=01092799 AND column2=1) OR (column1=01092770 AND column2=3) OR (column1=01092363 AND column2=2) OR (column1=01092365 AND column2=2) OR (column1=01093579 AND column2=1) OR (column1=01093051 AND column2=2) OR (column1=01093032 AND column2=1) OR (column1=01093100 AND column2=2) OR (column1=01093534 AND column2=1) OR (column1=01093432 AND column2=1) OR (column1=01092914 AND column2=1) OR (column1=01092861 AND column2=2) OR (column1=01093054 AND column2=2) OR (column1=01093298 AND column2=1) OR (column1=01093293 AND column2=3);

  1. What did you expect to see? Running explain on TiDB 2.0.8 returns screen shot 2019-01-03 at 12 13 10 pm

  2. What did you see instead? Running explain on TiDB 2.1.1 returns a full table scan screen shot 2019-01-03 at 12 16 05 pm

  3. What version of TiDB are you using (tidb-server -V or run select tidb_version(); on TiDB)?

    TiDB v2.1.1

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 17 (12 by maintainers)

Most upvoted comments

The behavior change is caused by these lines in deriveIndexPathStats:

458     // If the `countAfterAccess` is less than `stats.RowCount`, there must be some inconsistent stats info.
459     // We prefer the `stats.RowCount` because it could use more stats info to calculate the selectivity.
460     if path.countAfterAccess < ds.stats.RowCount {
461         path.countAfterAccess = math.Min(ds.stats.RowCount/selectionFactor, float64(ds.statisticTable.Count))
462     }

this part is different in 2.0.8 and 2.1.1. @Guru107 thanks for the report, we will try to improve it.