tiflow: [TI-CDC] Using canal JSON to output to Kafka, different primary key types produce inconsistent results

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

1. Deploy TiDB and TiCDC.

2. Create a table with an int type primary key

CREATE TABLE `int_id_table` (
  `int_id` int(11) NOT NULL,
  `var1` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`int_id`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

3. Create another table with a varchar type primary key

CREATE TABLE `varchar_id_table` (
  `varchar_id` varchar(32) NOT NULL,
  `var2` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`varchar_id`) /*T![clustered_index] NONCLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

4. Create a TiCDC synchronization task (changefeed) to output the change logs of the above two tables to Kafka

{"changefeed_id":"test4","sink_uri":"kafka://xxxxxxx:9092/ticdc-test4?kafka-version=2.5.0&protocol=canal-json",
"filter_rules":["test.int_id_table","test.varchar_id_table"]}

5. Insert data into the above two tables, and then update their primary keys

insert into int_id_table value ('1','1');
insert into varchar_id_table value ('2','2');
update int_id_table set int_id = '3' where int_id = '1';
update varchar_id_table set varchar_id = '4' where varchar_id = '2';

2. What did you expect to see? (Required)

For both tables, the primary key update log should be split into a delete log and an insert log, which is convenient for downstream processing with Flink.
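For illustration, here is a minimal Python sketch of the split I expect, written as something a downstream consumer could apply itself. It assumes the usual Canal flat-message fields (`type`, `data`, `old`, `pkNames`); the function name `split_pk_update` and the sample message are hypothetical and hand-written, not captured TiCDC output.

```python
from typing import Any, Dict, List


def split_pk_update(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn an UPDATE that changes a primary key into DELETE + INSERT.

    Any other event is returned unchanged in a one-element list.
    """
    if event.get("type") != "UPDATE":
        return [event]

    pk_names = event.get("pkNames") or []
    rows = event.get("data") or []
    olds = event.get("old") or []
    if len(rows) != len(olds):
        # Cannot pair old and new images reliably; pass the event through.
        return [event]

    out: List[Dict[str, Any]] = []
    for new_row, old_cols in zip(rows, olds):
        # `old` may carry only the changed columns, so overlay it on the
        # new row to rebuild the full pre-update row.
        old_row = {**new_row, **(old_cols or {})}
        pk_changed = any(old_row.get(k) != new_row.get(k) for k in pk_names)
        if pk_changed:
            out.append({**event, "type": "DELETE", "data": [old_row], "old": None})
            out.append({**event, "type": "INSERT", "data": [new_row], "old": None})
        else:
            out.append({**event, "data": [new_row], "old": [old_cols]})
    return out


# Hand-written sample shaped like the varchar_id_table update in step 5.
sample = {
    "database": "test",
    "table": "varchar_id_table",
    "pkNames": ["varchar_id"],
    "isDdl": False,
    "type": "UPDATE",
    "data": [{"varchar_id": "4", "var2": "2"}],
    "old": [{"varchar_id": "2"}],
}

for e in split_pk_update(sample):
    print(e["type"], e["data"])
```

With this sample, the sketch emits a DELETE carrying the old row (`varchar_id` = '2') followed by an INSERT carrying the new row (`varchar_id` = '4'). It would be applied to each decoded Kafka message before the events are handed to Flink.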

3. What did you see instead (Required)

The change log for the int type primary key update is split into a delete log and an insert log, but the change log for the varchar type primary key update is not split.

4. What is your TiDB version? (Required)

Release Version: v5.4.0
Edition: Community
Git Commit Hash: 55f3b24c1c9f506bd652ef1d162283541e428872
Git Branch: heads/refs/tags/v5.4.0
UTC Build Time: 2022-01-25 08:39:26
GoVersion: go1.16.4
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false

About this issue

State: closed
Created 2 years ago
Comments: 19 (12 by maintainers)

Most upvoted comments

At present, my only workaround is to avoid creating synchronization tasks for tables with a varchar primary key. Although the problem is caused by how Flink works, I would still like TiCDC to offer a configuration parameter that unifies how these logs are output, which would help users keep a consistent experience.

By the way, I have also tried the TiCDC connector from the Flink Chinese community for this work, but I found that it is not stable enough. I have also reported these problems to that community (https://github.com/ververica/flink-cdc-connectors/issues/1154).

coralzu on Jul 19, 2022