risingwave: PG cdc `q3` checksums inconsistent
Describe the bug
pg-cdc q3 checksums failed
https://buildkite.com/risingwave-test/chaos-mesh/builds/552
Error message/log
No response
To Reproduce
No response
Expected behavior
No response
How did you deploy RisingWave?
No response
The version of RisingWave
nightly-20240207
Additional context
No response
About this issue
- State: closed
- Created 5 months ago
- Comments: 16 (11 by maintainers)
I was able to reproduce it with:
Buildkite Grafana
Findings:
- The source tables (`customer`, `new_order`, `orders`, `order_line`) related to `ch_benchmark_q3` match PG.
- `ch_benchmark_q3` in RW has more rows than in PG.
- I created `ch_benchmark_q3_2` with the same SQL while there was no source throughput. `ch_benchmark_q3_2` has the correct row count, so I cross-checked the internal state table row counts of `ch_benchmark_q3` and `ch_benchmark_q3_2` one by one:
  - `__internal_ch_benchmark_q3_99_hashjoinleft_1121`
- `__internal_ch_benchmark_q3_99_hashjoinleft_1121` is the one and only table that has triggered a memtable spill.
- I picked a key from `__internal_ch_benchmark_q3_99_hashjoinleft_1121` and used sst-dump to grep for occurrences of that key in all relevant SSTs. I found only one occurrence, in an L6 SST. There is no other PUT/DELETE for the key in the sst-dump output.

See more details in debug_notes.txt. The cluster has not been cleaned up; anyone interested can psql into it to get more information.
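The one-by-one cross-check of internal state tables boils down to diffing two sets of row counts. A minimal sketch, assuming the counts have already been collected (however that was done) into dicts keyed by a logical per-operator name, since the physical `__internal_...` names differ between the two MVs. The `diff_row_counts` helper and the sample counts are hypothetical, not taken from this incident.

```python
from typing import Dict, List, Tuple


def diff_row_counts(
    suspect: Dict[str, int], reference: Dict[str, int]
) -> List[Tuple[str, int, int]]:
    """Return (name, suspect_count, reference_count) for every state table
    whose row count differs between the two MVs.

    Tables are matched by a caller-chosen logical name (hypothetical), and a
    table missing on one side is treated as having 0 rows.
    """
    mismatches = []
    for name in sorted(suspect.keys() | reference.keys()):
        s = suspect.get(name, 0)
        r = reference.get(name, 0)
        if s != r:
            mismatches.append((name, s, r))
    return mismatches


if __name__ == "__main__":
    # Illustrative counts only; not the numbers observed in this issue.
    q3 = {"hashjoinleft": 12, "hashjoinright": 7}
    q3_2 = {"hashjoinleft": 10, "hashjoinright": 7}
    print(diff_row_counts(q3, q3_2))  # [('hashjoinleft', 12, 10)]
```

Matching on a logical name rather than the physical table name keeps the comparison independent of the table IDs each MV happens to be assigned.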
Now I am re-running the same test with the following config, hoping to capture the full mutation history of a row in Hummock:
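A mutation history like this is useful because a key's visible state is determined by replaying its versions: the newest version wins and a DELETE acts as a tombstone. A minimal toy model, assuming a hypothetical `(epoch, op, key, value)` record layout; this is not Hummock's real API or on-disk encoding, and `visible_rows` is illustrative only. It shows why a key whose only surviving version is a PUT (like the single L6 occurrence found with sst-dump) stays visible if a later DELETE for it was lost. That is consistent with, but not proof of, the extra rows.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical mutation record: (epoch, op, key, value); op is "PUT" or "DELETE".
Mutation = Tuple[int, str, bytes, Optional[bytes]]


def visible_rows(history: Iterable[Mutation]) -> Dict[bytes, Optional[bytes]]:
    """Replay mutations and return the rows that remain visible.

    For each key, the version with the highest epoch wins (ties go to the
    record seen last); a winning DELETE hides the key like a tombstone.
    """
    latest: Dict[bytes, Tuple[int, str, Optional[bytes]]] = {}
    for epoch, op, key, value in history:
        if op not in ("PUT", "DELETE"):
            raise ValueError(f"unknown op {op!r}")
        prev = latest.get(key)
        if prev is None or epoch >= prev[0]:
            latest[key] = (epoch, op, value)
    return {k: v for k, (_, op, v) in latest.items() if op == "PUT"}


if __name__ == "__main__":
    complete = [(1, "PUT", b"k1", b"row"), (2, "DELETE", b"k1", None)]
    lost_delete = [(1, "PUT", b"k1", b"row")]
    print(visible_rows(complete))     # {}
    print(visible_rows(lost_delete))  # {b'k1': b'row'}
```

Resolving by epoch rather than by input order mirrors how an LSM read picks the newest version regardless of which file or level it is found in.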
The source tables are fully synced. Something is wrong with the query q3.
q3: The test passed.
`pod-failure` will cause pods to restart.