clickhouse-sink-connector: [sink-connector-lightweight] Initial snapshot slows down and then OOM
Good day.
When creating initial snapshot everything starts fast but then slows down considerably. It manages to export a couple of million of rows no problem and then this happens:
74164 2023-10-05 17:52:41.416 [debezium-mysqlconnector-embeddedconnector-change-event-source-coordinator] INFO io.debezium.relational.RelationalSnapshotChangeEventSource - Exported 347670 of 2867292 records for table 'schema.table' after 00:00:10.009
84165 2023-10-05 17:52:51.417 [debezium-mysqlconnector-embeddedconnector-change-event-source-coordinator] INFO io.debezium.relational.RelationalSnapshotChangeEventSource - Exported 628341 of 2867292 records for table 'schema.table' after 00:00:20.01
96286 2023-10-05 17:53:03.538 [debezium-mysqlconnector-embeddedconnector-change-event-source-coordinator] INFO io.debezium.relational.RelationalSnapshotChangeEventSource - Exported 843087 of 2867292 records for table 'schema.table' after 00:00:32.131
96297 2023-10-05 17:53:03.549 [pool-271-thread-1] INFO com.altinity.clickhouse.sink.connector.db.DbWriter - *** QUERY***insert into table(columns)
106292 2023-10-05 17:53:13.544 [debezium-mysqlconnector-embeddedconnector-change-event-source-coordinator] INFO io.debezium.relational.RelationalSnapshotChangeEventSource - Exported 1064731 of 2867292 records for table 'schema.table' after 00:00:42.137
116617 2023-10-05 17:53:23.869 [debezium-mysqlconnector-embeddedconnector-change-event-source-coordinator] INFO io.debezium.relational.RelationalSnapshotChangeEventSource - Exported 1109216 of 2867292 records for table 'schema.table' after 00:00:52.462
127343 2023-10-05 17:53:34.595 [debezium-mysqlconnector-embeddedconnector-change-event-source-coordinator] INFO io.debezium.relational.RelationalSnapshotChangeEventSource - Exported 1156627 of 2867292 records for table 'schema.table' after 00:01:03.188
138743 2023-10-05 17:53:45.995 [debezium-mysqlconnector-embeddedconnector-change-event-source-coordinator] INFO io.debezium.relational.RelationalSnapshotChangeEventSource - Exported 1162867 of 2867292 records for table 'schema.table' after 00:01:14.588
150097 2023-10-05 17:53:57.349 [debezium-mysqlconnector-embeddedconnector-change-event-source-coordinator] INFO io.debezium.relational.RelationalSnapshotChangeEventSource - Exported 1167905 of 2867292 records for table 'schema.table' after 00:01:25.942
161378 2023-10-05 17:54:08.630 [debezium-mysqlconnector-embeddedconnector-change-event-source-coordinator] INFO io.debezium.relational.RelationalSnapshotChangeEventSource - Exported 1169915 of 2867292 records for table 'schema.table' after 00:01:37.223
164166 2023-10-05 17:54:11.418 [main] INFO io.debezium.connector.common.BaseSourceTask - 749911 records sent during previous 00:01:27.143, last recorded offset of {server=embeddedconnector} partition is {ts_sec=1696528351, file=mysql-bin-changelog.000517, pos=157, snapshot=true}
181053 2023-10-05 17:54:28.305 [debezium-mysqlconnector-embeddedconnector-change-event-source-coordinator] INFO io.debezium.relational.RelationalSnapshotChangeEventSource - Exported 1171289 of 2867292 records for table 'schema.table' after 00:01:56.898
192186 2023-10-05 17:54:39.438 [debezium-mysqlconnector-embeddedconnector-change-event-source-coordinator] INFO io.debezium.relational.RelationalSnapshotChangeEventSource - Exported 1171670 of 2867292 records for table 'schema.table' after 00:02:08.031
222901 2023-10-05 17:55:10.153 [debezium-mysqlconnector-embeddedconnector-change-event-source-coordinator] INFO io.debezium.relational.RelationalSnapshotChangeEventSource - Exported 1172039 of 2867292 records for table 'schema.table' after 00:02:38.746
234068 2023-10-05 17:55:21.320 [debezium-mysqlconnector-embeddedconnector-change-event-source-coordinator] INFO io.debezium.relational.RelationalSnapshotChangeEventSource - Exported 1172245 of 2867292 records for table 'schema.table' after 00:02:49.913
245211 2023-10-05 17:55:32.463 [debezium-mysqlconnector-embeddedconnector-change-event-source-coordinator] INFO io.debezium.relational.RelationalSnapshotChangeEventSource - Exported 1172488 of 2867292 records for table 'schema.table' after 00:03:01.056
256355 2023-10-05 17:55:43.607 [debezium-mysqlconnector-embeddedconnector-change-event-source-coordinator] INFO io.debezium.relational.RelationalSnapshotChangeEventSource - Exported 1172644 of 2867292 records for table 'schema.table' after 00:03:12.2
267478 2023-10-05 17:55:54.730 [debezium-mysqlconnector-embeddedconnector-change-event-source-coordinator] INFO io.debezium.relational.RelationalSnapshotChangeEventSource - Exported 1172871 of 2867292 records for table 'schema.table' after 00:03:23.323
279701 2023-10-05 17:56:06.953 [debezium-mysqlconnector-embeddedconnector-change-event-source-coordinator] INFO io.debezium.relational.RelationalSnapshotChangeEventSource - Exported 1173115 of 2867292 records for table 'schema.table' after 00:03:35.546
293680 2023-10-05 17:56:20.932 [debezium-mysqlconnector-embeddedconnector-change-event-source-coordinator] INFO io.debezium.relational.RelationalSnapshotChangeEventSource - Exported 1173426 of 2867292 records for table 'schema.table' after 00:03:49.525
304822 2023-10-05 17:56:32.074 [debezium-mysqlconnector-embeddedconnector-change-event-source-coordinator] INFO io.debezium.relational.RelationalSnapshotChangeEventSource - Exported 1173970 of 2867292 records for table 'schema.table' after 00:04:00.667
315965 2023-10-05 17:56:43.217 [debezium-mysqlconnector-embeddedconnector-change-event-source-coordinator] INFO io.debezium.relational.RelationalSnapshotChangeEventSource - Exported 1174096 of 2867292 records for table 'schema.table' after 00:04:11.81
327084 2023-10-05 17:56:54.336 [debezium-mysqlconnector-embeddedconnector-change-event-source-coordinator] INFO io.debezium.relational.RelationalSnapshotChangeEventSource - Exported 1174406 of 2867292 records for table 'schema.table' after 00:04:22.929
338216 2023-10-05 17:57:05.468 [debezium-mysqlconnector-embeddedconnector-change-event-source-coordinator] INFO io.debezium.relational.RelationalSnapshotChangeEventSource - Exported 1174680 of 2867292 records for table 'schema.table' after 00:04:34.061
349328 2023-10-05 17:57:16.580 [debezium-mysqlconnector-embeddedconnector-change-event-source-coordinator] INFO io.debezium.relational.RelationalSnapshotChangeEventSource - Exported 1174997 of 2867292 records for table 'schema.table' after 00:04:45.173
366045 2023-10-05 17:57:33.297 [debezium-mysqlconnector-embeddedconnector-change-event-source-coordinator] INFO io.debezium.relational.RelationalSnapshotChangeEventSource - Exported 1175345 of 2867292 records for table 'schema.table' after 00:05:01.89
380959 2023-10-05 17:57:48.211 [debezium-mysqlconnector-embeddedconnector-change-event-source-coordinator] INFO io.debezium.relational.RelationalSnapshotChangeEventSource - Exported 1175675 of 2867292 records for table 'schema.table' after 00:05:16.804`
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "HTTP-Dispatcher"
After that snapshot is restarted the connector starts to tank even earlier, after ~200K rows.
Both MySQL and Clickhouse instances have negligible load, but the connector instance has 100% CPU load and consumes full RAM constantly even when pushing tens of rows per second.
What I tried:
- I’ve tried to use
-Xmx6144M
and-Xmx10240M
on a 16GB instance without any noticeable effect (i.e. both of them tank and OOM) - Tried to tweak parameters like
snapshot.max.threads, offset.flush.interval.ms, offset.flush.timeout.ms, max.request.size, max.queue.size, max.batch.size
with no effect (not even sure if all of them work for this jar file) - Tried to change offset storage to
org.apache.kafka.connect.storage.FileOffsetBackingStore
, no effect (except removingthe transactions not supported
log). I’ve also seenTimed out waiting to flush EmbeddedEngine{id="engine"} offsets to storage
even with file storage, which should not happen normally I think. - Tried to swtich to 0.6, same
- Java 11, Java 17 - same
Thanks in advance.
About this issue
- Original URL
- State: open
- Created 9 months ago
- Comments: 19 (7 by maintainers)
I was facing the same issue and tuning using
snapshot.fetch.size
inconfig.yml
allowed me to throttle the snapshot creation which solved my issue.@AdamKatzDev, I have faced with this issue too. The reason is that method put() in ClickHouseSinkTask receives records faster than ClickhouseBatchRunnable processes them. I have solved that issue by adding a code to control a buffer filling. If limit of records in a buffer (buffer is the “records” field in the ClickHouseSinkTask class) is reached, method put() continuously polls its condition until it’s free for a new batch.
Hi @AdamKatzDev , this is the updated path - https://github.com/Altinity/clickhouse-sink-connector/tree/develop/sink-connector/python
In order to initialize large databases, you may need to take a snapshot from MySQL and restore it to CH. This project has all the tooling to do it