uReplicator: Urgent - Worker restart always leaves a number of duplicate messages in the destination topic

Hi everyone, I’m running uReplicator built from the latest master branch (commit 70ac3404bb4517756c6813d3962b77edc7700019). As with other recent commits on master, after a Worker restart there are always lots of duplicate messages in the destination topic.

Here is an overview of my steps:

Step 1: Build the latest code

Step 2: Build a Docker image with this Dockerfile

FROM openjdk:8-jre

COPY confd-0.15.0-linux-amd64 /usr/local/bin/confd
COPY uReplicator/uReplicator-Distribution/target/uReplicator-Distribution-pkg /uReplicator
COPY uReplicator/config uReplicator/config
COPY confd-new /etc/confd
# confd is used to generate the configuration files inside the container
COPY entrypoint-new.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh && \
    chmod +x /usr/local/bin/confd && \
    chmod +x /uReplicator/bin/*.sh

ENTRYPOINT [ "/entrypoint.sh" ]
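
For reference, a minimal sketch of what an entrypoint like this needs to do so that the worker can shut down cleanly. The confd invocation, the worker start script name, and its flags are illustrative, not taken from my actual file:

#!/bin/sh
set -e

# Render the *.properties files from the confd templates (backend is illustrative).
confd -onetime -backend env

# exec replaces the shell, so the worker JVM becomes PID 1 and receives the
# SIGTERM that Kubernetes sends on pod deletion; without exec the shell keeps
# the signal and the worker's clean-shutdown / offset-commit path never runs.
# start-worker.sh is a placeholder for the actual start script in /uReplicator/bin.
exec /uReplicator/bin/start-worker.sh \
    --consumer.config /uReplicator/config/consumer.properties \
    --producer.config /uReplicator/config/producer.properties \
    --helix.config /uReplicator/config/helix.properties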

Step 3: Prepare 2 topics:

  • Source topic: 500 messages, and kept rising toward a target of 1000 messages.
  • Destination topic: blank topic, 0 messages.

Step 4: Apply the deployment on Kubernetes with this config:

  • Controller: 1 pod, resource requests: 100m CPU, 1500Mi memory
  • Worker: 2 pods, resource requests: 100m CPU, 1500Mi memory

clusters.properties

kafka.cluster.zkStr.cluster1=src-zk:2181
kafka.cluster.servers.cluster1=src-kafka:9092
kafka.cluster.zkStr.cluster2=dst-zk:2181
kafka.cluster.servers.cluster2=dst-kafka:9092

consumer.properties

zookeeper.connect=src-zk:2181
bootstrap.servers=src-kafka:9092
zookeeper.connection.timeout.ms=30000
zookeeper.session.timeout.ms=30000
group.id=kloak-mirrormaker-test
consumer.id=kloakmms01-sjc1
socket.receive.buffer.bytes=1048576
fetch.message.max.bytes=8388608
queued.max.message.chunks=5
key.deserializer=org.apache.kafka.common.serialization.ByteArrayDeserializer
value.deserializer=org.apache.kafka.common.serialization.ByteArrayDeserializer
auto.offset.reset=smallest
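
Since the worker uses the old ZooKeeper-based consumer (zookeeper.connect / group.id above), its committed offsets should be visible in the source ZooKeeper. A sketch of how to inspect them, assuming the standard old-consumer offset layout; the Kafka install path, topic name, and partition number are illustrative:

# Offsets committed by a ZK-based consumer live under
# /consumers/<group.id>/offsets/<topic>/<partition>.
/opt/kafka/bin/zookeeper-shell.sh src-zk:2181 \
    get /consumers/kloak-mirrormaker-test/offsets/my-test-topic/0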

dstzk.properties

enable=true
zkServer=src-zk:2181

helix.properties : instanceId is commented out because I’m running more than 1 Worker

zkServer=src-zk:2181
#instanceId=testHelixMirrorMaker01
helixClusterName=testMirrorMaker
federated.deployment.name=uReplicator-Tests

producer.properties

bootstrap.servers=dst-kafka:9092
client.id=kloak-mirrormaker-test

Step 5: Check the number of messages on both clusters

Src & Dst topic: 600 messages each
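
One way to get these counts is to compare the end offsets on each side; a sketch using stock Kafka tooling, where the Kafka install path and topic name are illustrative:

# Sum of the per-partition end offsets = total messages in a non-compacted
# topic that starts from offset 0. Run once against each cluster.
/opt/kafka/bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
    --broker-list src-kafka:9092 --topic my-test-topic --time -1

/opt/kafka/bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
    --broker-list dst-kafka:9092 --topic my-test-topic --time -1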

Step 6: Delete 1 worker pod on Kubernetes with kubectl delete pod … (no --force flag)
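
For reference, a plain kubectl delete sends SIGTERM to PID 1 of the container and only sends SIGKILL after the pod's termination grace period (30 seconds by default) expires. A sketch of the delete and of checking that grace period; the pod names are illustrative:

# Delete one worker pod without --force (SIGTERM first, SIGKILL after the grace period).
kubectl delete pod ureplicator-worker-0

# Check how long the remaining worker would be given to shut down cleanly.
kubectl get pod ureplicator-worker-1 \
    -o jsonpath='{.spec.terminationGracePeriodSeconds}'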

Result:

1 - The old pod is Terminating. Source & destination keep in sync: 750 / 750 messages on both topics.

2 - A few seconds later the new pod is up. The source topic keeps rising to 900 -> 950 -> 1000; the destination topic stays at 750 messages.

3 - A minute later: the source topic has 1000 messages, the destination topic has more than 1000 messages - sometimes 1200, sometimes 1100-1300, etc. (the result is different in every test).

Note: if no Worker goes down, the replication result is perfect. But in this case, every time I test, the result is different and uReplicator is not stable. @yangy0000 @Pawka @DAddYE @hparra @chirayuk @dentafrice CC to anyone who contributed to this application. Please let me know/correct me if something went wrong.

About this issue

  • State: open
  • Created 5 years ago
  • Comments: 21 (10 by maintainers)

Most upvoted comments

Yes, perfect.

@Technoboy- hey, the problem turned out to be kill vs. kill -9: I had been doing it with kill -9.

I’ve tried with ‘kill processId’ and here is the result: the commit-offset log lines are there, and it works!

2019-10-29T07:45:23.181+0000: Total time for which application threads were stopped: 0.0002563 seconds, Stopping threads took: 0.0000400 seconds
2019-10-29T07:45:40.183+0000: Total time for which application threads were stopped: 0.0001666 seconds, Stopping threads took: 0.0000360 seconds
[2019-10-29 07:45:45,083] INFO Start clean shutdown. (kafka.mirrormaker.WorkerInstance:66)
[2019-10-29 07:45:45,086] INFO Shutting down consumer thread. (kafka.mirrormaker.WorkerInstance:66)
[2019-10-29 07:45:45,089] INFO [mirrormaker-thread-HelixMirrorMaker-1572335099430] mirrormaker-thread-HelixMirrorMaker-1572335099430 shutting down (kafka.mirrormaker.WorkerInstance$MirrorMakerThread:66)
2019-10-29T07:45:45.184+0000: Total time for which application threads were stopped: 0.0001369 seconds, Stopping threads took: 0.0000343 seconds
2019-10-29T07:45:51.185+0000: Total time for which application threads were stopped: 0.0001361 seconds, Stopping threads took: 0.0000360 seconds
[2019-10-29 07:45:51,811] INFO [mirrormaker-thread-HelixMirrorMaker-1572335099430] Mirror maker thread stopped (kafka.mirrormaker.WorkerInstance$MirrorMakerThread:66)
[2019-10-29 07:45:51,811] INFO [mirrormaker-thread-HelixMirrorMaker-1572335099430] Mirror maker thread shutdown complete (kafka.mirrormaker.WorkerInstance$MirrorMakerThread:66)
[2019-10-29 07:45:51,811] INFO Flushing last batches and commit offsets. (kafka.mirrormaker.WorkerInstance:66)
[2019-10-29 07:45:51,812] INFO Flushing producer. (kafka.mirrormaker.WorkerInstance:66)
[2019-10-29 07:45:51,816] INFO Committing offsets. (kafka.mirrormaker.WorkerInstance:66)
[2019-10-29 07:45:51,823] INFO Shutting down consumer connectors. (kafka.mirrormaker.WorkerInstance:66)
[2019-10-29 07:45:51,823] INFO Connector is now shutting down! (kafka.mirrormaker.KafkaConnector:66)
[2019-10-29 07:45:51,838] INFO [CompactConsumerFetcherManager-1572335099655] Stopping leader finder thread (kafka.mirrormaker.CompactConsumerFetcherManager:66)
[2019-10-29 07:45:51,839] INFO [kloak-mirrormaker-dc-test2_kloakmms01-dc-leader-finder-thread]: Shutting down (kafka.mirrormaker.CompactConsumerFetcherManager$LeaderFinderThread:66)
[2019-10-29 07:45:51,839] INFO [kloak-mirrormaker-dc-test2_kloakmms01-dc-leader-finder-thread]: Stopped (kafka.mirrormaker.CompactConsumerFetcherManager$LeaderFinderThread:66)
[2019-10-29 07:45:51,839] INFO [kloak-mirrormaker-dc-test2_kloakmms01-dc-leader-finder-thread]: Shutdown completed (kafka.mirrormaker.CompactConsumerFetcherManager$LeaderFinderThread:66)
[2019-10-29 07:45:51,840] INFO [CompactConsumerFetcherManager-1572335099655] Stopping all fetchers (kafka.mirrormaker.CompactConsumerFetcherManager:66)
[2019-10-29 07:45:51,844] INFO [CompactConsumerFetcherManager-1572335099655] All connections stopped (kafka.mirrormaker.CompactConsumerFetcherManager:66)

Kill the process (without -9).

Search the worker log for key phrases like “Start clean shutdown” or “Flushing last batches and commit offsets”, which indicate a graceful shutdown. During a graceful shutdown the message offsets are committed; otherwise, the pod may have been killed in an incorrect manner.
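
A sketch of that check against a worker pod on Kubernetes; the pod name is illustrative:

# Look for the graceful-shutdown markers in the worker's log.
kubectl logs ureplicator-worker-0 | \
    grep -E "Start clean shutdown|Flushing last batches and commit offsets|Committing offsets"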