confluent-kafka-dotnet: regex subscription causes librdkafka to fault
I’m getting reproducible faulting in librdkafka.dll on the consumer side with 0.11.3:
Faulting module name: librdkafka.dll
Exception code: 0xc0000409
Fault offset: 0x00111c45
- This only started happening once we began tapping into a live stream of data, so we are now testing with some decent volume. Furthermore, it only happens when the consumer is well behind in the topic and is therefore trying to catch up as fast as possible.
- When the problem happens the process just crashes and it’s not possible to debug. The only information available is in the Windows event log; a Windows Error Report is also created, but it doesn’t look to be much use.
- I can reproduce this with the simple code found in the AdvancedConsumer project Poll example, I just point it at my topic and run from the beginning.
- Once it has crashed the first time, if I restart the consumer it always crashes within 2 minutes, but it’s not related to specific messages in the topic, as it will easily get past the offset it previously stopped on.
- No logs of interest on the broker side.
- I’ve tried upgrading (just librdkafka) to 0.11.4-RC2, but no change.
I’ve been scratching my head for a day experimenting with different things and the only breakthrough I’ve had is that if I don’t output anything to the Console in the OnMessage event handler, everything works fine (entire topic of 1.6 million messages is consumed). What’s more bizarre is that I do have “debug” set to “all” so there is plenty of Console output from librdkafka! If I have even just Console.WriteLine("*"); in my message handler, it blows up!
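A minimal sketch of the setup described above, assuming the event-based 0.11.x API that the AdvancedConsumer Poll example is built on. It is not the reporter's actual code: the group id and the `auto.offset.reset` value are assumptions, while the broker and topic names are taken from the debug log in this report.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using Confluent.Kafka;
using Confluent.Kafka.Serialization;

public static class ReproSketch
{
    public static void Main()
    {
        var config = new Dictionary<string, object>
        {
            { "bootstrap.servers", "kafkadev03:9092" },  // broker name as seen in the debug log
            { "group.id", "repro-group" },               // hypothetical group id
            { "debug", "all" },                          // as described in the report
            { "default.topic.config", new Dictionary<string, object>
                {
                    { "auto.offset.reset", "smallest" }  // assumption: read the topic from the beginning
                }
            }
        };

        // Keys are ignored; values are decoded as UTF-8 strings.
        using (var consumer = new Consumer<Ignore, string>(
            config, null, new StringDeserializer(Encoding.UTF8)))
        {
            // Per the report, even this single console write in the handler triggers the crash.
            consumer.OnMessage += (_, msg) => Console.WriteLine("*");
            consumer.OnError += (_, error) => Console.WriteLine($"Error: {error}");

            consumer.Subscribe(new[] { "product_raw" });

            var cancelled = false;
            Console.CancelKeyPress += (_, e) => { e.Cancel = true; cancelled = true; };

            // Poll drives the OnMessage / OnError callbacks on this thread.
            while (!cancelled)
            {
                consumer.Poll(TimeSpan.FromMilliseconds(100));
            }
        }
    }
}
```

Removing the `Console.WriteLine("*")` call from the `OnMessage` handler is the only change that, according to the report, lets the whole topic be consumed.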
The only errors coming out from the debug log are these (regularly seen, and these come out even when everything is working so I don’t think this is the issue):
7|2018-03-28 14:39:45.145|rdkafka#consumer-1|PROTOERR| [thrd:kafkadev03:9092/3]: kafkadev03:9092/3: Protocol parse failure at 1031386/1048631 (rd_kafka_msgset_reader_v2:802) (incorrect broker.version.fallback?)
7|2018-03-28 14:39:45.145|rdkafka#consumer-1|PROTOERR| [thrd:kafkadev03:9092/3]: kafkadev03:9092/3: product_raw [4] MessageSet at offset 396382 payload size 21245 > 17245 remaining bytes
7|2018-03-28 14:39:45.152|rdkafka#consumer-1|PROTOERR| [thrd:kafkadev03:9092/3]: kafkadev03:9092/3: Protocol parse failure at 2094236/2097237 (rd_kafka_msgset_reader_v2:802) (incorrect broker.version.fallback?)
7|2018-03-28 14:39:45.152|rdkafka#consumer-1|PROTOERR| [thrd:kafkadev03:9092/3]: kafkadev03:9092/3: product_raw [1] MessageSet at offset 392741 payload size 27264 > 3001 remaining bytes
7|2018-03-28 14:39:45.167|rdkafka#consumer-1|PROTOERR| [thrd:kafkadev02:9092/2]: kafkadev02:9092/2: Protocol parse failure at 1032453/1048631 (rd_kafka_msgset_reader_v2:802) (incorrect broker.version.fallback?)
7|2018-03-28 14:39:45.168|rdkafka#consumer-1|PROTOERR| [thrd:kafkadev02:9092/2]: kafkadev02:9092/2: product_raw [3] MessageSet at offset 399982 payload size 25974 > 16178 remaining bytes
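In each pair of log lines above, the “remaining bytes” figure equals the buffer length minus the parse position reported on the preceding line (for example, 1048631 - 1031386 = 17245). The buffer lengths are also just over 1 MiB and 2 MiB, close to multiples of librdkafka's default `fetch.message.max.bytes` of 1048576. That would be consistent with the last message set in a fetch response being cut off at the fetch size limit, although this reading is an assumption and is not confirmed in this thread. A small sketch that checks the arithmetic, with hypothetical class and method names:

```csharp
using System;

public static class ProtoErrCheck
{
    // Bytes left in the response buffer once the parser has reached `position`.
    public static int RemainingBytes(int position, int bufferLength)
    {
        return bufferLength - position;
    }

    public static void Main()
    {
        // { parse position, buffer length, payload size, remaining bytes },
        // copied from the three PROTOERR pairs in the log.
        var samples = new[]
        {
            new[] { 1031386, 1048631, 21245, 17245 },
            new[] { 2094236, 2097237, 27264, 3001 },
            new[] { 1032453, 1048631, 25974, 16178 },
        };

        foreach (var s in samples)
        {
            int remaining = RemainingBytes(s[0], s[1]);
            bool matchesLog = remaining == s[3];   // does the subtraction reproduce the logged value?
            bool truncated = s[2] > remaining;     // payload larger than what is left in the buffer
            Console.WriteLine(
                $"remaining={remaining} matchesLog={matchesLog} truncated={truncated}");
        }
    }
}
```

All three pairs reproduce the logged value, and in every case the payload is larger than what is left in the buffer.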
- Confluent.Kafka nuget version: 0.11.3
- Apache Kafka version: 1.0.0 (confluent docker running in Ubuntu)
- Provide logs (with “debug” : “…” as necessary in configuration)
- Operating system: Windows 10
About this issue
- State: closed
- Created 6 years ago
- Comments: 28 (12 by maintainers)
@mhowlett, yes all the console apps were running in windows, including the .NET Core one. Incidentally, my framework console app is using 4.7.1.
I’ve run your code in a 4.6.1 console app and didn’t get the problem. However, I have previously had to wait over 10 minutes to see the issue, so I’m going to run a longer test. I’ll also narrow things down further by ruling out other differences between your test and mine (one other difference is message size: my messages average 1000 bytes).
We have the same problem on a Windows 10 physical machine with 0.11.4. The problem arises quickly if the consumer is “on late”, while it is harder to reproduce if the consumer has only a few messages to consume. Switching from lz4 to gzip solved the problem.
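The comment does not say where the codec was changed. Assuming it was set on the producer side through librdkafka's `compression.codec` property, the workaround might look like the configuration fragment below; the broker address is the one from the debug log, and the class name is hypothetical.

```csharp
using System.Collections.Generic;

public static class ProducerConfigSketch
{
    // Producer configuration dictionary in the style used by the 0.11.x client.
    public static Dictionary<string, object> Build()
    {
        return new Dictionary<string, object>
        {
            { "bootstrap.servers", "kafkadev03:9092" },  // broker name as seen in the debug log
            { "compression.codec", "gzip" }              // previously "lz4", per the comment above
        };
    }
}
```

Messages already written with lz4 stay lz4-compressed in the topic, so a consumer replaying old data would still have to decompress them.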