karapace: schema reader cannot replay schema topic
What happened?
We upgraded Karapace from version 3.10.4 to 3.10.5 (and also to 3.10.6), but the schema reader hangs during replay, reporting a progress of minus 0.39 percent:
karapace.schema_reader schema-reader INFO Replay progress (0.29): -1/254 (-0.39 %)
As a result, the schemas are never replayed and nothing else happens.
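The numbers in that log line are consistent with a simple ratio. Assuming the format is "<current offset>/<highest offset> (<percentage> %)", the reader's offset is still at -1, presumably a sentinel meaning no record has been processed yet, and -1 / 254 × 100 ≈ -0.39. The helper replay_progress below is hypothetical. It only reproduces that arithmetic and is not Karapace's implementation.

```python
def replay_progress(current_offset: int, highest_offset: int) -> str:
    """Format replay progress like the log line: 'current/highest (pct %)'.

    Hypothetical helper: it only reproduces the arithmetic visible in the
    log, not Karapace's actual code.
    """
    if highest_offset <= 0:
        # Nothing to replay; avoid dividing by zero.
        return f"{current_offset}/{highest_offset} (n/a)"
    # Percentage rounded to two decimals, as the log line appears to do.
    pct = round(current_offset / highest_offset * 100, 2)
    return f"{current_offset}/{highest_offset} ({pct} %)"


print(replay_progress(-1, 254))  # -1/254 (-0.39 %)
print(replay_progress(0, 254))   # 0/254 (0.0 %)
```

Clamping the starting offset to 0 would only change the output to "0/254 (0.0 %)". As the thread points out further down, that fixes the displayed percentage but not a reader that never receives any records.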
What did you expect to happen?
I would expect the reader to start at the first schema message in the topic so that all schemas can be rebuilt.
What else do we need to know?
We are running the official Docker image, versions 3.10.5 and 3.10.6.
About this issue
- State: closed
- Created 5 months ago
- Comments: 18 (8 by maintainers)
Commits related to this issue
- Handle message errors in schema reader. Previously the errors that can surface when polling for messages have not been handled in the schema reader. This commit fixes that and specifically handles and... — committed to Aiven-Open/karapace by deleted user 4 months ago
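The commit message above describes handling errors that surface when polling for messages. Below is a minimal sketch of what that can look like for one batch returned by a confluent-kafka-python consumer. It relies only on the Message methods error(), offset(), key() and value(). FakeMessage and handle_batch are hypothetical names for illustration. The choice to log and skip errors, and to skip records with None keys while still advancing the offset, is an assumption and not necessarily what Karapace does.

```python
import logging
from typing import Any, Iterable, List, Optional, Tuple

log = logging.getLogger("schema_reader_sketch")


class FakeMessage:
    """Offline stand-in exposing the confluent_kafka.Message methods used below."""

    def __init__(self, offset: int, key: Optional[bytes],
                 value: Optional[bytes] = b"{}", error: Any = None) -> None:
        self._offset = offset
        self._key = key
        self._value = value
        self._error = error

    def error(self) -> Any:
        return self._error

    def offset(self) -> int:
        return self._offset

    def key(self) -> Optional[bytes]:
        return self._key

    def value(self) -> Optional[bytes]:
        return self._value


def handle_batch(messages: Iterable[Any]) -> Tuple[List[Tuple[bytes, Optional[bytes]]], int]:
    """Apply one batch of polled messages.

    Returns the (key, value) pairs that were applied and the offset of the
    last record actually read (-1 if none), so the caller can advance its
    replay position even when records are skipped.
    """
    applied: List[Tuple[bytes, Optional[bytes]]] = []
    last_offset = -1
    for msg in messages:
        err = msg.error()
        if err is not None:
            # The message carries an error instead of a record: log it
            # rather than treating it as a schema record.
            log.warning("Error while polling schemas topic: %s", err)
            continue
        last_offset = msg.offset()
        key = msg.key()
        if key is None:
            # Keyless records are skipped, but their offset still counts
            # towards replay progress.
            log.warning("Skipping record without key at offset %d", last_offset)
            continue
        applied.append((key, msg.value()))
    return applied, last_offset


batch = [FakeMessage(0, b"key-0"), FakeMessage(1, None),
         FakeMessage(-1, None, error="simulated error")]
applied, last = handle_batch(batch)
print(len(applied), last)  # 1 1
```

If keyless records were skipped without updating the offset, the reported position could stay at -1 even though messages were consumed. That is one way the stall seen in this issue could arise, and it remains a hypothesis here.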
@dada-engineer I made a small improvement to the error handling around this bit of the code and extended the README with what we’ve learned here: #832
If this all looks good, I’ll close the issue.
@michael-zucchetta Indeed, it seems the issue is related to the switch to the confluent-kafka-python based consumer. Version 3.10.4 had that change reverted due to a bug in backups, and 3.10.5 reintroduced it after the original issue was fixed.
@dada-engineer
That would be a fix for the percentage calculation, yes, but if the replay is stuck because no records are being read from the schemas topic, it would only result in
Replay progress (0.29): 0/254 (0.0 %)
in the logs, repeated in the same way. The adjustment (or lack of it) of beginning_offset would give a similar result, I think. However, I can't reproduce the consumer not reading anything despite it being able to fetch the watermark offsets. Could you perhaps provide some information about the records in your schemas topic? Structure, size, maybe a few examples. I'm suspicious of this code skipping messages with None keys, which was added alongside the confluent-kafka-python consumer (although mainly for typing).
@matyaskuti LGTM
Thank you very much for your help and your work on karapace 👍🏻
Sure, thanks. I did not see a difference in the logs at first glance, but here you go:
logs.txt
@dada-engineer do you perhaps have any follow-up info on this?