fluent-bit: Storage backlog chunk validation failure on restart in 2.1.X (data loss)
Bug Report
Restarting Fluent Bit 2.1.x causes storage backlog chunk validation failures; the affected chunks are discarded, losing the buffered data.
To Reproduce
Here is a Docker Compose project that demonstrates the expected behavior in version 2.0.11 and the unexpected behavior in version 2.1.2:
https://github.com/amorey/flb-backlog-bug
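For reference, a condensed sketch of the kind of configuration the reproduction relies on (the actual file is linked under Your Environment below; the storage path, port, and unreachable output host here are illustrative assumptions, not copied from the linked config):

```
[SERVICE]
    # persist input chunks to disk so they survive a restart
    storage.path              /var/log/flb-storage
    # limit how much backlog is loaded back into memory on startup
    storage.backlog.mem_limit 5M

[INPUT]
    Name          tcp
    Listen        0.0.0.0
    Port          5170
    # buffer this input's chunks on the filesystem
    storage.type  filesystem

[OUTPUT]
    Name   http
    Match  *
    # unreachable endpoint (TEST-NET-2 address) so every flush fails and
    # the chunk stays behind as a pending task
    Host   198.51.100.1
    Port   80
```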
- Example log message:

  ```
  [2023/04/27 06:11:37] [ info] [input:storage_backlog:storage_backlog.1] register tcp.0/1-1682575884.551412366.flb
  [2023/04/27 06:11:37] [error] [input:storage_backlog:storage_backlog.1] chunk validation failed, data might be corrupted. No valid records found, the chunk will be discarded.
  [2023/04/27 06:11:37] [error] [input:storage_backlog:storage_backlog.1] removing chunk tcp.0:1-1682575884.551412366.flb from the queue
  ```
- Steps to reproduce the problem (one way to run these steps is sketched after this list):
  - Send a message to Fluent Bit that results in a flush failure and creates a new pending task
  - Restart Fluent Bit
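One way to perform these steps against the compose project, assuming the tcp input listens on localhost:5170 and the compose service is named fluent-bit as in the sketch above (both are assumptions):

```sh
# send a record; the unreachable output makes the flush fail, leaving a
# pending task backed by a filesystem chunk
echo '{"message": "hello"}' | nc localhost 5170

# restart; 2.0.11 re-queues the chunk, 2.1.2 fails validation and drops it
docker compose restart fluent-bit
```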
Expected behavior
On restart, previously pending tasks should be added to the storage backlog queue:
```
[2023/04/27 06:10:54] [ info] [input:storage_backlog:storage_backlog.1] register tcp.0/1-1682575828.625496799.flb
[2023/04/27 06:10:54] [ info] [input:storage_backlog:storage_backlog.1] queueing tcp.0:1-1682575828.625496799.flb
```
Screenshots
See https://github.com/amorey/flb-backlog-bug for log snippets.
Your Environment
- Version used: 2.1.2
- Configuration: https://raw.githubusercontent.com/amorey/flb-backlog-bug/main/config/fluent-bit.conf
- Environment name and version (e.g. Kubernetes? What version?): Docker Desktop 4.18.0
- Server type and version:
- Operating System and version:
- Filters and plugins:
Additional context
This bug results in data loss on system restarts: any chunks that are pending at shutdown fail validation on startup and are discarded instead of being re-queued. One way to observe the loss is sketched below.
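A quick check, assuming the storage path and service name from the sketches above: list the chunk files before the restart, then watch the storage_backlog messages afterwards.

```sh
# pending chunks on disk before the restart (path is an assumption)
docker compose exec fluent-bit ls /var/log/flb-storage/tcp.0/

# after restarting, 2.1.2 logs "chunk validation failed" and removes the
# chunk from the queue, so the buffered records are never delivered
docker compose logs fluent-bit | grep storage_backlog
```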
Comments
- Thank you very much for taking the time to share your results. I don’t know the exact ETA for the release, but I think it will be sooner rather than later, hopefully within this week. I’ll send an update as soon as I have a proper ETA.
- Hi @anosulchik, there is a PR for this issue that’s about to be merged, and I think there will be a release early this week.