fluent-bit: S3 output intermittently fails with SignatureDoesNotMatch, broken pipe, or HTTP version errors
Bug Report
Describe the bug
I have Fluent Bit deployed in a Kubernetes cluster sending a large volume of logs to an S3 bucket. Most logs are transmitted successfully, but Fluent Bit regularly logs a “PutObject request failed” error. (This is odd because use_put_object is set to false.) Fluent Bit logs the HTTP 403 response it received from S3, which contains this error text: “The request signature we calculated does not match the signature you provided. Check your key and signing method.”
There are also a lot of “broken pipe” errors, although it is unclear whether they are related.
Even more bizarrely, there are intermittent HTTP 505 “Version not supported” errors. I have no idea how these could be intermittent; surely the HTTP version is always the same?
To Reproduce
- Rubular link if applicable: N/A
- Example log message if applicable:
[2023/02/28 23:16:52] [error] [output:s3:s3.1] PutObject request failed
[2023/02/28 23:16:52] [error] [output:s3:s3.1] PutObject API responded with error='SignatureDoesNotMatch', message='The request signature we calculated does not match the signature you provided. Check your key and signing method.'
[2023/02/28 23:16:52] [error] [/src/fluent-bit/src/flb_http_client.c:1201 errno=32] Broken pipe
[2023/02/28 23:31:05] [error] [output:s3:s3.1] PutObject API responded with error='HttpVersionNotSupported', message='The HTTP version specified is not supported.'
[2023/02/28 23:31:05] [error] [output:s3:s3.1] Raw PutObject response: HTTP/1.1 505 HTTP Version not supported
- Steps to reproduce the problem: I’m afraid I don’t have a concise set of steps to reproduce; our environment is fairly large and complex, and this issue only seems to appear under heavy load.
Expected behavior: Fluent Bit sends log files to the S3 bucket.
Screenshots: N/A
Your Environment
- Version used: 2.0.9
- Configuration:
[OUTPUT]
    Name              s3
    Match             *
    bucket            ${log_bucket_name}
    region            ${log_bucket_region}
    total_file_size   10M
    s3_key_format     /${cluster_name}/$TAG/%Y/%m/%d/%H/%M/%S
    use_put_object    false
- Environment name and version (e.g. Kubernetes? What version?): AWS EKS cluster, Kubernetes 1.22
- Server type and version: EC2 instances of various types
- Operating System and version: Bottlerocket 1.12.0
- Filters and plugins: tail input, modify filter, kubernetes filter
Additional context: I know that Fluent Bit retries requests to S3, but I am seeing occasional messages like this:
[2023/02/28 23:30:56] [ warn] [output:s3:s3.1] Chunk file failed to send 5 times, will not retry
So I am concerned that I am losing log messages.
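As a possible mitigation while the root cause is investigated, the retry budget on the output can be raised. This is only a sketch, assuming the generic Retry_Limit scheduler option (which applies to Fluent Bit outputs in general) also covers these S3 chunk uploads; the value 10 is arbitrary:

[OUTPUT]
    Name           s3
    Match          *
    # ... same settings as in the configuration above ...
    Retry_Limit    10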
I realize that these could be three different issues; however, they seem to occur together, and I’m wondering if they could have a common cause.
Incidentally, I found this comment (over a year old) which reports the same behavior: https://github.com/fluent/fluent-bit/issues/4505#issuecomment-1000376903
Could it be that when a chunk is added to the upload queue by add_to_queue, a raw copy of the tag is made at line 1584 into a buffer that is too small to hold the null terminator?
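If that hypothesis is right, the failure mode would look something like the sketch below. This is not the actual Fluent Bit code; the struct and function names are invented for illustration, and it only shows the general missing-terminator pattern and a possible fix:

/* Minimal sketch of the suspected bug class (hypothetical names, not the
 * real fluent-bit source). If the tag buffer is allocated with only tag_len
 * bytes, the copied string has no '\0' terminator, so a later strlen() or
 * "%s" on it reads past the end of the buffer and can append garbage bytes
 * to the S3 key or to the request that was signed. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct queue_entry {
    char  *tag;
    size_t tag_len;
};

/* Suspected pattern: allocation leaves no room for the '\0' terminator. */
static int copy_tag_unterminated(struct queue_entry *e, const char *tag, size_t len)
{
    e->tag = malloc(len);              /* missing the +1 for '\0' */
    if (!e->tag) return -1;
    memcpy(e->tag, tag, len);          /* bytes copied, string not terminated */
    e->tag_len = len;
    return 0;
}

/* Fixed pattern: allocate len + 1 and terminate explicitly. */
static int copy_tag_terminated(struct queue_entry *e, const char *tag, size_t len)
{
    e->tag = malloc(len + 1);
    if (!e->tag) return -1;
    memcpy(e->tag, tag, len);
    e->tag[len] = '\0';
    e->tag_len = len;
    return 0;
}

int main(void)
{
    struct queue_entry e;
    const char *tag = "kube.var.log.containers.app";

    /* The buggy variant copies the bytes but leaves the string unterminated,
     * so it must never be passed to strlen() or printf("%s"). */
    if (copy_tag_unterminated(&e, tag, strlen(tag)) == 0) {
        free(e.tag);
    }

    if (copy_tag_terminated(&e, tag, strlen(tag)) == 0) {
        printf("queued tag: %s\n", e.tag);   /* safe: string is terminated */
        free(e.tag);
    }
    return 0;
}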
I will investigate whether this report is related: https://github.com/aws/aws-for-fluent-bit/issues/541