fluent-bit: S3 Output Compression not working

Bug Report

Describe the bug

Using td-agent-bit version 1.7.8 with the S3 output, the compression setting seems to be ignored, even when using use_put_object true.

To Reproduce

Here is my configuration for the s3 output block.

[OUTPUT]
    name s3
    match *
    region us-east-2
    bucket my-bucket-name
    s3_key_format /fluent-bit-logs/$TAG/%Y/%m/%d/%H/%M/%S/$UUID.gz
    use_put_object On
    total_file_size 40M
    upload_timeout 1m
    compression gzip

Regardless of whether the compression setting is omitted (implying none) or set to gzip, the uploaded files are always cleartext / uncompressed.

Expected behavior

Logs would be compressed with gzip before upload.

Your Environment

  • Version used: 1.7.8
  • Configuration: (See above)
  • Environment name and version (e.g. Kubernetes? What version?): RPM install
  • Server type and version: AWS t3a instance
  • Operating System and version: CentOS 8, fully patched as of 2021-06-23
  • Filters and plugins: none

I can find nothing in the error logs about a failed compression. On every upload I get a ‘happy’ message: Successfully uploaded object. However, the file is still cleartext. I saw references in @PettitWesley’s thread in #2700 that this was working, so I am unsure whether this is a regression or something else.
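For what it’s worth, a quick way to check what actually landed in the bucket, independent of any browser or viewer behavior, is to look at the gzip magic bytes on a downloaded object. The bucket, tag, timestamp, and UUID below are placeholders standing in for the s3_key_format above:

# Download one uploaded object and check its first two bytes;
# a real gzip stream always starts with the magic bytes 1f 8b.
$ aws s3 cp s3://my-bucket-name/fluent-bit-logs/app/2021/06/23/12/00/00/ABCD1234.gz ./sample.gz
$ head -c 2 sample.gz | xxd
$ file sample.gz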

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 2
  • Comments: 31 (13 by maintainers)

Most upvoted comments

This is caused by the Content-Encoding: gzip attribute attached to the log.gz file that fluent-bit uploaded.


When you download a file tagged with Content-Encoding: gzip, the user agent (e.g. Chrome, curl) automatically decodes the content, just as it would for a gzipped stream over HTTP, because the Content-Encoding: gzip header is included in the response headers. So yes, the object obviously has been compressed on S3.

An easy workaround is to simply remove the .gz extension from s3_key_format.
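Applied to the key format from the original report, that just means dropping the extension (everything else stays the same):

    s3_key_format /fluent-bit-logs/$TAG/%Y/%m/%d/%H/%M/%S/$UUID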

There seems to be no way to turn off Content-Encoding: gzip.
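One way to confirm the header without involving a browser at all is to inspect the object metadata directly; the bucket and key here are the same placeholders used in the transcript below:

# head-object returns the object metadata as JSON; per the above,
# it should include "ContentEncoding": "gzip" for these uploads.
$ aws s3api head-object --bucket mybucket --key path/to/file/ItWLhdDe.log.gz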

$ aws --profile 1234567890 s3 cp s3://mybucket/path/to/file/ItWLhdDe.log.gz ~/Downloads/ItWLhdDe.log.gz
$ ls -la ~/Downloads/ItWLhdDe.*
-rw-r--r--  1 paul  staff  2554 Mar 15 17:00 /Users/paul/Downloads/ItWLhdDe.log.gz
$ gunzip ~/Downloads/ItWLhdDe.log.gz
$ ls -la ~/Downloads/ItWLhdDe.*     
-rw-r--r--  1 paul  staff  22264 Mar 15 17:00 /Users/paul/Downloads/ItWLhdDe.log

WHY CHROME, WHY!?

I don’t think this is working. I have a configuration similar to the ones reported above:

        [OUTPUT]
            Name s3
            Match *
            bucket mybucket
            region ap-southeast-2
            store_dir /home/ec2-user/buffer
            s3_key_format /fluentbit/$TAG[2]/$TAG[0]/%Y/%m/%d/%H/%M/%S/$UUID.gz
            s3_key_format_tag_delimiters .-
            compression gzip
            use_put_object On
            total_file_size 50M

And I got my files in S3, e.g. s3://mybucket/fluentbit/log/kube/2022/02/23/01/35/48/15BRQR03.gz

Then I select the file in S3 and, under the object actions, choose “Query with S3 Select”.

In S3 Select I configure the query with JSON (one record per line) as the format and GZIP compression, since that is the expected content; however, the query returns an error saying GZIP is not applicable.

However, if I change the compression to None, I get a proper response for the same query.

While I am on a Mac, these queries run inside AWS and the files never touch my laptop, so I can say with some certainty that the files are not being gzipped.
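For reference, a roughly equivalent check can be run from the CLI with select-object-content; the bucket and key below match the example path above, and the query is just a minimal illustrative one:

# Ask S3 Select to treat the object as gzip-compressed JSON lines.
# If the object really were gzipped this should stream back records;
# on a plain-text object the GZIP input setting fails, as in the console.
$ aws s3api select-object-content \
    --bucket mybucket \
    --key fluentbit/log/kube/2022/02/23/01/35/48/15BRQR03.gz \
    --expression "SELECT * FROM S3Object s LIMIT 5" \
    --expression-type SQL \
    --input-serialization '{"JSON": {"Type": "LINES"}, "CompressionType": "GZIP"}' \
    --output-serialization '{"JSON": {}}' \
    /dev/stdout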