vector: format_timestamp with a timezone breaks the basic functionality of the s3 sink

Problem

I am using the s3 sink to upload data every 5 minutes with a buffer configuration. Vector's timezone configuration defaults to UTC, and I need the timestamps in SGT, which leads to two issues.

Issue 1: Using format_timestamp with the timezone argument converts UTC to SGT, but as a result the data is pushed every minute instead of every 5 minutes.
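
One plausible explanation for Issue 1, offered as an assumption rather than a confirmed reading of Vector's internals: if the sink groups events into batches by the rendered key_prefix, a prefix that embeds a "%H_%M" value changes every minute, so each minute starts its own batch and its own object. The Python sketch below only models that grouping. `partition_by_key` and the sample events are hypothetical and are not part of Vector; SGT is modeled as a fixed UTC+8 offset.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumption: SGT modeled as a fixed UTC+8 offset (Singapore has no DST).
SGT = timezone(timedelta(hours=8), "SGT")


def partition_by_key(timestamps, fmt):
    """Group event timestamps by the string a strftime-style template renders."""
    partitions = defaultdict(list)
    for ts in timestamps:
        partitions[ts.astimezone(SGT).strftime(fmt)].append(ts)
    return dict(partitions)


# Hypothetical events: one every 30 seconds across a 5-minute window.
start = datetime(2023, 7, 26, 0, 15, tzinfo=SGT)
events = [start + timedelta(seconds=30 * i) for i in range(10)]

per_minute = partition_by_key(events, "%H_%M")   # like {{ .custom_timestamp }}
per_day = partition_by_key(events, "%Y%m%d")     # like date=%Y%m%d

print(sorted(per_minute))  # ['00_15', '00_16', '00_17', '00_18', '00_19']
print(sorted(per_day))     # ['20230726']
```

Under this model, a 5-minute window keyed by minute produces five separate groups, while the same window keyed only by date produces one.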

Issue 2: In key_prefix, date=%Y%m%d still uses UTC, so the data is uploaded to the wrong folder.
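
To make Issue 2 concrete, here is a minimal Python sketch of how the same instant lands in different date folders depending on the timezone used to render `date=%Y%m%d`. The `date_folder` helper and the sample instant are illustrative assumptions, not Vector code; SGT is modeled as a fixed UTC+8 offset.

```python
from datetime import datetime, timedelta, timezone

# Assumption: SGT modeled as a fixed UTC+8 offset (Singapore has no DST).
SGT = timezone(timedelta(hours=8), "SGT")


def date_folder(instant, tz):
    """Render the date=%Y%m%d path segment for an instant in the given zone."""
    return instant.astimezone(tz).strftime("date=%Y%m%d")


# Hypothetical instant: 00:19 SGT on 26 July 2023 is still 25 July in UTC.
instant = datetime(2023, 7, 26, 0, 19, tzinfo=SGT)

utc_folder = date_folder(instant, timezone.utc)
sgt_folder = date_folder(instant, SGT)

print(utc_folder)  # date=20230725
print(sgt_folder)  # date=20230726
```

Between midnight and 08:00 SGT, a UTC-rendered prefix points at the previous day's folder, which matches the symptom described above.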

Configuration

data_dir = "/var/lib/vector"
timezone = "Asia/Singapore"


[api]
  enabled = true

[sources.forward_proxy]
    type = "file"
    include = ["/var/log/squid/access.log"]
    #ignore_older = 3600 # Ignore events older than 1 hour
    read_from = "end"

[transforms.remap_forward_proxy]
    type = "remap"
    inputs = [ "forward_proxy" ]
    source = """
      .message = parse_json!(.message)
      .custom_timestamp = format_timestamp!(now(), format: "%H_%M", timezone: "Asia/Singapore")
    """

[sinks.s3_forward_proxy]
    type = "aws_s3"
    inputs = ["remap_forward_proxy"]
    acl = "bucket-owner-full-control"
    bucket = "xxxxxxx-xxxxxxxxx-xxxxxxx-xxxxxxxx"
    content_encoding = "none"
    content_type = "application/x-gzip"
    filename_time_format = ""
    filename_append_uuid  = false
    filename_extension = "gz"
    key_prefix = "production/forward-proxy_squid.access.log-vector-1/date=%Y%m%d/{{ .custom_timestamp }}_forward-proxy_ap-southeast-1_${HOSTNAME}"
    compression = "gzip"
    region = "ap-southeast-1"


    [sinks.s3_forward_proxy.buffer]
        type = "disk"
        when_full = "block"
        max_size =  5368709760

    [sinks.s3_forward_proxy.batch]
        timeout_secs = 300
        max_bytes = 250000000 #250 mb

    [sinks.s3_forward_proxy.framing]
        method = "newline_delimited"

    [sinks.s3_forward_proxy.encoding]
        codec = "json"
        except_fields = ["custom_timestamp"]

Version

vector 0.31.0 (x86_64-unknown-linux-gnu 0f13b22 2023-07-06 13:52:34.591204470)

References

  • https://github.com/vectordotdev/vrl/pull/247
  • https://github.com/vectordotdev/vector/issues/14160
  • https://github.com/vectordotdev/vector/pull/17004

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 6
  • Comments: 23 (10 by maintainers)

Most upvoted comments

Gotcha, yeah, I agree that sinks could take a timezone option that defaults to the globally configured timezone. There is precedent for that sort of override with other options like proxy.
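
The override precedence suggested here can be sketched in a few lines of Python. This is only an illustration of the idea (a sink-level value wins, otherwise the global value, otherwise UTC); `resolve_timezone` is a hypothetical helper and does not describe how Vector implements it.

```python
def resolve_timezone(sink_tz=None, global_tz=None):
    """Pick the sink-level timezone if set, else the global one, else UTC."""
    for candidate in (sink_tz, global_tz):
        if candidate:
            return candidate
    return "UTC"


print(resolve_timezone())                                  # UTC
print(resolve_timezone(global_tz="Asia/Singapore"))        # Asia/Singapore
print(resolve_timezone("Europe/Berlin", "Asia/Singapore")) # Europe/Berlin
```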

I created two issues to track each of these changes independently:

I think those two issues cover this report so I’ll close it, but let me know if you disagree. You can subscribe to the other issues for updates. We’d also be happy to see PRs addressing either of them 🙂

The buffer works only if I set filename_time_format. However, with this set, the timestamp is rendered in UTC. I need the timestamp in SGT.

When I checked the code, filename_time_format defaults to UTC. Can you add an option to use any timezone? If no timezone is defined, it can fall back to UTC.
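
A minimal Python sketch of the requested behavior, assuming an optional timezone argument that falls back to UTC when it is not set. `render_filename` is a hypothetical name used for illustration, not the sink's actual code, and SGT is modeled as a fixed UTC+8 offset.

```python
from datetime import datetime, timedelta, timezone


def render_filename(instant, time_format, tz=None):
    """Format an instant for an object name, falling back to UTC when tz is None."""
    zone = tz if tz is not None else timezone.utc
    return instant.astimezone(zone).strftime(time_format)


# Assumption: SGT modeled as a fixed UTC+8 offset (Singapore has no DST).
SGT = timezone(timedelta(hours=8), "SGT")
instant = datetime(2023, 7, 25, 16, 19, tzinfo=timezone.utc)

default_name = render_filename(instant, "%H_%M")
sgt_name = render_filename(instant, "%H_%M", tz=SGT)

print(default_name)  # 16_19
print(sgt_name)      # 00_19
```

Keeping UTC as the fallback leaves existing configurations unchanged while letting a configured timezone drive the rendered name.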

Conclusion: Both issues are still present.

The buffer is flushed properly only if the s3 sink is configured as follows:

filename_time_format = "%H_%M_forward-proxy_ap-southeast-1_${HOSTNAME}"

But this defaults to UTC because of the code below:

https://github.com/vectordotdev/vector/blob/421b421bb988335316417c80129014ff80179246/src/sinks/aws_s3/sink.rs#L79C55-L79C75