vector: format_timestamp timezone timestamp breaking the basic functionality of s3 sink
Problem
I am using the aws_s3 sink to upload data every 5 minutes, using the buffer and batch configuration below. Vector's global timezone option defaults to UTC.
Issue 1: Using format_timestamp with the timezone argument converts the timestamp from UTC to SGT, but data is then pushed every minute instead of every 5 minutes.
Issue 2: date=%Y%m%d in key_prefix is still rendered in UTC, so data is uploaded to the wrong folder.
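To make Issue 2 concrete, here is a minimal Python sketch (not Vector code) of how the same instant renders under the two timezones. The event time is a hypothetical value picked so that UTC and SGT fall on different calendar days, and a fixed UTC+8 offset stands in for Asia/Singapore.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+8 offset stands in for "Asia/Singapore" (SGT has no DST).
SGT = timezone(timedelta(hours=8))

# Hypothetical event time: late evening in UTC, already the next day in SGT.
event_utc = datetime(2023, 7, 6, 17, 30, tzinfo=timezone.utc)
event_sgt = event_utc.astimezone(SGT)

# The date folder as reported in Issue 2 (rendered in UTC).
date_folder_utc = event_utc.strftime("date=%Y%m%d")
# The date folder the reporter expects (rendered in SGT).
date_folder_sgt = event_sgt.strftime("date=%Y%m%d")
# The %H_%M value produced by format_timestamp with the SGT timezone.
custom_timestamp = event_sgt.strftime("%H_%M")

print(date_folder_utc, date_folder_sgt, custom_timestamp)
```

In this sketch the SGT time-of-day ends up under the UTC date folder, which matches the "wrong folder" symptom described above.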
Configuration
data_dir = "/var/lib/vector"
timezone = "Asia/Singapore"
[api]
enabled = true
[sources.forward_proxy]
type = "file"
include = ["/var/log/squid/access.log"]
#ignore_older = 3600 # Ignore events older than 1 hour
read_from = "end"
[transforms.remap_forward_proxy]
type = "remap"
inputs = [ "forward_proxy" ]
source = """
.message = parse_json!(.message)
.custom_timestamp = format_timestamp!(now(), format: "%H_%M", timezone: "Asia/Singapore")
"""
[sinks.s3_forward_proxy]
type = "aws_s3"
inputs = ["remap_forward_proxy"]
acl = "bucket-owner-full-control"
bucket = "xxxxxxx-xxxxxxxxx-xxxxxxx-xxxxxxxx"
content_encoding = "none"
content_type = "application/x-gzip"
filename_time_format = ""
filename_append_uuid = false
filename_extension = "gz"
key_prefix = "production/forward-proxy_squid.access.log-vector-1/date=%Y%m%d/{{ .custom_timestamp }}_forward-proxy_ap-southeast-1_${HOSTNAME}"
compression = "gzip"
region = "ap-southeast-1"
[sinks.s3_forward_proxy.buffer]
type = "disk"
when_full = "block"
max_size = 5368709760
[sinks.s3_forward_proxy.batch]
timeout_secs = 300
max_bytes = 250000000 #250 mb
[sinks.s3_forward_proxy.framing]
method = "newline_delimited"
[sinks.s3_forward_proxy.encoding]
codec = "json"
except_fields = ["custom_timestamp"]
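One possible reading of Issue 1, assuming the sink groups events into separate batches per rendered key_prefix (the report itself does not confirm this): because custom_timestamp has minute granularity, every minute yields a new key. The Python sketch below (not Vector code) counts the distinct keys produced over a 5-minute window, and compares that with a hypothetical key floored to 5-minute buckets.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+8 offset stands in for "Asia/Singapore".
SGT = timezone(timedelta(hours=8))

def minute_key(ts: datetime) -> str:
    """Key with minute granularity, like the %H_%M format in the config."""
    return ts.strftime("%H_%M")

def five_minute_key(ts: datetime) -> str:
    """Hypothetical alternative: floor the minute to a 5-minute bucket first."""
    floored = ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)
    return floored.strftime("%H_%M")

# Hypothetical traffic: one event every 10 seconds for 5 minutes.
start = datetime(2023, 7, 6, 9, 0, tzinfo=SGT)
events = [start + timedelta(seconds=10 * i) for i in range(30)]

per_minute = sorted({minute_key(ts) for ts in events})
per_five = sorted({five_minute_key(ts) for ts in events})

print(len(per_minute), "keys with %H_%M:", per_minute)
print(len(per_five), "key with 5-minute buckets:", per_five)
```

If the grouping assumption holds, the per-minute key would explain seeing one object per minute even with timeout_secs = 300.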
Version
vector 0.31.0 (x86_64-unknown-linux-gnu 0f13b22 2023-07-06 13:52:34.591204470)
References
https://github.com/vectordotdev/vrl/pull/247 https://github.com/vectordotdev/vector/issues/14160 https://github.com/vectordotdev/vector/pull/17004
About this issue
- State: closed
- Created a year ago
- Reactions: 6
- Comments: 23 (10 by maintainers)
Gotcha, yeah, I agree that sinks could take a timezone option that defaults to the globally configured timezone. There is precedent for that sort of override with other options like proxy.
I created two issues to track each of these changes independently:
I think those two issues cover this report so I’ll close it, but let me know if you disagree. You can subscribe to the other issues for updates. We’d also be happy to see PRs addressing either of them 🙂
Buffering works only if I set filename_time_format, but then the timestamp is rendered in UTC, and I need it in SGT.
When I checked the code, filename_time_format always uses UTC. Could you add an option to use any timezone? If no timezone is defined, it can fall back to UTC.
Conclusion: Both issues are still present. The buffer is flushed properly only if I set the following in the s3 sink config:
filename_time_format = "%H_%M_forward-proxy_ap-southeast-1_${HOSTNAME}"
But this is rendered in UTC because of the code below:
https://github.com/vectordotdev/vector/blob/421b421bb988335316417c80129014ff80179246/src/sinks/aws_s3/sink.rs#L79C55-L79C75
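The behavior requested in this thread can be sketched in Python as follows. This is only an illustration of the proposed fallback order (sink-level timezone, then global timezone, then UTC), not Vector's implementation; render_filename, sink_tz and global_tz are hypothetical names, and the ${HOSTNAME} suffix is left out of the format string.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional

def render_filename(
    ts: datetime,
    time_format: str,
    sink_tz: Optional[tzinfo] = None,    # hypothetical per-sink override
    global_tz: Optional[tzinfo] = None,  # hypothetical global setting
) -> str:
    """Render time_format in the sink timezone, else the global one, else UTC."""
    if sink_tz is not None:
        tz = sink_tz
    elif global_tz is not None:
        tz = global_tz
    else:
        tz = timezone.utc
    return ts.astimezone(tz).strftime(time_format)

# Assumption: a fixed UTC+8 offset stands in for "Asia/Singapore".
SGT = timezone(timedelta(hours=8))
fmt = "%H_%M_forward-proxy_ap-southeast-1"
ts = datetime(2023, 7, 6, 17, 30, tzinfo=timezone.utc)  # hypothetical event time

name_default = render_filename(ts, fmt)                # no timezone set: UTC
name_global = render_filename(ts, fmt, global_tz=SGT)  # global timezone only
name_sink = render_filename(ts, fmt, sink_tz=timezone.utc, global_tz=SGT)  # sink wins

print(name_default, name_global, name_sink)
```

With no timezone configured the output stays in UTC, so existing setups would keep their current file names under this scheme.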