fluent-bit: Duplicate @timestamp fields in elasticsearch output
I am trying to replace my fluentd installation in kubernetes with fluent-bit 0.13.3 but ran into an issue. We currently have the standard setup:
[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    Parser            docker
    Tag               kube.*
    Refresh_Interval  5
    Mem_Buf_Limit     5MB
    Skip_Long_Lines   On

[FILTER]
    Name              kubernetes
    Match             kube.*
    Kube_URL          https://kubernetes.default.svc:443
    Kube_CA_File      /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    Kube_Token_File   /var/run/secrets/kubernetes.io/serviceaccount/token
    Merge_JSON_Log    On

[OUTPUT]
    Name              es
    Match             *
    Host              HOST
    Port              9200
    Logstash_Format   On
    Retry_Limit       False
    Type              flb_type
The problem is that some of the log messages from services are JSON encoded and also include a @timestamp field. This then causes some errors:
[2018/06/11 15:22:49] [ warn] [out_es] Elasticsearch error
{"took":78,"errors":true,"items":[{"index":{"_index":"logstash_test-2018.06.11","_type":"flb_type","_id":"ZPhx72MBChql05IASc5e","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":142,"_primary_term":1,"status":201}},{"index":{"_index":"logstash_test-2018.06.11","_type":"flb_type","_id":"Zfhx72MBChql05IASc5e","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"json_parse_exception","reason":"Duplicate field '@timestamp'\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@6dda49d4; line: 1, column: 509]"}}}},{"index":{"_index":"logstash_test-2018.06.11","_type":"flb_type","_id":"Zvhx72MBChql05IASc5e","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"i_o_exception","reason":"Duplicate field '@timestamp'\n at [Source: org.elasticsearch.common.bytes.BytesReferenc
I tried to use Merge_JSON_Key to mitigate this, but the option seems to be disabled in the source code (without this being mentioned in the docs, so it took me some time to figure out why it does not work). In my opinion, Merge_JSON_Log should overwrite existing keys instead of producing duplicate keys.
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 9
- Comments: 24 (3 by maintainers)
Commits related to this issue
- Fixing fluent-bit bug (https://github.com/fluent/fluent-bit/issues/628) — committed to ruzickap/k8s-istio-demo by ruzickap 5 years ago
@edsiper I would reopen this,
I have this config:
and keep getting errors like this one:
Having to rename it is patchwork, so please think about providing better defaults instead.
Isn't it more convenient to change the behavior of the Elasticsearch plugin so that it won't append the @timestamp key (or Time_Key in general) if it already exists?
This needs to be reopened.
Consider that the official Logstash formatter for log4j, https://github.com/logstash/log4j-jsonevent-layout, is going to output @timestamp to standard out. So a developer encountering this bug is going to wonder why their Logstash-format JSON is considered invalid by something that describes itself as Logstash compatible.
I think this needs to be reopened.
JSON with keys of the same name is not invalid, but it makes sense that it's a restriction for Elasticsearch. Your workarounds:
- Merge_Log_Key in your Kubernetes filter, so your unpacked data will be under that new key, avoiding duplicates if any: https://docs.fluentbit.io/manual/filter/kubernetes
- Time_Key in your Elasticsearch output: https://docs.fluentbit.io/manual/output/elasticsearch (see the sketch after this list)
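A minimal sketch of how the two options above could be applied to the configuration from the original post. This assumes a fluent-bit version where the filter options are named Merge_Log/Merge_Log_Key as in the linked docs; the key names "app" and "flb_time" are placeholders, not from the thread:

[FILTER]
    Name              kubernetes
    Match             kube.*
    # unpack the JSON body of the log message
    Merge_Log         On
    # put the unpacked keys under a sub-key instead of the record root,
    # so an application-provided @timestamp cannot collide
    Merge_Log_Key     app

[OUTPUT]
    Name              es
    Match             *
    Host              HOST
    Port              9200
    Logstash_Format   On
    # write fluent-bit's own timestamp under a key other than @timestamp
    Time_Key          flb_time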
Now, if I implement a kind of "sanitizer" option, take into account that it will affect performance. The options above should work; if they don't, please let me know.
Late to the party, but here we are.
For those using the Helm chart, the below works with chart version 1.0.0:
As you can see, @timestamp is still present, but that's a different issue.
Hi Minhnhat, by default the whole log message will also be added to the index as a "Log" field, so it's not a problem. You can disable it via:
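The config snippet that originally followed this comment is not preserved above. Assuming it referred to the Kubernetes filter's Keep_Log option (which drops the raw log string once its JSON content has been merged), a sketch would be:

[FILTER]
    Name       kubernetes
    Match      kube.*
    Merge_Log  On
    # drop the original "log" field after its JSON has been merged into the record
    Keep_Log   Off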
Take a look at my yml, maybe that will help:
I also tried to describe the full setup process in my blog.
@lxfontes this only solves the timestamp problem for me, but I still have duplicate "time" fields. Why do we actually need this?
I used
@zerkms thanks for the feedback! Nice catch with the systemd section; it works by coincidence because systemd is defaulted to false. It would definitely be better to have it as a param; hope your PR will get merged soon.
Hi @Vfialkin, it's really helpful. Thank you!
OK, so for me the fix was the following Kubernetes chart setting:
As now it will no longer try to merge keys. I guess this is a bug in fluent-bit, as the expected behavior should be that while doing the merge it MUST NOT try to append a field that already exists.
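The chart value itself is not quoted above, and how the Helm chart exposes it varies by chart version; in raw fluent-bit terms, the fix described here amounts to disabling the merge:

[FILTER]
    Name             kubernetes
    Match            kube.*
    # do not unpack the JSON body into the record root, so no application
    # @timestamp can end up next to the one added by the es output
    Merge_JSON_Log   Off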
I have the same issue. Kibana, for example, produces logs containing @timestamp fields. My own applications I was able to fix by renaming the timestamp field.
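For completeness, the rename can also be done on the collector side instead of in each application; a sketch using fluent-bit's modify filter (available in later releases, not in 0.13), with app_timestamp as a placeholder target name:

[FILTER]
    Name     modify
    Match    kube.*
    # move the application-provided @timestamp out of the way before the es
    # output appends its own
    Rename   @timestamp app_timestamp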