fluent-bit: Bad escaping after fluent-bit processing

Hello,

After discussing this on Slack, I'm posting the issue I'm facing here. I've used fluent-bit several times, but this time I found that logs created by an application containing \n sequences are being changed by fluent-bit, which adds an extra backslash ("\\n"). This leads to a lot of bad escaping in Kibana, and debugging stack traces with it is very difficult. I've attached the two different logs (the raw application logs and the logs parsed by fluent-bit) and the SOS report. If you can take a look and give some advice, it would be much appreciated. Thanks,

From the application: {"@timestamp":"2017-12-11T17:22:07.421+00:00","@version":1,"message":"\nFailed events housekeeping completed in 4ms\n","logger_name":"LoadingProcessManager.[LoadingProcessManager]","thread_name":"[FR(234141)]@Wrk.80","level":"INFO","level_value":20000,"user":"adm","code":"G1000","instance":"product","layer":"converter"}
From Kibana: 
{
  "_index": "dev-logs-2017.12.11",
  "_type": "flb_type",
  "_id": "AWBGmTGR9USO52Xb23Br",
  "_version": 1,
  "_score": null,
  "_source": {
    "_@timestamp": "2017-12-11T17:22:07.424Z",
    "@timestamp": "2017-12-11T17:22:07.421+00:00",
    "@version": 1,
    "message": "\\nFailed events housekeeping completed in 4ms\\n",
    "logger_name": "LoadingProcessManager.[LoadingProcessManager]",
    "thread_name": "[FR(234141)]@Wrk.80",
    "level": "INFO",
    "level_value": 20000,
    "user": "adm",
    "code": "G1000",
    "instance": "product",
    "layer": "converter"
  },
  "fields": {
    "@timestamp": [
      1513012927421
    ]
  },
  "sort": [
    1513012927421
  ]
}
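
To make the difference concrete, here is a quick Python sketch (not part of the original report) checking that the value stored in Kibana is exactly the JSON-escaped form of the application's original message:

import json

# message as the application wrote it, after correct JSON decoding:
# a real one-byte newline (0x0A) at each end.
app_msg = "\nFailed events housekeeping completed in 4ms\n"

# message as it lands in Kibana's _source: the two-character
# sequence '\' + 'n', i.e. the escape itself, escaped once more.
kibana_msg = "\\nFailed events housekeeping completed in 4ms\\n"

# Escaping the original once reproduces the Kibana value, which is
# what a missing decode step followed by JSON re-encoding produces.
assert json.dumps(app_msg) == '"%s"' % kibana_msg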

SOS Report

Customers of Treasure Data. For more details visit:

    https://fluentd.treasuredata.com


[Fluent Bit]
    Edition             Community Edition
    Version             0.12.9
    Built Flags         JSMN_PARENT_LINKS JSMN_STRICT FLB_HAVE_TLS FLB_HAVE_SQLDB FLB_HAVE_BUFFERING FLB_HAVE_TRACE FLB_HAVE_FLUSH_LIBCO FLB_HAVE_SYSTEMD FLB_HAVE_FORK FLB_HAVE_PROXY_GO FLB_HAVE_JEMALLOC JEMALLOC_MANGLE FLB_HAVE_REGEX FLB_HAVE_C_TLS FLB_HAVE_SETJMP FLB_HAVE_ACCEPT4 FLB_HAVE_INOTIFY

[Operating System]
    Name                Linux
    Release             4.4.0-1043-aws
    Version             #52-Ubuntu SMP Tue Dec 5 10:49:06 UTC 2017

[Hardware]
    Architecture        x86_64
    Processors          2

[Built Plugins]
    Inputs              cpu mem kmsg tail proc disk systemd netif dummy head health serial stdin tcp mqtt lib forward random syslog
    Filters             grep stdout kubernetes parser record_modifier
    Outputs             counter es exit file forward http influxdb kafka-rest nats null plot stdout td lib flowcounter

[SERVER] Runtime configuration
    Flush               1
    Daemon              Off
    Log_Level           Debug

[INPUT] Instance
    Name                tail.0 (tail, id=0)
    Flags               DYN_TAG
    Threaded            No
    Tag                 tail.0
    Mem_Buf_Limit       9.5M
    Path                /var/log/file.log
    Parser              json-test
    DB                  /var/log/flb_kube.db
    Buffer_Chunk_Size   8k
    Buffer_Max_Size     32k


[OUTPUT] Instance
    Name                es.0 (es, mask_id=1)
    Match               *
    TLS Active          No
    Retry Limit         no limit
    Host.TCP_Port       80
    Host.Name           elastic.host
    Logstash_Format     On
    Logstash_Prefix     dev-logs
    Index               dev-logs
    Time_Key            _@timestamp

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Reactions: 4
  • Comments: 27 (11 by maintainers)

Most upvoted comments

Thanks, everyone, for reporting the problems associated with escaping.

@hdiass provided me with some information that was helpful in identifying the issue associated with the one posted at the beginning:

Issue

When an application generates a formal JSON log, special ASCII characters like ‘\n’ (note: an ASCII character, one byte) are converted to the JSON string form “\n” (two bytes: \ and n). Our JSON parser, when applied to the records, does not convert the string “\n” back to the ASCII byte ‘\n’; hence, when an output plugin converts the record back to JSON output, it re-escapes the original “\n” to “\\n”.
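
To illustrate the described behavior outside of fluent-bit's C code, here is a minimal Python sketch of the correct versus the missing-decode round trip (the message value is taken from the log above):

import json

# What the application emitted: "\n" here is the two-byte JSON
# escape sequence, not a real newline byte.
app_line = '{"message":"\\nFailed events housekeeping completed in 4ms\\n"}'

# Correct parsing: the decoder turns "\n" back into the one-byte
# ASCII newline, so re-encoding reproduces the original escape.
record = json.loads(app_line)
print(json.dumps(record))
# {"message": "\nFailed events housekeeping completed in 4ms\n"}

# Buggy parsing as described above: the literal backslash + 'n'
# stays in the string value, so the output encoder escapes the
# backslash itself and emits "\\n".
record_raw = {"message": "\\nFailed events housekeeping completed in 4ms\\n"}
print(json.dumps(record_raw))
# {"message": "\\nFailed events housekeeping completed in 4ms\\n"}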

Ideas

I think the solution to the problem is to extend our JSON parser so that we can instruct it to “decode” specific fields of the record. Ideally this must be optional because, as you know, text parsing is expensive, and I would like to avoid performance penalties in scenarios where logs with special characters do not exist.

A parser could look like this:

[PARSER]
    Name             docker_parser
    Format           json
    Time_Key         time
    Time_Format      %Y-%m-%dT%H:%M:%S.%L
    Time_Keep        On
    Decode_JSON_Field   message

As an example, consider the following original record from a file:

{"@timestamp":"2018-01-19T11:38:06.225+00:00",
 "@version":1,
 "message":"\nFailed events housekeeping completed in 1ms\n"
}

In the example above, the message key contains stringified text. The \n is kept as it is, and if the record hits an output plugin that re-formats to JSON (like elasticsearch), that character will be escaped again.

I’ve also found that if a JSON string comes with escaped UTF-8 sequences, these are not decoded and are kept as they come, e.g.:

{"log":"JP characters \u6628\u591c\u306e\u30b3\u30f3\u30b5"}

after processing, on stdout we get:

[1516385197.832135298, {"log"=>"JP characters \u6628\u591c\u306e\u30b3\u30f3\u30b5"}]

we expected the Japanese characters: 昨夜のコンサ

So a “decoding” feature at the Parser level is required, both for special characters like \n and for others encoded as UTF-8 escape sequences.
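
For reference, a standard-compliant JSON decoder already handles the UTF-8 case; a small Python sketch of the behavior expected above:

import json

# Escaped UTF-8 sequences as they arrive in the log line.
line = '{"log":"JP characters \\u6628\\u591c\\u306e\\u30b3\\u30f3\\u30b5"}'

# A compliant JSON decoder turns the \uXXXX escapes into the
# actual characters, which is the output expected above.
print(json.loads(line)["log"])
# JP characters 昨夜のコンサ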

OK, I’m experiencing something similar. I’m still debugging; I’ll let you know if I figure anything out.

My fallback is to ditch this lightweight agent and go back to the full fluentd agent (which doesn’t have this problem).