fluent-bit: Specifying Splunk metadata in raw event broken in v1.7.5+

Bug Report

Describe the bug

v1.7.5 broke the Splunk output plugin, and it remains broken as of v1.8.2: specifically, it is no longer possible to customize the ‘reserved’ Splunk metadata (e.g. index, source, etc.) by configuring splunk_send_raw on as described here and specifying the Splunk metadata at the top level of the record – the record is treated as if splunk_send_raw were set to off (i.e. the Splunk metadata at the top level of the record is treated as part of the actual log).

To Reproduce

Steps to reproduce the problem:

  • Create a Fluent Bit config that uses the splunk_send_raw on feature to inject Splunk metadata into the record. This can be any config similar to the sample one given here, e.g.:
[INPUT]
  ...

# Nest the entire log that exists at this point under the 'event' key
[FILTER]
  name                nest
  match               *
  operation           nest
  wildcard            *
  nest_under          event

# Add Splunk metadata to the top level of the record
[FILTER]
  name                modify
  match               *
  add                 index my-index
  add                 source my-source

[OUTPUT]
  name                splunk
  match               *
  host                http-inputs-....com
  port                443
  tls                 on
  tls.verify          on
  splunk_token        ...
  # Enable raw mode
  splunk_send_raw     on
  • Set the Fluent Bit version to < 1.7.5 – see that, in Splunk, the record has been processed as expected – i.e. the log has been extracted from the event key and the Splunk metadata has been set to the values specified in our Splunk metadata keys.
  • Change the Fluent Bit version to >= 1.7.5 – see that the record in Splunk has NOT been processed as expected – the event key and our Splunk metadata keys are treated as part of the log itself (i.e. as if splunk_send_raw were set to off).

Expected behavior

See above.

Screenshots

(Unnecessary info redacted)

The screenshots below were produced by the exact same Fluent Bit configuration – the only change (which broke the log processing in Splunk) was the Fluent Bit version. Note that splunk_send_raw is set to on.

  • 1.7.4: Splunk successfully extracts the log from the event key; the Splunk metadata is set to the values we specified in the Splunk metadata keys.

  • 1.7.5+: The log is not extracted from the event key – Splunk thinks the event key and top-level Splunk metadata keys are all part of the log; the Splunk metadata is not set to the values we specified in the Splunk metadata keys. This is what we would expect to see if splunk_send_raw were set to off, even though in this case it is still set to on.

Your Environment

  • Fluent Bit Container Version: 1.7.5-debug
  • Configuration: (See above).
  • Filters and plugins: Splunk output plugin

Additional context

  • If we want to dynamically specify Splunk index, source, sourcetype, etc. for each log using the ‘raw data’ approach as recommended in Fluent Bit docs here, our config only works for older versions of Fluent Bit (< 1.7.5) – as soon as we upgrade to 1.7.5+, it breaks.
  • See here for the breaking commit.
  • See here for what I believe to be the cause of the issue, as well as some suggested solutions.

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 5
  • Comments: 32 (8 by maintainers)

Most upvoted comments

Just an additional follow-up for folks struggling with this. I have my Splunk HEC behind a Traefik proxy, which returns 404 if all HEC instances are down; this in turn causes Fluent Bit to drop messages (since it deviates from the return-code behavior of the HEC itself, where a 4XX is an unrecoverable error). I decided to try using the Fluent Bit HTTP output instead of the Splunk output (using the same event-crafting filters I was using with Splunk raw mode), and that works fine for me, even with recent versions of Fluent Bit. Here’s my output config if anyone’s interested:

[OUTPUT]
    name          http
    log_level     warning
    match         *
    tls           on
    host          <HOST>
    port          443
    uri           /services/collector/event
    header        Authorization Splunk <TOKEN>
    format        json_stream
    json_date_key time
    retry_limit   false
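
For reference, the ‘event-crafting filters’ mentioned above are not shown in that comment. A minimal sketch of what they presumably look like, assuming the same nest/modify approach used in the original report (the index and source values are placeholders):

# Nest the entire log under the 'event' key, as expected by /services/collector/event
[FILTER]
    name          nest
    match         *
    operation     nest
    wildcard      *
    nest_under    event

# Add Splunk metadata (placeholder values) to the top level of the record
[FILTER]
    name          modify
    match         *
    add           index my-index
    add           source my-source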

Contents

  • Problem
  • Solution
  • Better Solution
  • Best Solution

Problem

To recap, prior to v1.7.5, Fluent Bit allowed you to enable ‘raw mode’ and specify Splunk metadata (e.g. index, source) in the record sent to Splunk. For v1.7.5+, that functionality no longer works (Splunk no longer recognizes the Splunk metadata – it thinks it’s just part of the log). The easiest way to explain what Fluent Bit was doing earlier, and why it no longer works now, is via the equivalent curl commands:

  1. What Fluent Bit was doing prior to v1.7.5:
# Correct way to specify Splunk metadata for EVENT data
curl \
    -k https://http-inputs-....com/services/collector/event \
    -H "Authorization: Splunk ${SPLUNK_HEC_TOKEN}" \
    -d '{"index":"my-index", "source":"my-source", "event":"My log"}'

i.e. As Niksko pointed out above, even though we ostensibly enabled raw mode by setting splunk_send_raw to on, internally, Fluent Bit was still sending the record to the event endpoint. This worked anyway because the Splunk metadata was also specified the way it’s meant to be for event data, so the event endpoint was able to correctly process the record – i.e. it understood that the actual log was confined to the event key and correctly interpreted the index and source keys as Splunk metadata.

  2. What Fluent Bit is doing now, for v1.7.5+:
# WRONG way to specify Splunk metadata for raw data!
curl \
    -k https://http-inputs-....com/services/collector/raw \
    -H "X-Splunk-Request-Channel: ${RANDOM_GUID}" \
    -H "Authorization: Splunk ${SPLUNK_HEC_TOKEN}" \
    -d '{"index":"my-index", "source":"my-source", "event":"My log"}'

I accidentally did this myself and immediately noticed that the resulting log looked identical to what Fluent Bit v1.7.5+ produces. Thus, after skimming the Fluent Bit source code, I believe this is essentially what it is doing internally: the fix that was introduced in v1.7.5 was to ensure that, when raw mode is enabled (splunk_send_raw set to on), the record is sent to the actual raw endpoint. Unfortunately, as you can see, this is only a partial fix because the Splunk metadata is still specified the same way – i.e. the way it’s meant to be specified for event data! As a result, this does not work, because the raw endpoint is not able to correctly process the record – i.e. it thinks the entire record (not just the event key) is the actual log, and does not interpret the index and source keys as Splunk metadata.

Solution

What Fluent Bit SHOULD have been doing, for v1.7.5+:

# Correct way to specify Splunk metadata for raw data
curl \
    -k "https://http-inputs-....com/services/collector/raw?index=my-index&source=my-source" \
    -H "X-Splunk-Request-Channel: ${RANDOM_GUID}" \
    -H "Authorization: Splunk ${SPLUNK_HEC_TOKEN}" \
    -d 'My log'

As you can see, we are now sending the record to the raw endpoint AND we are specifying the Splunk metadata the way it’s meant to be specified for raw data – i.e. as query string parameters in the raw endpoint URL. Also, there is no longer a JSON wrapper around the actual log – it is sent ‘raw’. If implemented in Fluent Bit, this would work because the raw endpoint would now be able to correctly process the record – i.e. it would correctly interpret the entire record (in this case, My log) as the actual log, and it would interpret the query string parameters as Splunk metadata.

Better Solution

Part of the confusion, at least for me, was that Fluent Bit’s documentation for the Splunk plugin wrongly conflates ‘specifying Splunk metadata’ with ‘sending logs as raw data’:

If you would like to customize any of the Splunk event metadata, such as the host or target index, you can set Splunk_Send_Raw On in the plugin configuration, and add the metadata as keys/values in the record.

If you are not that familiar with Splunk (like me), you might assume that this is a restriction on the Splunk side – i.e. even if you are not using Fluent Bit, if you want to specify Splunk metadata for your log, Splunk requires you to send the log as raw data. However, if you actually read Splunk’s documentation (link1, link2), you’ll see that this is not true – as my curl examples above show, you can specify Splunk metadata when sending logs as raw data AND when sending logs as (JSON) event data. Thus, the restriction is on the Fluent Bit side, not the Splunk side.

So, when I realized that the first (raw) approach was broken in v1.7.5+, I immediately assumed I had no other option to specify Splunk metadata. And, prior to v1.8.0, that was true because of Fluent Bit’s restriction. In v1.8.0 though, several new parameters for the Splunk plugin were introduced, such as event_index and event_source – although it wasn’t clear to me at the time, these new parameters give us an alternative (event) approach, which does NOT require us to enable raw mode (and, thus, bypasses this bug).

i.e. Approach 1 (raw mode; currently broken!):

[INPUT]
  ...

# Nest the entire log that exists at this point under the 'event' key
[FILTER]
  name                nest
  match               *
  operation           nest
  wildcard            *
  nest_under          event

# Add Splunk metadata to the top level of the record
[FILTER]
  name                modify
  match               *
  add                 index my-index
  add                 source my-source

[OUTPUT]
  name                splunk
  match               *
  host                http-inputs-....com
  port                443
  tls                 on
  tls.verify          on
  splunk_token        ...
  # Enable raw mode
  splunk_send_raw     on

is EQUIVALENT to

Approach 2 (event mode):

[INPUT]
  ...

# Do not need to manually nest the log under the 'event' key

# Do not need to manually inject the Splunk metadata into the top level of the record

[OUTPUT]
  name                splunk
  match               *
  host                http-inputs-....com
  port                443
  tls                 on
  tls.verify          on
  splunk_token        ...
  # DISABLE raw mode
  splunk_send_raw     off
  # Use new params
  event_index         my-index
  event_source        my-source

Thus, this ‘event’ approach provides a viable alternative to the currently broken ‘raw’ approach – in fact, given that Fluent Bit always sends logs to Splunk in a well-formatted JSON wrapper, it seems more logical (and, I believe, results in faster processing on the Splunk side) to send them to the (JSON) event endpoint rather than the raw endpoint.

Best Solution

As matthewjstanford pointed out though, this new ‘event’ approach does have one downside, which is that we lose the flexibility that we had when setting the Splunk metadata values: In the original ‘raw’ approach, we had to manually inject the Splunk metadata values into the record, which meant that we had full control over what Splunk metadata values ultimately got populated – in this new ‘event’ approach however, we are limited to whatever the new event_... parameters happen to support.

E.g., there are two new parameters to help us specify sourcetype:

  • event_sourcetype: Only accepts a hardcoded string value
  • event_sourcetype_key: Accepts a log key (via record accessor syntax) – if that key is present in the log, its corresponding value takes precedence over event_sourcetype.

Thus, using the ‘event’ approach, you can configure Fluent Bit to do the following (sketched in the config after this list):

  1. Look for sourcetype in an optional log key
  2. If that optional log key is not present, fall back to a hardcoded default value.
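
A minimal sketch of that fallback (only the relevant output parameters are shown; my-default-sourcetype is a placeholder and $sourcetype is a hypothetical top-level record key referenced via record accessor syntax):

[OUTPUT]
  name                    splunk
  match                   *
  ...
  splunk_send_raw         off
  # Hardcoded default, used when the key below is absent from the log
  event_sourcetype        my-default-sourcetype
  # Optional log key; if present, its value takes precedence over event_sourcetype
  event_sourcetype_key    $sourcetype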

What you CANNOT do with the ‘event’ approach though is configure Fluent Bit to do the following:

  1. Look for sourcetype in an optional log key
  2. If that optional log key is not present, fall back to a different, mandatory log key.

In the ‘raw’ approach though, we could have implemented this functionality via a Lua filter.
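
For illustration, this is roughly how such a fallback could have been wired in with Fluent Bit’s Lua filter (sketch only: set_sourcetype.lua and its set_sourcetype function are hypothetical, and the function body is described rather than shown):

# Hypothetical Lua filter: set_sourcetype copies the optional log key into the
# top-level 'sourcetype' metadata key, falling back to a mandatory log key when
# the optional one is absent.
[FILTER]
  name      lua
  match     *
  script    set_sourcetype.lua
  call      set_sourcetype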

Thus, the ideal solution would be a mix of both approaches: The ability to send the log to the event endpoint, and the ability to fully control what Splunk metadata values get injected into the record – i.e. what we originally had prior to v1.7.5, even if we didn’t know it at the time.

This issue was closed because it has been stalled for 5 days with no activity.

Thanks for the update @brian-maloney, I appreciate everyone’s input on this. If folks are interested, perhaps we could set up some time to quickly run through the options and move forward with a decision. @pranavmarla @Niksko

After seeing @brian-maloney using the HTTP output plugin, one option that I feel makes more sense is to route users who want to use Splunk_Send_Raw to the HTTP output plugin, with guidance on endpoint URIs such as /raw or /event. The only difference I am seeing between the Splunk plugin and the HTTP output plugin setup for a user is that the URI, Header, and Format fields are required.
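
For illustration, combining that suggestion with the query-string approach from the ‘Solution’ section above might look roughly like the sketch below (host, token, and metadata values are placeholders; whether Splunk indexes the JSON-formatted body the way you want on the raw endpoint will depend on your sourcetype configuration):

[OUTPUT]
    name          http
    match         *
    tls           on
    host          <HOST>
    port          443
    # Splunk metadata supplied as query-string parameters on the raw endpoint
    uri           /services/collector/raw?index=my-index&source=my-source
    header        Authorization Splunk <TOKEN>
    # Channel identifier, as in the raw-endpoint curl examples above
    header        X-Splunk-Request-Channel <RANDOM_GUID>
    format        json_lines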

Thanks @brian-maloney! I believe that works because the Splunk output plugin is internally sending logs to the /raw endpoint (even though the log metadata is specified in a way that can only be interpreted correctly by the /event endpoint), whereas you are manually sending the logs to the /event endpoint, where they get interpreted as expected.

Thanks for the write-up @pranavmarla, we need more time to review this.

Looking at the source again, unless I’m mistaken, prior to 1.7.5, all events were sent to the /services/collector/event endpoint, with the difference between raw and non-raw events being how they were packed. Now, raw events go to /services/collector/raw. On paper this seems like it should work, based on reading the Splunk API docs. However it also worked previously, so there’s clearly some information that’s missing here.

My C++ skills aren’t great, but I notice that in a few places, there is the following comparison:

if (ctx->splunk_send_raw == FLB_TRUE)

However on this line, the comparison instead is

if (ctx->splunk_send_raw)

Is it possible that this always evaluates to false, so data is always sent to the events endpoint?

@pranavmarla yeah, there is a URI endpoint change in v1.8, plus other new options for setting metadata.

I encourage you to build an image from v1.8 (git master); that might clarify all doubts (my apologies, but we are pretty busy preparing the next big release).

I just hit the exact same issue as you. Thanks for your detailed report; I hadn’t been able to figure out that the splunk_send_raw option was the problem. I just saw [upstream] connection #49 failed to <host>:<port> in the Fluent Bit log.