fluent-bit: TLS error: unexpected EOF

Bug Report

Describe the bug There appears to be an issue when recycling multiple TLS connections that causes communication errors between fluent-bit and fluentd. With only one open upstream connection, or with TLS disabled, everything works fine. According to the captured traffic, fluent-bit occasionally sends an Encrypted Alert (TLS record content type 21, probably a close_notify) during the TLS handshake.
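For anyone reading the capture: every TLS record starts with a 5-byte header whose first byte is the content type, and content type 21 marks an alert record. The Python sketch below decodes such a header. The names `CONTENT_TYPES` and `parse_record_header` are hypothetical helpers for illustration only; they are not part of fluent-bit or fluentd.

```python
import struct

# TLS record content types (RFC 5246 section 6.2.1, RFC 8446 section 5.1).
CONTENT_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
}


def parse_record_header(header: bytes) -> dict:
    """Decode the 5-byte TLS record header: content type, version, length."""
    if len(header) < 5:
        raise ValueError("a TLS record header is 5 bytes long")
    # One byte type, two bytes version, two bytes big-endian payload length.
    content_type, major, minor, length = struct.unpack("!BBBH", header[:5])
    return {
        "content_type": CONTENT_TYPES.get(content_type, f"unknown ({content_type})"),
        "version": (major, minor),
        "length": length,
    }


# Made-up example header: alert record (0x15 == 21), version (3, 3), 26-byte payload.
example = bytes([0x15, 0x03, 0x03, 0x00, 0x1A])
print(parse_record_header(example))
```

Once encryption is active, the alert level and description travel inside the encrypted payload, so a capture alone cannot show which alert was sent. That is why the sniffer only labels it "Encrypted Alert".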

To Reproduce The issue can be reproduced with Docker using the following configuration files and commands:

fluent-bit.conf

[SERVICE]
  Flush        5
  Grace        5
  Daemon       Off
  Log_Level    info
  Coro_Stack_Size    24576
  HTTP_Server  On
  HTTP_Listen  0.0.0.0
  HTTP_Port    9090
  storage.path  /tmp/fluent-bit-data

[INPUT]
  Name   dummy
  Tag    dummy1.log
  Rate   10

[INPUT]
  Name   dummy
  Tag    dummy2.log
  Rate   10

[INPUT]
  Name   dummy
  Tag    dummy3.log
  Rate   10

[INPUT]
  Name   dummy
  Tag    dummy4.log
  Rate   10

[INPUT]
  Name   dummy
  Tag    dummy5.log
  Rate   10

[OUTPUT]
  Name          forward
  Match         *
  Host          fluentd
  Port          24240
  Workers  1
  tls           On
  tls.verify    Off
  tls.ca_file   /fluent-bit/tls/ca.crt
  tls.crt_file  /fluent-bit/tls/tls.crt
  tls.key_file  /fluent-bit/tls/tls.key
  Empty_Shared_Key true
  net.keepalive on
  net.keepalive_max_recycle 2
  net.dns.mode UDP
  net.dns.resolver LEGACY
  Retry_Limit  False
  storage.total_limit_size  2G

fluentd.conf

<system>
  rpc_endpoint 127.0.0.1:24444
  log_level info
  workers 1
  root_dir /tmp/buffers
</system>

<source>
  @type forward
  @id main_forward
  bind 0.0.0.0
  port 24240
  <transport tls>
    ca_path /fluentd/tls/ca.crt
    cert_path /fluentd/tls/tls.crt
    client_cert_auth true
    private_key_path /fluentd/tls/tls.key
    version TLSv1_2
  </transport>
  <security>
    self_hostname fluentd
    shared_key
  </security>
</source>
<filter **>
  @type stdout
</filter>
<match **>
  @type null
</match>

Commands:

docker network create fluent
docker run --rm --name fluentd --net fluent -v ${PWD}/fluentd:/fluentd/tls fluentd:v1.14.0-1.0 -c /fluentd/tls/fluentd.conf
docker run --rm --net fluent -v ${PWD}/fluent-bit:/fluent-bit/tls fluent/fluent-bit:1.9.7-debug /fluent-bit/bin/fluent-bit -c /fluent-bit/tls/fluent-bit.conf

Error message on fluent-bit side

[2022/10/09 15:49:35] [error] [tls] error: unexpected EOF
[2022/10/09 15:49:35] [error] [output:forward:forward.0] no upstream connections available
[2022/10/09 15:49:35] [ warn] [engine] failed to flush chunk '1-1665330570.669279879.flb', retry in 9 seconds: task_id=4, input=dummy.4 > output=forward.0 (out_id=0)

Error message on fluentd side

2022-10-09 15:49:35 +0000 [warn]: #0 [main_forward] unexpected error before accepting TLS connection by OpenSSL addr="?" host="name resolution failed" port="?" error_class=OpenSSL::SSL::SSLError error="SSL_accept returned=1 errno=104 state=error: invalid alert"

Expected behavior Multiple TLS connections should be recycled correctly without any errors.

Screenshots Capture of the communication (fluent-bit: 172.19.0.3, fluentd: 172.19.0.2): [image: Screenshot 2022-10-09 at 18 48 24]

Your Environment

  • Fluent-bit version: 1.9.7
  • Fluent-bit OpenSSL version: 1.1.1n
  • Fluentd version: 1.14.0
  • Fluentd OpenSSL version: 1.1.1q

Additional context We would like to scale the aggregator tier (fluentd) dynamically in a Kubernetes environment while using TLS. Typically, fluent-bit collects logs from multiple containers and sends them to the aggregator over multiple upstream connections. When the aggregator scales out, we want fluent-bit to pick up the addresses of the new fluentd pods. To achieve this we tried net.keepalive_max_recycle, but hit the issue above.
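As a rough illustration of what net.keepalive_max_recycle is meant to do: it caps how many times a keepalive connection is reused before it is retired, so the next flush has to open a fresh connection. The Python sketch below models only that idea. The `Connection` and `RecyclingPool` classes and the `connect` callback are hypothetical, and this is not fluent-bit's actual implementation.

```python
from typing import Callable, List


class Connection:
    """Hypothetical stand-in for one upstream (TLS) connection."""

    def __init__(self, conn_id: int) -> None:
        self.conn_id = conn_id
        self.uses = 0
        self.closed = False

    def close(self) -> None:
        self.closed = True


class RecyclingPool:
    """Reuse a connection at most `max_recycle` times, then retire it."""

    def __init__(self, connect: Callable[[], Connection], max_recycle: int) -> None:
        self._connect = connect
        self._max_recycle = max_recycle
        self._idle: List[Connection] = []

    def acquire(self) -> Connection:
        # Prefer an idle keepalive connection; otherwise open a new one.
        conn = self._idle.pop() if self._idle else self._connect()
        conn.uses += 1
        return conn

    def release(self, conn: Connection) -> None:
        # Retire the connection once it has reached the recycle limit.
        if conn.uses >= self._max_recycle:
            conn.close()
        else:
            self._idle.append(conn)


# Five sequential flushes with a limit of 2 uses per connection.
counter = iter(range(1000))
pool = RecyclingPool(lambda: Connection(next(counter)), max_recycle=2)
ids = []
for _ in range(5):
    conn = pool.acquire()
    ids.append(conn.conn_id)
    pool.release(conn)
print(ids)  # [0, 0, 1, 1, 2]
```

Each new connection is a point at which the client could resolve the aggregator address again, which is the behaviour we were hoping to get from this option.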

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 20

Most upvoted comments

I'm seeing this error as well. I was using this docker-compose example, https://github.com/opensearch-project/data-prepper/blob/main/examples/log-ingestion/fluent-bit.conf, but needed to set ‘tls on’ for OpenSearch to accept the fluent-bit connection. Now I see the error this issue describes.

In addition, OpenSearch logs this exception later (not sure if the two are related):

opensearch | [2023-05-13T00:42:04,313][ERROR][o.o.s.s.h.n.SecuritySSLNettyHttpServerTransport] [23cfaf6da342] Exception during establishing a SSL connection: javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)

Ran into this earlier running Fluent Bit (statically linked) on an Alpine docker image. Turns out I needed to install the ca-certificates package (apk add ca-certificates). Probably similar for other distros if this (or a similar) package is not installed on the system.
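A quick way to see whether a system CA bundle is present at all is to ask an OpenSSL build where it looks for one. The Python sketch below does that using the standard `ssl` module. Note the assumption: it reports the paths compiled into Python's OpenSSL, which may differ from those of a statically linked Fluent Bit binary, so treat it as a hint only. The function name `ca_bundle_status` is made up for this example.

```python
import os
import ssl


def ca_bundle_status() -> dict:
    """Report where the local OpenSSL build looks for CA certificates."""
    paths = ssl.get_default_verify_paths()
    cafile = paths.openssl_cafile  # compiled-in path to a bundle file
    capath = paths.openssl_capath  # compiled-in path to a certificate directory
    return {
        "cafile": cafile,
        "cafile_exists": os.path.isfile(cafile),
        "capath": capath,
        "capath_has_entries": os.path.isdir(capath) and bool(os.listdir(capath)),
    }


for key, value in ca_bundle_status().items():
    print(f"{key}: {value}")
```

If neither location holds any certificates, installing the distro's CA package (ca-certificates on Alpine, as above) is the likely fix.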

For anybody in this thread, just a warning that setting tls.verify Off should not be considered a solution. It is not really any more secure than not using TLS at all.
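To make that warning concrete, here is a small Python sketch using the standard `ssl` module (not Fluent Bit itself). With verification disabled the client accepts any certificate, so the traffic is encrypted but the peer is never authenticated. Treating `verify=False` as a rough analogue of turning verification off in Fluent Bit is my assumption, and `make_context` is a hypothetical helper.

```python
import ssl


def make_context(verify: bool) -> ssl.SSLContext:
    """Build a client-side TLS context with or without peer verification."""
    # The default context verifies the certificate chain and the hostname.
    ctx = ssl.create_default_context()
    if not verify:
        # Assumed analogue of disabling verification: any certificate is accepted.
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


verified = make_context(verify=True)
unverified = make_context(verify=False)
print(verified.verify_mode.name, verified.check_hostname)
print(unverified.verify_mode.name, unverified.check_hostname)
```

With the unverified context, anyone able to intercept the connection can present their own certificate and read or alter the logs in transit.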

Is it or isn’t it gone @LeoWinterDE? I worked in that layer recently so I could take a look if it’s still a problem.

I can reproduce this bug when sending output from Fluent Bit to Fluentd via the forward plugin.