fluent-bit: Fluent Bit stuck when lost connection with the internet
Bug Report
Describe the bug
I want to test that whether Fluent Bit will buffer logs in local file system in case that Fluent Bit fails to flush data to output destination. This could be caused by unstable internet connection, power down and so on… The first case I tested is no internet connection. Therefore I just created an offline environment by turning off Wifi on my local machine.
However, Fluent Bit struct when lost connection with the internet, and still not respond (Or after a long time… it will respond) after turning internet on.
To Reproduce I have a docker containing running on my local computer and generating logs to a specific file. The code for the log-generator is referenced from here.
Steps:
- Run the docker container
- Start Fluent Bit
- Once Fluent Bit starts to work, turn Wifi off for ~30s
- After 30s, turn Wifi on and Fluent Bit will stuck
And the logs of Fluent Bit:
Fluent Bit v1.5.0
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
[2020/07/17 14:29:21] [ info] Configuration:
[2020/07/17 14:29:21] [ info] flush time | 1.000000 seconds
[2020/07/17 14:29:21] [ info] grace | 5 seconds
[2020/07/17 14:29:21] [ info] daemon | 0
[2020/07/17 14:29:21] [ info] ___________
[2020/07/17 14:29:21] [ info] inputs:
[2020/07/17 14:29:21] [ info] tail
[2020/07/17 14:29:21] [ info] ___________
[2020/07/17 14:29:21] [ info] filters:
[2020/07/17 14:29:21] [ info] record_modifier.0
[2020/07/17 14:29:21] [ info] ___________
[2020/07/17 14:29:21] [ info] outputs:
[2020/07/17 14:29:21] [ info] stackdriver.0
[2020/07/17 14:29:21] [ info] ___________
[2020/07/17 14:29:21] [ info] collectors:
[2020/07/17 14:29:21] [ info] [engine] started (pid=31137)
[2020/07/17 14:29:21] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2020/07/17 14:29:21] [debug] [storage] [cio scan] opening path /var/log/fluent-bit-buffers/
[2020/07/17 14:29:21] [debug] [storage] [cio stream] new stream registered: tail.0
[2020/07/17 14:29:21] [debug] [storage] [cio scan] opening stream tail.0
[2020/07/17 14:29:21] [debug] [storage] tail.0:30392-1595010555.961076422.flb mapped OK
[2020/07/17 14:29:21] [debug] [storage] [cio stream] new stream registered: tail.0
[2020/07/17 14:29:21] [debug] [storage] [cio stream] new stream registered: storage_backlog.1
[2020/07/17 14:29:21] [ info] [storage] version=1.0.4, initializing...
[2020/07/17 14:29:21] [ info] [storage] root path '/var/log/fluent-bit-buffers/'
[2020/07/17 14:29:21] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2020/07/17 14:29:21] [ info] [storage] backlog input plugin: storage_backlog.1
[2020/07/17 14:29:21] [debug] [input:tail:tail.0] inotify watch fd=20
[2020/07/17 14:29:21] [debug] [input:tail:tail.0] scanning path /var/lib/docker/containers/edbace40bae66c75883ea9a41bb1cb1f561a75262abc19d7f28c7c8034e4f7cf/edbace40bae66c75883ea9a41bb1cb1f561a75262abc19d7f28c7c8034e4f7cf-json.log
[2020/07/17 14:29:21] [debug] [input:tail:tail.0] inode=3192531 appended as /var/lib/docker/containers/edbace40bae66c75883ea9a41bb1cb1f561a75262abc19d7f28c7c8034e4f7cf/edbace40bae66c75883ea9a41bb1cb1f561a75262abc19d7f28c7c8034e4f7cf-json.log
[2020/07/17 14:29:21] [debug] [input:tail:tail.0] scan_glob add(): /var/lib/docker/containers/edbace40bae66c75883ea9a41bb1cb1f561a75262abc19d7f28c7c8034e4f7cf/edbace40bae66c75883ea9a41bb1cb1f561a75262abc19d7f28c7c8034e4f7cf-json.log, inode 3192531
[2020/07/17 14:29:21] [ info] [input:storage_backlog:storage_backlog.1] queue memory limit: 9.5M
[2020/07/17 14:29:21] [ info] [input:storage_backlog:storage_backlog.1] register tail.0/30392-1595010555.961076422.flb
[2020/07/17 14:29:21] [debug] [output:stackdriver:stackdriver.0] JWT signature:
eyJhbGciOiAiUlMyNTYiLCAidHlwIjogIkpXVCJ9.eyJpc3MiOiAiamVmZmx1b28tZ2tlLWRldi1sb2dnaW5nQGplZmZsdW9vLWdrZS1kZXYuaWFtLmdzZXJ2aWNlYWNjb3VudC5jb20iLCAic2NvcGUiOiAiaHR0cHM6Ly93d3cuZ29vZ2xlYXBpcy5jb20vYXV0aC9sb2dnaW5nLndyaXRlIiwgImF1ZCI6ICJodHRwczovL3d3dy5nb29nbGVhcGlzLmNvbS9vYXV0aDIvdjQvdG9rZW4iLCAiZXhwIjogMTU5NTAxMzU2MSwgImlhdCI6IDE1OTUwMTA1NjF9.LsoMQodIitU4_yHvF59_nqZ-8AVyUbyq6XU2Z8db245UVb8OMuCrqAn8kajiMWtYgrVlFBqrm8FAl-MFRtyL4A3BSxI9n77vaeyqRHdc6MNK2JbzTmXHv_pF3Mo8IUXMOK_o6yTJgoJaIom6XHAKuWzdP1Ke5sii7IsEShI2nSD8PIE4HGXJOXwDyOFwaTgGEh877QsfEhhPhoSDy45aE0P_Di-EuFRoaBd98mpwCWL55MoEh5gD3MickW5DlbZEW9p36qfjLszHd4Vg-IRmNzAyv1nuXmNlUrdtlHcQFy4Q1txW0wddRCzEw_Rgn5ArDNReK3-57EZfWWuUKcHAeA
[2020/07/17 14:29:21] [ info] [oauth2] HTTP Status=200
[2020/07/17 14:29:21] [debug] [oauth2] payload:
{"access_token":"ya29.c.Ko8B1AeKv0M_TT3RoumJIRw_KYseXvznb97eGhsP9nkFPevcYVjSRgUkDR1Dt8F6VjtXhCuIc4p1IxE-zemeHlfaUX_SE9_ixP95fZrYs8RPowELCPYhEDXPMnaAK7a9gNzAPiFGOC1PNXwFmD-I6iLF0WwJbgv62ewzcMC4wjRpkGODpUl1JMsCggt46QUGc38","expires_in":3599,"token_type":"Bearer"}
[2020/07/17 14:29:21] [ info] [oauth2] access token from 'www.googleapis.com:443' retrieved
[2020/07/17 14:29:21] [debug] [router] match rule tail.0:stackdriver.0
[2020/07/17 14:29:21] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2020/07/17 14:29:21] [ info] [sp] stream processor started
[2020/07/17 14:29:21] [debug] [storage] tail.0:31137-1595010561.575441987.flb mapped OK
[2020/07/17 14:29:21] [debug] [storage] [cio file] alloc_size from 4096 to 233472
[2020/07/17 14:29:21] [debug] [storage] [cio file] alloc_size from 233472 to 331776
[2020/07/17 14:29:21] [debug] [input:tail:tail.0] inode=3192531 file=/var/lib/docker/containers/edbace40bae66c75883ea9a41bb1cb1f561a75262abc19d7f28c7c8034e4f7cf/edbace40bae66c75883ea9a41bb1cb1f561a75262abc19d7f28c7c8034e4f7cf-json.log promote to TAIL_EVENT
[2020/07/17 14:29:21] [ info] inotify_fs_add(): inode=3192531 watch_fd=1 name=/var/lib/docker/containers/edbace40bae66c75883ea9a41bb1cb1f561a75262abc19d7f28c7c8034e4f7cf/edbace40bae66c75883ea9a41bb1cb1f561a75262abc19d7f28c7c8034e4f7cf-json.log
[2020/07/17 14:29:22] [debug] [task] created task=0x5cafad1150e0 id=0 OK
[2020/07/17 14:29:22] [debug] [output:stackdriver:stackdriver.0] HTTP Status=200
[2020/07/17 14:29:22] [debug] [upstream] KA connection #41 to logging.googleapis.com:443 is now available
[2020/07/17 14:29:22] [debug] [storage] tail.0:30392-1595010555.961076422.flb mapped OK
[2020/07/17 14:29:22] [ info] [input:storage_backlog:storage_backlog.1] queueing tail.0:30392-1595010555.961076422.flb
[2020/07/17 14:29:22] [debug] [task] destroy task=0x5cafad1150e0 (task_id=0)
[2020/07/17 14:29:22] [debug] [storage] [cio file] synced at: tail.0/31137-1595010561.575441987.flb
[2020/07/17 14:29:23] [debug] [task] created task=0x5cafad17fa40 id=0 OK
[2020/07/17 14:29:23] [debug] [upstream] KA connection #41 to logging.googleapis.com:443 has been assigned (recycled)
[2020/07/17 14:29:23] [debug] [output:stackdriver:stackdriver.0] HTTP Status=200
[2020/07/17 14:29:23] [debug] [upstream] KA connection #41 to logging.googleapis.com:443 is now available
[2020/07/17 14:29:23] [ info] [engine] flush backlog chunk '30392-1595010555.961076422.flb' succeeded: task_id=0, input=storage_backlog.1 > output=stackdriver.0
[2020/07/17 14:29:23] [debug] [task] destroy task=0x5cafad17fa40 (task_id=0)
[2020/07/17 14:29:23] [debug] [input:tail:tail.0] inode=3192531 events: IN_MODIFY
[2020/07/17 14:29:23] [debug] [storage] tail.0:31137-1595010563.326386283.flb mapped OK
[2020/07/17 14:29:24] [debug] [task] created task=0x5cafad180a60 id=0 OK
[2020/07/17 14:29:24] [debug] [upstream] KA connection #41 to logging.googleapis.com:443 has been assigned (recycled)
[2020/07/17 14:29:24] [debug] [output:stackdriver:stackdriver.0] HTTP Status=200
[2020/07/17 14:29:24] [debug] [upstream] KA connection #41 to logging.googleapis.com:443 is now available
[2020/07/17 14:29:24] [debug] [task] destroy task=0x5cafad180a60 (task_id=0)
[2020/07/17 14:29:24] [debug] [storage] [cio file] synced at: tail.0/31137-1595010563.326386283.flb
[2020/07/17 14:29:25] [error] [src/flb_scheduler.c:48 errno=0] Success
[2020/07/17 14:29:25] [debug] [input:tail:tail.0] inode=3192531 events: IN_MODIFY
[2020/07/17 14:29:25] [debug] [storage] tail.0:31137-1595010565.652132014.flb mapped OK
[2020/07/17 14:29:26] [debug] [task] created task=0x5cafad17fc60 id=0 OK
[2020/07/17 14:29:26] [debug] [upstream] KA connection #41 to logging.googleapis.com:443 has been assigned (recycled)
[2020/07/17 14:29:26] [debug] [output:stackdriver:stackdriver.0] HTTP Status=200
[2020/07/17 14:29:26] [debug] [upstream] KA connection #41 to logging.googleapis.com:443 is now available
[2020/07/17 14:29:26] [debug] [task] destroy task=0x5cafad17fc60 (task_id=0)
[2020/07/17 14:29:26] [debug] [storage] [cio file] synced at: tail.0/31137-1595010565.652132014.flb
[2020/07/17 14:29:28] [debug] [input:tail:tail.0] inode=3192531 events: IN_MODIFY
[2020/07/17 14:29:28] [debug] [storage] tail.0:31137-1595010568.43277302.flb mapped OK
[2020/07/17 14:29:28] [debug] [task] created task=0x5cafad17fc60 id=0 OK
[2020/07/17 14:29:28] [debug] [upstream] KA connection #41 to logging.googleapis.com:443 has been assigned (recycled)
[2020/07/17 14:29:28] [debug] [output:stackdriver:stackdriver.0] HTTP Status=200
[2020/07/17 14:29:28] [debug] [upstream] KA connection #41 to logging.googleapis.com:443 is now available
[2020/07/17 14:29:28] [debug] [input:tail:tail.0] inode=3192531 events: IN_MODIFY
[2020/07/17 14:29:28] [debug] [storage] tail.0:31137-1595010568.577112829.flb mapped OK
[2020/07/17 14:29:28] [debug] [task] destroy task=0x5cafad17fc60 (task_id=0)
[2020/07/17 14:29:28] [debug] [storage] [cio file] synced at: tail.0/31137-1595010568.43277302.flb
[2020/07/17 14:29:29] [debug] [task] created task=0x5cafad17fc60 id=0 OK
[2020/07/17 14:29:29] [debug] [upstream] KA connection #41 to logging.googleapis.com:443 has been assigned (recycled)**
[2020/07/17 14:34:15] [error] [tls] SSL error: NET - Connection was reset by peer
[2020/07/17 14:34:15] [error] [http_client] broken connection to logging.googleapis.com:443 ?
[2020/07/17 14:34:15] [ warn] [output:stackdriver:stackdriver.0] http_do=-1
[2020/07/17 14:34:15] [debug] [upstream] KA connection #41 to logging.googleapis.com:443 is now available
[2020/07/17 14:34:15] [debug] [input:tail:tail.0] scanning path /var/lib/docker/containers/edbace40bae66c75883ea9a41bb1cb1f561a75262abc19d7f28c7c8034e4f7cf/edbace40bae66c75883ea9a41bb1cb1f561a75262abc19d7f28c7c8034e4f7cf-json.log
[2020/07/17 14:34:15] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/lib/docker/containers/edbace40bae66c75883ea9a41bb1cb1f561a75262abc19d7f28c7c8034e4f7cf/edbace40bae66c75883ea9a41bb1cb1f561a75262abc19d7f28c7c8034e4f7cf-json.log, inode 3192531
[2020/07/17 14:34:15] [debug] [input:tail:tail.0] inode=3192531 events: IN_MODIFY
[2020/07/17 14:34:15] [debug] [storage] tail.0:31137-1595010855.384208524.flb mapped OK
[2020/07/17 14:34:15] [debug] [storage] [cio file] alloc_size from 4096 to 36864
[2020/07/17 14:34:15] [debug] [upstream] KA connection #41 to logging.googleapis.com:443 has been disconnected by the remote service
[2020/07/17 14:34:15] [debug] [retry] new retry created for task_id=0 attemps=1
[2020/07/17 14:34:15] [ warn] [engine] failed to flush chunk '31137-1595010568.577112829.flb', retry in 11 seconds: task_id=0, input=tail.0 > output=stackdriver.0
[2020/07/17 14:34:16] [debug] [task] created task=0x5cafad17f800 id=1 OK
[2020/07/17 14:34:16] [debug] [output:stackdriver:stackdriver.0] HTTP Status=200
[2020/07/17 14:34:16] [debug] [upstream] KA connection #41 to logging.googleapis.com:443 is now available
[2020/07/17 14:34:16] [debug] [input:tail:tail.0] inode=3192531 events: IN_MODIFY
[2020/07/17 14:34:16] [debug] [storage] tail.0:31137-1595010856.429208524.flb mapped OK
[2020/07/17 14:34:16] [debug] [task] destroy task=0x5cafad17f800 (task_id=1)
[2020/07/17 14:34:16] [debug] [storage] [cio file] synced at: tail.0/31137-1595010855.384208524.flb
[2020/07/17 14:34:17] [debug] [task] created task=0x5cafad17f800 id=1 OK
[2020/07/17 14:34:17] [debug] [upstream] KA connection #41 to logging.googleapis.com:443 has been assigned (recycled)
[2020/07/17 14:34:17] [debug] [output:stackdriver:stackdriver.0] HTTP Status=200
[2020/07/17 14:34:17] [debug] [upstream] KA connection #41 to logging.googleapis.com:443 is now available
[2020/07/17 14:34:17] [debug] [task] destroy task=0x5cafad17f800 (task_id=1)
[2020/07/17 14:34:17] [debug] [storage] [cio file] synced at: tail.0/31137-1595010856.429208524.flb
[2020/07/17 14:34:18] [debug] [input:tail:tail.0] inode=3192531 events: IN_MODIFY
[2020/07/17 14:34:18] [debug] [storage] tail.0:31137-1595010858.396208524.flb mapped OK
[2020/07/17 14:34:19] [debug] [task] created task=0x5cafad17f800 id=1 OK
[2020/07/17 14:34:19] [debug] [upstream] KA connection #41 to logging.googleapis.com:443 has been assigned (recycled)
[2020/07/17 14:34:19] [debug] [output:stackdriver:stackdriver.0] HTTP Status=200
[2020/07/17 14:34:19] [debug] [upstream] KA connection #41 to logging.googleapis.com:443 is now available
[2020/07/17 14:34:19] [debug] [task] destroy task=0x5cafad17f800 (task_id=1)
[2020/07/17 14:34:19] [debug] [storage] [cio file] synced at: tail.0/31137-1595010858.396208524.flb
[2020/07/17 14:34:20] [debug] [input:tail:tail.0] inode=3192531 events: IN_MODIFY
[2020/07/17 14:34:20] [debug] [storage] tail.0:31137-1595010860.324208524.flb mapped OK
[2020/07/17 14:34:21] [debug] [task] created task=0x5cafad17f4a0 id=1 OK
[2020/07/17 14:34:21] [debug] [upstream] KA connection #41 to logging.googleapis.com:443 has been assigned (recycled)
[2020/07/17 14:34:21] [debug] [output:stackdriver:stackdriver.0] HTTP Status=200
[2020/07/17 14:34:21] [debug] [upstream] KA connection #41 to logging.googleapis.com:443 is now available
[2020/07/17 14:34:21] [debug] [input:tail:tail.0] scanning path /var/lib/docker/containers/edbace40bae66c75883ea9a41bb1cb1f561a75262abc19d7f28c7c8034e4f7cf/edbace40bae66c75883ea9a41bb1cb1f561a75262abc19d7f28c7c8034e4f7cf-json.log
[2020/07/17 14:34:21] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/lib/docker/containers/edbace40bae66c75883ea9a41bb1cb1f561a75262abc19d7f28c7c8034e4f7cf/edbace40bae66c75883ea9a41bb1cb1f561a75262abc19d7f28c7c8034e4f7cf-json.log, inode 3192531
[2020/07/17 14:34:21] [debug] [task] destroy task=0x5cafad17f4a0 (task_id=1)
[2020/07/17 14:34:21] [debug] [storage] [cio file] synced at: tail.0/31137-1595010860.324208524.flb
[2020/07/17 14:34:22] [error] [src/flb_scheduler.c:48 errno=0] Success
^C[engine] caught signal (SIGINT)
[2020/07/17 14:34:23] [ info] [input] pausing tail.0
[2020/07/17 14:34:23] [ info] [input] pausing storage_backlog.1
[2020/07/17 14:34:23] [debug] [input:tail:tail.0] inode=3192531 removing file name /var/lib/docker/containers/edbace40bae66c75883ea9a41bb1cb1f561a75262abc19d7f28c7c8034e4f7cf/edbace40bae66c75883ea9a41bb1cb1f561a75262abc19d7f28c7c8034e4f7cf-json.log
[2020/07/17 14:34:23] [ info] inotify_fs_remove(): inode=3192531 watch_fd=1
[2020/07/17 14:34:23] [debug] [task] destroy task=0x5cafad17fc60 (task_id=0)
[2020/07/17 14:34:23] [debug] [storage] [cio file] synced at: tail.0/31137-1595010568.577112829.flb
[2020/07/17 14:34:23] [debug] [retry] task retry=0x5cafad04ea90, invalidated from the scheduler
After turning off Wifi, the last log generated by Fluent Bit is
[2020/07/17 14:29:29] [debug] [upstream] KA connection #41 to logging.googleapis.com:443 has been assigned (recycled)
and then after turning on Wifi Fluent Bit will stuck on this state for unknown time… I have tried this many times and sometimes Fluent Bit will response in ~17 minutes and sometimes ~5 mins. The next few lines of logs since it stuck are:
[2020/07/17 14:34:15] [error] [tls] SSL error: NET - Connection was reset by peer
[2020/07/17 14:34:15] [error] [http_client] broken connection to logging.googleapis.com:443 ?
- Steps to reproduce the problem:
Expected behavior
I expect Fluent Bit to respond when the internet connection is on and no logs are lost during the offline period.
Screenshots
Your Environment
- Version used: Fluent Bit v1.5
- Configuration: The configuration file I was using for testing is:
[SERVICE]
Flush 1
Daemon Off
Log_Level debug
Parsers_File myparser.conf
storage.path /var/log/fluent-bit-buffers/
storage.sync normal
storage.checksum off
storage.backlog.mem_limit 10M
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_PORT 2020
[INPUT]
Name tail
Tag k8s_container.test1.test2.offline
Parser JSON
Path PATH_TO_THE_LOG_FILE
Buffer_Chunk_Size 512KB
Buffer_Max_Size 5M
Rotate_Wait 30
Mem_Buf_Limit 5M
Skip_Long_Lines On
Refresh_Interval 10
storage.type filesystem
Ignore_Older 4h
[OUTPUT]
Name stackdriver
Match k8s_container.*
Resource k8s_container
k8s_cluster_name test_cluster_name
k8s_cluster_location test_cluster_location
google_service_credentials /home/jeffluoo/fluentbit/gke-config.json
Retry_Limit False
tls On
tls.verify Off
net.connect_timeout 30
net.keepalive_idle_timeout 30
[FILTER]
Name record_modifier
Match k8s_container.*
Record logging.googleapis.com/local_resource_id k8s_container.test1.test2.testfast
- Environment name and version (e.g. Kubernetes? What version?):
- Server type and version:
- Operating System and version:
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
- Filters and plugins: Tail + Record_Modifier + Stackdriver
Additional context
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 24 (24 by maintainers)
Hi @edsiper Yes I think this ticket can be closed as the issue was resolved.