fluent-bit: kinesis_firehose: Crashing, log loss/duplication
Bug Report
Describe the bug
We are currently doing performance testing, sending a burst of 25,000 logs from Fluent Bit to Kinesis Firehose (via the core kinesis_firehose plugin). Fluent Bit consistently has trouble sending this many logs to Firehose, with problems ranging from dropped logs to outright crashes. Worryingly, the issues get worse with newer versions of Fluent Bit.
Specifically:
- v1.8.0+: Crashes within 20 seconds (segmentation fault); loses logs (only manages to send a fraction of the logs before crashing)
- v1.7.6+: Doesn’t crash, but log delivery is inconsistent: sometimes loses logs, sometimes sends more logs (i.e. sends the same log multiple times, presumably caused by Fluent Bit’s retry attempts)
- v1.7.5: Doesn’t crash and doesn’t lose logs, but seems to always send more/duplicate logs (see below for more details).
Note that, if we switch to Amazon’s Fluent Bit image (and use Amazon’s firehose plugin instead of the core kinesis_firehose plugin), all these issues go away. Specifically:
- Fluent Bit doesn’t crash
- Fluent Bit doesn’t lose logs
- Fluent Bit doesn’t send more/duplicate logs
Instead, it always sends the exact number of logs that were generated.
So, the issue seems to be with the core kinesis_firehose plugin specifically.
To Reproduce
Our testing is being done on a large Ubuntu EC2 instance. Fluent Bit runs on that EC2 instance and sends logs to a Kinesis Firehose delivery stream in the same AWS account. To avoid proxy issues, we have created a VPC endpoint for Firehose, so that we can send logs directly from EC2 to Firehose.
- Fluent Bit installation: For simplicity, to let us quickly try out different versions of Fluent Bit, we do not directly install Fluent Bit on the EC2 – instead, we run the core Fluent Bit Docker image.
- Fluent Bit config: Reads logs from a local file and sends them to Firehose.
[INPUT]
    Name tail
    Path /data/perf-test/logFolder-fb/*.log
    refresh_interval 2
    rotate_wait 5
    db /data/perf-test/fluentbit-logs.db
    db.sync normal
    db.locking true
    buffer_chunk_size 128KB
    buffer_max_size 50MB
    skip_long_lines on
    mem_buf_limit 199800KB
[FILTER]
    Name nest
    Match *
    Operation nest
    Wildcard *
    Nest_under event
[OUTPUT]
    name kinesis_firehose
    match *
    region us-east-1
    delivery_stream Test-Firehose
- Run Fluent Bit: We run the Fluent Bit Docker image, and mount the required files, using the following command:
docker run --rm \
--mount type=bind,source=/data/perf-test/fluent-bit.conf,destination=/fluent-bit/etc/fluent-bit.conf,readonly \
--mount type=bind,source=/data/perf-test/,destination=/data/perf-test/ \
fluent/fluent-bit:1.8.3-debug
- Test data: We have a file called /data/perf-test/fakeData.txt containing fake data/logs, where each log is ~1,000 bytes in size, e.g.:
Katherine Stone sphillips@shields.net 760-16-6504 2006-05-09T07:04:56 1974-01-19T07:04:25 Griffinborough Clinical biochemist wuPmcdIIIlPKQddacCTDMHedOKrxhgUOTyVDUjmExZqqwRmGKSwakHiKDMTlQzSvmnNmSgsJkJmtpHDBkICOrKNiNGJYftCIgNuQopxZZMXxGGXLUyCNyuWhzCKUCuuXKhxotmyulQExiufWfjQdiDwbRDUctByhAZcJPrlbGlInbpYcwCAQeJJBOZOEnKlsAOqYtNAueXfAeXFzEtssxZUTVIFTjjlspeJiBggwYuAtwlXzSScLcQNkkFUtCpGZhdPVrpiyNlmdcKkqpIjQIVjRmnnKBvzOSSPvXHLzhOeRzApvmaJtJIkYYLMhftbLioTbnWpXGIzkMzWRVimRrCJkRaqtttLcOiaOekPQgYdrByRPIZMMwfTKdftfZPHKIHJeYryUljCVolxZFdYDihpRHHFJlEwvfViRouHYZPcUihkbnVSkQLGlGzLPHLodovHJjrVqifkdNssyspCGpGHFNSeLguqpWIxMWhJMLkLDizCtqAOzDccveDvowyLrRlEECVGjqrkFTIIIOntAEXgnheqVqLnJaFBBTWcdKdlhzeixnfRZgmorTXKxeHaDDGZAWPhIpiRArxXQkArJdBjCtGNuOoDdNBMgGTLKbbEnsXWUiuEZXELXpbOfJuBSnuaAtUOlTafuSgmoPgMUjBYeaIOplfSzqRNKLnZBzZrzoAmxnfvSosZbDefkeabOAvtrXNFHYdORBJyuzjtpURVmTRzMKjwJhImztWkGcFYEqRpEjfeWfLelRaUNTiLPINZhEfcZagnLPcZrjPAATOVLOiSpwooU F Engineer 3 105000 500
Barbara Owens mcmahondana@watkins-ortega.biz 017-15-1913 2008-07-19T14:00:16 1971-11-12T07:39:36 Lake Katherine Pilot, airline rlKxQDLcXioeDMvYUaizSaSaooGlZqxofWVQaSmgambARvYwiRJGnkEvjzLUUYXJYdfXbfsAfzKYfalLwtwbqHXzzecamaoYaaUeREAZMwIVUropudeFzAgCVWSDozegeQLvWfHxFZkSLNuKGIDzdZhdsuPAGwKLdipbtAruGNtiEKrZAtFNvEvKblMMlhjxDHlqMmbkpycFwbzjILiTPtXJyuDwPbxgZJhREMcXIFzbefpGRDcXoKhxopuSpzvxYEsPwpPwATqchsIDCKoAwuasioVRoQGDtzhQGdpoepIVvLtFAefIGEbezLHwWCWVhMQqudeQuIFybUNKmpPsekTlBaomPhOPKjiYDtDICSYegUavSaWAseqQOrRsjSCEHeBiKcVnsbsncQBvlJLvyAJRWLltDJqDbNQnDmUSIeLKUnFJanNCeYGkrAvNGxixvMLFkbFUOyMgDsDOOFmmtCaWwaEUxJxYQxojGWcjbJVgAMoEdnwJIIJCXgOcrZUBMTglJblUEgHfeNRlAtSZbWiAWEzdJSqVdIcVpkPYeifNcVGDjzHlAXVjwmgjiNHIeZaYZZteUpMEaGVzwcUCSXfZwUFOhyvkCjkaRkMCFUJzlqBbdJcwwTDFSmeqduzaqzcBGZNrkuZVUQcNKFiMWRGaUlwZXVMSDLzzuXimTeolCKnRpujnKSDyfFkUOKVeXUeamySmKqnwLLTHIVetBWrqdyHhuhSbnbvNDCchfsmRMsdpSrSqPHdpCoRFPNTKrOJZCrvOyWNIXNlbFbSALQRatImfnAIt M VP 4 114000 4500
Victoria Rivas carrolldonna@jones-hawkins.com 593-88-0695 1993-05-08T06:22:07 1979-09-14T03:01:22 Lake Lance Teaching laboratory technician sNaVptJsxvviMPJOOZrVcQffnxFqkSaQullfJWapxyMOOntudKgOHGuvCdJYtcmLbhxfMxUfXXtPdRshcnEyLuoAJdtVtTTCixHBskHCpypMvtkOcuUcIcjSRQBRmBsojeHIohvaVzWkdiSwpovDIFAlAUrvpNtHvzGhlTfbwenITMTuvloyWKcaaAWhAsCtDyKrAOSZOrIlrJjwTwKMIuGuNeQXyvSxiUnXfbGjhRQvBBfrgwjYgLJZZLdWgSjVnwllbXvsqtVSUZulzpmZnmETxnIUwoLOUAdeqdiNJmedjqGOnDNGIZwjezZaRbOXHtcjFPvRqXFMNdhrNnYRgabKzhgiooKhLCNBcGqWTdpxqhGqpSfCDJkADHDgcJGxLldqNDmZjngadcoCrASBVgZJnVaSUAExVguyKSbRzWYjJNpBoIkkPIVkBJxlOVciayQakEawwYJXNKHJZRKGzrYdDEMaHAoUDXTabEcvTSWlyoKFJVJanMvCpjLySPTevaHxSCIRRgnlAxixFZiHJEzWSWEhUISKdMUiEJtOTypDdebUnQQNcBJuIzaorYxkyeWypHpDXbwDzQfmIGpMthHZJtMuqNyogwFCMihpJIyAbubWyILAXUKSZmwxvJraZSgUQZhtJwJdJUuYBJEelBApIFOryFeftifowvfeOgEwNmOJFTQnMzTezlVrjFigZnoyrVofGGriUzwmaNhSrTUAmCEYfGygmZxarJlvnjUDNQEwkzVLsePHkBJzmdvtsGhQJBnCxMBJLJqFmtlTwiYTXdZcKqLW M Sales 17 90000 500
...
- Test: To actually run the performance test, we run the following script (/data/perf-test/runtest.sh), which essentially reads a certain number of logs per second from /data/perf-test/fakeData.txt and writes them to /data/perf-test/logFolder-fb/test.log, from where Fluent Bit tails them and sends them to Firehose:
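#!/bin/bash
# Usage: runtest.sh LOOPCNT LOGCOUNT
# Prints the first LOGCOUNT lines of fakeData.txt to stdout once per second, LOOPCNT times.
LOOPCNT=$1
LOGCOUNT=$2
i=0
sleep 5
while [ "$i" -lt "$LOOPCNT" ]; do
    head -n "$LOGCOUNT" /data/perf-test/fakeData.txt
    i=$((i + 1))
    sleep 1
done
sleep 5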
Thus, to have the above script generate 25,000 logs (5,000 logs/second * 5 seconds) for Fluent Bit to read, we run the following command:
/data/perf-test/runtest.sh 5 5000 > /data/perf-test/logFolder-fb/test.log
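Before comparing against what Firehose receives, it can help to confirm that the generated file really holds the expected number of lines, since head prints fewer lines than requested if fakeData.txt is shorter than LOGCOUNT. The following is a minimal sketch, assuming one log per line; the function name check_generated_count is hypothetical and not part of the original test setup:

```bash
#!/bin/bash
# Hypothetical helper (not part of the original test setup).
# Compares the number of lines in a generated log file with LOOPCNT * LOGCOUNT.
# Assumes one log per line.
check_generated_count() {
    local loopcnt=$1 logcount=$2 file=$3
    local expected actual
    expected=$((loopcnt * logcount))
    actual=$(wc -l < "$file")
    actual=$((actual))   # normalise any padding that some wc implementations add
    if [ "$actual" -eq "$expected" ]; then
        echo "OK: $actual lines"
    else
        echo "MISMATCH: expected $expected lines, found $actual"
        return 1
    fi
}
```

For the run described here, this would be called as: check_generated_count 5 5000 /data/perf-test/logFolder-fb/test.log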
Expected behavior
Since we generated 25,000 logs to a file being tailed by Fluent Bit, we expect Fluent Bit to send exactly 25,000 logs to Firehose. Instead, as mentioned above, depending on which version of (core) Fluent Bit we use, it either crashes, loses logs or sends more/duplicate logs.
If we switch to Amazon’s Fluent Bit image and Amazon’s firehose plugin (i.e. replace name kinesis_firehose in the above Fluent Bit config with name firehose), then all the issues go away and Fluent Bit behaves as expected: it sends exactly 25,000 logs to Firehose.
Error Logs
Here are the logs generated by Fluent Bit, for some of the versions we tested:
- 1.8.3: Crashes
Fluent Bit v1.8.3
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
[2021/08/04 19:12:23] [ info] [engine] started (pid=1)
[2021/08/04 19:12:23] [ info] [storage] version=1.1.1, initializing...
[2021/08/04 19:12:23] [ info] [storage] in-memory
[2021/08/04 19:12:23] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/08/04 19:12:23] [ info] [cmetrics] version=0.1.6
[2021/08/04 19:12:23] [ info] [sp] stream processor started
[2021/08/04 19:13:52] [ info] [input:tail:tail.0] inotify_fs_add(): inode=112 watch_fd=1 name=/data/perf-test/logFolder-fb/test.log
[2021/08/04 19:13:59] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/04 19:13:59] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/04 19:14:00] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/04 19:14:07] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/04 19:14:07] [error] [src/flb_http_client.c:1170 errno=11] Resource temporarily unavailable
[2021/08/04 19:14:07] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/04 19:14:07] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/04 19:14:07] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/04 19:14:07] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/04 19:14:08] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/04 19:14:08] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/04 19:14:09] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/04 19:14:09] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/04 19:14:10] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/04 19:14:10] [error] [upstream] connection #-1 to firehose.us-east-1.amazonaws.com:443 timed out after 10 seconds
[2021/08/04 19:14:10] [error] [upstream] connection #-1 to firehose.us-east-1.amazonaws.com:443 timed out after 10 seconds
[2021/08/04 19:14:10] [error] [upstream] connection #-1 to firehose.us-east-1.amazonaws.com:443 timed out after 10 seconds
[2021/08/04 19:14:10] [ warn] [engine] failed to flush chunk '1-1628104441.528346298.flb', retry in 8 seconds: task_id=10, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/04 19:14:10] [error] [net] socket #38 could not connect to firehose.us-east-1.amazonaws.com:443
[2021/08/04 19:14:10] [engine] caught signal (SIGSEGV)
[2021/08/04 19:14:10] [ Error] epoll_ctl: Bad file descriptor, errno=9 at /tmp/fluent-bit/lib/monkey/mk_core/mk_event_epoll.c:136
#0 0x55c4d91db242 in __mk_list_del() at lib/monkey/include/monkey/mk_core/mk_list.h:88
#1 0x55c4d91db26d in mk_list_del() at lib/monkey/include/monkey/mk_core/mk_list.h:93
#2 0x55c4d91dbd1f in prepare_destroy_conn() at src/flb_upstream.c:390
#3 0x55c4d91dbd81 in prepare_destroy_conn_safe() at src/flb_upstream.c:412
#4 0x55c4d91dc057 in create_conn() at src/flb_upstream.c:501
#5 0x55c4d91dc4b9 in flb_upstream_conn_get() at src/flb_upstream.c:640
#6 0x55c4d92d0cf2 in request_do() at src/aws/flb_aws_util.c:285
#7 0x55c4d92d0905 in flb_aws_client_request() at src/aws/flb_aws_util.c:161
#8 0x55c4d9291d3e in put_record_batch() at plugins/out_kinesis_firehose/firehose_api.c:828
#9 0x55c4d9290702 in send_log_events() at plugins/out_kinesis_firehose/firehose_api.c:376
#10 0x55c4d9290e31 in process_and_send_records() at plugins/out_kinesis_firehose/firehose_api.c:560
#11 0x55c4d928f396 in cb_firehose_flush() at plugins/out_kinesis_firehose/firehose.c:326
#12 0x55c4d91c60de in output_pre_cb_flush() at include/fluent-bit/flb_output.h:490
#13 0x55c4d96b3066 in co_init() at lib/monkey/deps/flb_libco/amd64.c:117
- 1.8.2: Crashes
...
[2021/07/30 18:50:33] [ info] [input:tail:tail.0] inotify_fs_add(): inode=112 watch_fd=1 name=/data/perf-test/logFolder-fb/test.log
[2021/07/30 18:50:34] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 1 records, sent 1 to Test-Firehose
[2021/07/30 18:50:42] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/07/30 18:50:43] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/07/30 18:50:43] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/07/30 18:50:44] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/07/30 18:50:44] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/07/30 18:50:44] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/07/30 18:50:52] [ warn] [net] getaddrinfo(host='firehose.us-east-1.amazonaws.com', err=12): Timeout while contacting DNS servers
[2021/07/30 18:50:52] [error] [aws_client] connection initialization error
[2021/07/30 18:50:52] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/07/30 18:50:52] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/07/30 18:50:52] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/07/30 18:50:52] [ warn] [engine] failed to flush chunk '1-1627671037.751893694.flb', retry in 7 seconds: task_id=0, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/07/30 18:50:54] [ warn] [net] getaddrinfo(host='firehose.us-east-1.amazonaws.com', err=12): Timeout while contacting DNS servers
[2021/07/30 18:50:54] [error] [aws_client] connection initialization error
[2021/07/30 18:50:54] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/07/30 18:50:54] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/07/30 18:50:54] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/07/30 18:50:54] [ warn] [engine] failed to flush chunk '1-1627671038.789867966.flb', retry in 9 seconds: task_id=3, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/07/30 18:50:54] [error] [upstream] connection #-1 to firehose.us-east-1.amazonaws.com:443 timed out after 10 seconds
[2021/07/30 18:50:54] [error] [upstream] connection #-1 to firehose.us-east-1.amazonaws.com:443 timed out after 10 seconds
[2021/07/30 18:50:54] [error] [upstream] connection #-1 to firehose.us-east-1.amazonaws.com:443 timed out after 10 seconds
[2021/07/30 18:50:54] [ warn] [net] getaddrinfo(host='firehose.us-east-1.amazonaws.com', err=12): Timeout while contacting DNS servers
[2021/07/30 18:50:54] [engine] caught signal (SIGSEGV)
#0 0x5631e07a4070 in __mk_list_del() at lib/monkey/include/monkey/mk_core/mk_list.h:87
#1 0x5631e07a40a7 in mk_list_del() at lib/monkey/include/monkey/mk_core/mk_list.h:93
#2 0x5631e07a4b59 in prepare_destroy_conn() at src/flb_upstream.c:390
#3 0x5631e07a4bbb in prepare_destroy_conn_safe() at src/flb_upstream.c:412
#4 0x5631e07a4e91 in create_conn() at src/flb_upstream.c:501
#5 0x5631e07a52f3 in flb_upstream_conn_get() at src/flb_upstream.c:640
#6 0x5631e089255d in request_do() at src/aws/flb_aws_util.c:284
#7 0x5631e0892170 in flb_aws_client_request() at src/aws/flb_aws_util.c:160
#8 0x5631e085804c in put_record_batch() at plugins/out_kinesis_firehose/firehose_api.c:828
#9 0x5631e0856a10 in send_log_events() at plugins/out_kinesis_firehose/firehose_api.c:376
#10 0x5631e0856d6c in add_event() at plugins/out_kinesis_firehose/firehose_api.c:451
#11 0x5631e08570e2 in process_and_send_records() at plugins/out_kinesis_firehose/firehose_api.c:551
#12 0x5631e08556a4 in cb_firehose_flush() at plugins/out_kinesis_firehose/firehose.c:326
#13 0x5631e078f00a in output_pre_cb_flush() at include/fluent-bit/flb_output.h:490
#14 0x5631e0c5f546 in co_init() at lib/monkey/deps/flb_libco/amd64.c:117
#15 0xffffffffffffffff in ???() at ???:0
- 1.7.6: Does not crash, but log delivery is inconsistent: sometimes loses logs, sometimes sends extra/duplicate logs
Fluent Bit v1.7.6
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
[2021/08/05 17:17:00] [ info] [engine] started (pid=1)
[2021/08/05 17:17:00] [ info] [storage] version=1.1.1, initializing...
[2021/08/05 17:17:00] [ info] [storage] in-memory
[2021/08/05 17:17:00] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/08/05 17:17:00] [ info] [sp] stream processor started
[2021/08/05 17:27:43] [ info] [input:tail:tail.0] inotify_fs_add(): inode=112 watch_fd=2 name=/data/perf-test/logFolder-fb/test.log
[2021/08/05 17:27:44] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 1 records, sent 1 to Test-Firehose
[2021/08/05 17:27:50] [error] [output:kinesis_firehose:kinesis_firehose.0] could not pack/validate JSON API response
[2021/08/05 17:27:50] [error] [output:kinesis_firehose:kinesis_firehose.0] PutRecordBatch response could not be parsed,
[2021/08/05 17:27:50] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/05 17:27:50] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/05 17:27:50] [ warn] [engine] failed to flush chunk '1-1628184468.277267041.flb', retry in 8 seconds: task_id=2, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/05 17:27:50] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/05 17:27:50] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/05 17:27:50] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/05 17:27:50] [error] [src/flb_http_client.c:1163 errno=11] Resource temporarily unavailable
[2021/08/05 17:27:50] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/05 17:27:50] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/05 17:27:50] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/05 17:27:50] [ warn] [engine] failed to flush chunk '1-1628184469.283845807.flb', retry in 6 seconds: task_id=3, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/05 17:27:50] [ warn] [engine] failed to flush chunk '1-1628184468.265588851.flb', retry in 10 seconds: task_id=1, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/05 17:27:50] [error] [src/flb_http_client.c:1163 errno=11] Resource temporarily unavailable
[2021/08/05 17:27:50] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/05 17:27:50] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/05 17:27:50] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/05 17:27:50] [ warn] [engine] failed to flush chunk '1-1628184468.255456489.flb', retry in 11 seconds: task_id=0, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/05 17:27:50] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 1455 records, sent 1455 to Test-Firehose
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] could not pack/validate JSON API response
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] PutRecordBatch response could not be parsed,
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/05 17:27:55] [ warn] [engine] failed to flush chunk '1-1628184470.309479199.flb', retry in 7 seconds: task_id=5, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/05 17:27:55] [error] [src/flb_http_client.c:1163 errno=11] Resource temporarily unavailable
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/05 17:27:55] [ warn] [engine] failed to flush chunk '1-1628184471.324442944.flb', retry in 7 seconds: task_id=7, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/05 17:27:55] [error] [src/flb_http_client.c:1163 errno=11] Resource temporarily unavailable
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] could not pack/validate JSON API response
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] PutRecordBatch response could not be parsed,
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/05 17:27:55] [ warn] [engine] failed to flush chunk '1-1628184472.351715342.flb', retry in 10 seconds: task_id=10, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/05 17:27:55] [ warn] [engine] failed to flush chunk '1-1628184471.333923382.flb', retry in 6 seconds: task_id=8, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/05 17:27:55] [error] [src/flb_http_client.c:1163 errno=11] Resource temporarily unavailable
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/05 17:27:55] [ warn] [engine] failed to flush chunk '1-1628184472.342344688.flb', retry in 9 seconds: task_id=9, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/05 17:27:55] [ warn] [engine] failed to flush chunk '1-1628184470.298943708.flb', retry in 11 seconds: task_id=4, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/05 17:27:55] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2121 records, sent 2121 to Test-Firehose
[2021/08/05 17:27:56] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2153 records, sent 2153 to Test-Firehose
[2021/08/05 17:27:58] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2111 records, sent 2111 to Test-Firehose
[2021/08/05 17:27:59] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 1 records, sent 1 to Test-Firehose
[2021/08/05 17:28:00] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2152 records, sent 2152 to Test-Firehose
[2021/08/05 17:28:01] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/05 17:28:01] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/05 17:28:01] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/05 17:28:01] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/05 17:28:01] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/05 17:28:01] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/05 17:28:02] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/05 17:28:02] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/05 17:28:02] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/05 17:28:02] [ warn] [engine] chunk '1-1628184471.324442944.flb' cannot be retried: task_id=7, input=tail.0 > output=kinesis_firehose.0
[2021/08/05 17:28:02] [ warn] [engine] chunk '1-1628184470.309479199.flb' cannot be retried: task_id=5, input=tail.0 > output=kinesis_firehose.0
[2021/08/05 17:28:02] [ warn] [engine] chunk '1-1628184468.255456489.flb' cannot be retried: task_id=0, input=tail.0 > output=kinesis_firehose.0
[2021/08/05 17:28:02] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2191 records, sent 2191 to Test-Firehose
[2021/08/05 17:28:04] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2153 records, sent 2153 to Test-Firehose
[2021/08/05 17:28:05] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2079 records, sent 2079 to Test-Firehose
[2021/08/05 17:28:07] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2151 records, sent 2151 to Test-Firehose
- 1.7.5: Does not crash, does not lose logs, but does send extra/duplicate logs
...
[2021/07/30 21:08:23] [ info] [input:tail:tail.0] inotify_fs_add(): inode=106 watch_fd=1 name=/data/perf-test/logFolder-fb/test.log
[2021/07/30 21:08:25] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 1 records, sent 1 to Test-Firehose
[2021/07/30 21:08:31] [error] [aws_client] connection initialization error
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/07/30 21:08:31] [ warn] [engine] failed to flush chunk '1-1627679309.426969060.flb', retry in 7 seconds: task_id=5, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/07/30 21:08:31] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 426 records, sent 426 to Test-Firehose
[2021/07/30 21:08:31] [error] [aws_client] connection initialization error
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/07/30 21:08:31] [ warn] [engine] failed to flush chunk '1-1627679308.411100874.flb', retry in 7 seconds: task_id=3, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/07/30 21:08:31] [error] [aws_client] connection initialization error
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] could not pack/validate JSON API response
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] PutRecordBatch response could not be parsed,
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/07/30 21:08:31] [error] [aws_client] connection initialization error
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/07/30 21:08:31] [ warn] [engine] failed to flush chunk '1-1627679310.447924521.flb', retry in 9 seconds: task_id=7, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/07/30 21:08:31] [ warn] [engine] failed to flush chunk '1-1627679307.380111794.flb', retry in 7 seconds: task_id=0, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/07/30 21:08:31] [ warn] [engine] failed to flush chunk '1-1627679310.455127141.flb', retry in 6 seconds: task_id=8, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/07/30 21:08:31] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2189 records, sent 2189 to Test-Firehose
[2021/07/30 21:08:31] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2181 records, sent 2181 to Test-Firehose
[2021/07/30 21:08:31] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2202 records, sent 2202 to Test-Firehose
[2021/07/30 21:08:31] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2152 records, sent 2152 to Test-Firehose
[2021/07/30 21:08:36] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 696 records, sent 696 to Test-Firehose
[2021/07/30 21:08:36] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2153 records, sent 2153 to Test-Firehose
[2021/07/30 21:08:36] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2151 records, sent 2151 to Test-Firehose
[2021/07/30 21:08:37] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2154 records, sent 2154 to Test-Firehose
[2021/07/30 21:08:38] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2238 records, sent 2238 to Test-Firehose
[2021/07/30 21:08:38] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2154 records, sent 2154 to Test-Firehose
[2021/07/30 21:08:38] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2152 records, sent 2152 to Test-Firehose
[2021/07/30 21:08:40] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2152 records, sent 2152 to Test-Firehose
[2021/07/30 21:08:40] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 1 records, sent 1 to Test-Firehose
- Amazon Fluent Bit v2.19.0 (containing core Fluent Bit v1.8.3) with Amazon firehose plugin: Works as expected, with no crashing, no log loss, and no extra/duplicate logs
AWS for Fluent Bit Container Image Version 2.19.0
Fluent Bit v1.8.3
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
[2021/08/04 19:35:46] [ info] [engine] started (pid=1)
[2021/08/04 19:35:46] [ info] [storage] version=1.1.1, initializing...
[2021/08/04 19:35:46] [ info] [storage] in-memory
[2021/08/04 19:35:46] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/08/04 19:35:46] [ info] [cmetrics] version=0.1.6
time="2021-08-04T19:35:46Z" level=info msg="A new higher performance Firehose plugin has been released; you are using the old plugin. Check out the new plugin's documentation and consider migrating.\nhttps://docs.fluentbit.io/manual/pipeline/outputs/firehose"
time="2021-08-04T19:35:46Z" level=info msg="[firehose 0] plugin parameter delivery_stream = 'Test-Firehose'"
time="2021-08-04T19:35:46Z" level=info msg="[firehose 0] plugin parameter region = 'us-east-1'"
time="2021-08-04T19:35:46Z" level=info msg="[firehose 0] plugin parameter data_keys = ''"
time="2021-08-04T19:35:46Z" level=info msg="[firehose 0] plugin parameter role_arn = ''"
time="2021-08-04T19:35:46Z" level=info msg="[firehose 0] plugin parameter endpoint = ''"
time="2021-08-04T19:35:46Z" level=info msg="[firehose 0] plugin parameter sts_endpoint = ''"
time="2021-08-04T19:35:46Z" level=info msg="[firehose 0] plugin parameter time_key = ''"
time="2021-08-04T19:35:46Z" level=info msg="[firehose 0] plugin parameter time_key_format = ''"
time="2021-08-04T19:35:46Z" level=info msg="[firehose 0] plugin parameter log_key = ''"
time="2021-08-04T19:35:46Z" level=info msg="[firehose 0] plugin parameter replace_dots = ''"
time="2021-08-04T19:35:46Z" level=info msg="[firehose 0] plugin parameter simple_aggregation = 'false'"
[2021/08/04 19:35:46] [ info] [sp] stream processor started
[2021/08/04 19:36:03] [ info] [input:tail:tail.0] inotify_fs_add(): inode=106 watch_fd=1 name=/data/perf-test/logFolder-fb/test.log
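One rough way to compare runs of the core plugin is to total the "Processed N records, sent M" lines that kinesis_firehose logs, and compare that total with the 25,000 logs generated. The following is a minimal sketch, assuming the log line format shown in the excerpts above and that Fluent Bit's output was saved to a file. The function names are hypothetical, and the total reflects only what the plugin reports as sent, not what Firehose actually delivered:

```bash
#!/bin/bash
# Hypothetical helpers for summarising a saved Fluent Bit log.

# Sum the "sent M" counts from lines such as:
#   ... Processed 426 records, sent 426 to Test-Firehose
sum_sent_records() {
    awk '/Processed [0-9]+ records, sent [0-9]+ to / {
             for (i = 1; i < NF; i++) if ($i == "sent") total += $(i + 1)
         }
         END { print total + 0 }' "$1"
}

# Classify a reported total against the expected number of logs.
classify_delivery() {
    local expected=$1 sent=$2
    if [ "$sent" -lt "$expected" ]; then
        echo "possible loss: $sent < $expected"
    elif [ "$sent" -gt "$expected" ]; then
        echo "possible duplicates: $sent > $expected"
    else
        echo "exact: $sent"
    fi
}
```

With the output saved to a file (fluent-bit.log is an assumed name), a run could be summarised as: classify_delivery 25000 "$(sum_sent_records fluent-bit.log)"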
Your Environment
- Versions: Core Fluent Bit Docker image versions tested:
1.8.3-debug
1.8.2-debug
1.8.0-debug
1.7.9-debug
1.7.8-debug
1.7.7-debug
1.7.6-debug
1.7.5-debug
Amazon Fluent Bit Docker image versions tested: 2.19.0 (contains Fluent Bit v1.8.3)
- Configuration: See above
- Operating System and version:
> uname -a
Linux ip-10-249-29-83 5.4.0-1051-aws #53~18.04.1-Ubuntu SMP Fri Jun 18 14:53:38 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
About this issue
- State: closed
- Created 3 years ago
- Reactions: 2
- Comments: 18 (11 by maintainers)
Cool. I will close it. (The stale label will be ignored because we commented).
Note: Similar backtrace (v1.7.9): #3866 #3687
In v1.7.9, the network back end was updated: https://fluentbit.io/announcements/v1.7.9/
network: make async dns query use TCP socket instead of UDP
The DNS back end was changed in v1.7.6: https://fluentbit.io/announcements/v1.7.6/
network: new asynchronous DNS support