aws-for-fluent-bit: Fluentbit S3 output causes prometheus metrics endpoint to be briefly unavailable.
Describe the question/issue
With the S3 output configured, the Prometheus metrics endpoint /api/v1/metrics/prometheus is not available right after startup; it takes a long time before the HTTP interface starts accepting connections.
Configuration
Note: For credentials, the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables are used. They are exported and present in the systemd environment when Fluent Bit runs.
[SERVICE]
    flush 10
    daemon Off
    log_level info
    parsers_file /etc/flb/parsers.conf
    plugins_file /etc/flb/plugins.conf
    http_server On
    http_listen 0.0.0.0
    http_port 3281
    storage.metrics On
    storage.path /fluentbit-buffer
    storage.sync normal
    storage.checksum off
    storage.backlog.mem_limit 8M
    storage.max_chunks_up 128
[INPUT]
    Name tail
    Alias test_tail
    Path /var/log/test.log
    Read_from_Head Off
    Path_Key filename
    Tag syslog
    Key event
    exit_on_eof false
    Rotate_Wait 5
    Refresh_Interval 60
    Skip_Long_Lines Off
    DB.sync normal
    DB.locking false
    Buffer_Chunk_Size 32k
    Buffer_Max_Size 8M
    Multiline Off
    Multiline_Flush 4
    Parser_Firstline 8192
    Docker_Mode Off
    Docker_Mode_Flush 4
[FILTER]
    Name record_modifier
    Match *
    Record hostname ${HOSTNAME}
[OUTPUT]
    Name s3
    Match syslog
    endpoint https://storage.googleapis.com
    bucket fluentbit-test
    use_put_object true
    content_type application/gzip
    compression gzip
    store_dir /fluentbit/s3
    upload_timeout 1m
    region us-west2
    total_file_size 1M
    s3_key_format /99com25-test-k8s-s3/$UUID.gz
    s3_key_format_tag_delimiters .-
Fluent Bit Log Output
NA
Fluent Bit Version Info
- Version used: 1.8.3 on bare metal using systemd unit (Debian stretch).
Steps to reproduce issue
curl the metrics endpoint:
curl http://localhost:3281/api/v1/metrics/prometheus
curl: (7) Failed to connect to localhost port 3281: Connection refused
Expected behavior
I should be able to curl the endpoint immediately after startup without getting a connection refused error. Instead, curl starts working only after a long time. The delay varies between test runs, generally 5-10 minutes, and it happens to coincide with the first successful upload after buffering 1M of data (per the total_file_size setting above).
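To put a number on the delay rather than re-running curl by hand, a small poller along these lines could be started right after Fluent Bit. This is only a sketch: `wait_for_endpoint` is a hypothetical helper (not part of Fluent Bit), the URL is taken from the http_port setting above, and the 15-minute timeout and 5-second interval are arbitrary assumptions.

```python
import time
import urllib.error
import urllib.request


def wait_for_endpoint(url, timeout_s=900.0, interval_s=5.0):
    """Poll `url` until it answers with HTTP 200.

    Returns the number of seconds waited, or None if `timeout_s` elapsed first.
    Hypothetical helper for measuring the startup delay; not a Fluent Bit API.
    """
    # Empty ProxyHandler: talk to the local endpoint directly, ignoring any
    # http_proxy settings in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    start = time.monotonic()
    while True:
        try:
            with opener.open(url, timeout=5) as resp:
                if resp.status == 200:
                    return time.monotonic() - start
        except (urllib.error.URLError, OSError):
            # Connection refused, socket timeout, or a non-2xx response:
            # treat all of them as "not up yet" and keep polling.
            pass
        if time.monotonic() - start >= timeout_s:
            return None
        time.sleep(interval_s)


# Example (assumed URL, matching http_port 3281 in the config above):
#   delay = wait_for_endpoint("http://localhost:3281/api/v1/metrics/prometheus")
#   print("endpoint up after %.0f s" % delay if delay is not None else "timed out")
```

Comparing the reported delay against the time of the first successful S3 upload in the Fluent Bit log would show whether the two really coincide.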
Related Issues
About this issue
- State: closed
- Created 3 years ago
- Reactions: 1
- Comments: 16 (8 by maintainers)
My bad. I am using a fluent-bit version with a memory leak. The leak has been fixed in a newer version, but fluent-bit crashes at startup. Thank you for your support; I will stay tuned.