aws-for-fluent-bit: Fluentbit S3 output causes prometheus metrics endpoint to be briefly unavailable.

Describe the question/issue

With the S3 output configured, the Prometheus metrics endpoint /api/v1/metrics/prometheus is not available right after startup; it takes a long time for the HTTP interface to come up.

Configuration

Note: For credentials, the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables are used. They are exported and present in the systemd environment when Fluent Bit runs.

[SERVICE]
    flush        10
    daemon       Off
    log_level    info
    parsers_file /etc/flb/parsers.conf
    plugins_file /etc/flb/plugins.conf
    http_server  On
    http_listen  0.0.0.0
    http_port    3281
    storage.metrics On
    storage.path /fluentbit-buffer
    storage.sync normal
    storage.checksum off
    storage.backlog.mem_limit  8M
    storage.max_chunks_up  128

[INPUT]
    Name  tail
    Alias  test_tail
    Path  /var/log/test.log
    Read_from_Head  Off
    Path_Key  filename
    Tag  syslog
    Key  event
    exit_on_eof  false
    Rotate_Wait  5
    Refresh_Interval  60
    Skip_Long_Lines  Off
    DB.sync  normal
    DB.locking  false
    Buffer_Chunk_Size  32k
    Buffer_Max_Size  8M
    Multiline  Off
    Multiline_Flush  4
    Parser_Firstline  8192
    Docker_Mode  Off
    Docker_Mode_Flush  4

[FILTER]
    Name record_modifier
    Match *
    Record hostname ${HOSTNAME}

[OUTPUT]
    Name  s3
    Match  syslog
    endpoint  https://storage.googleapis.com
    bucket  fluentbit-test
    use_put_object true
    content_type  application/gzip
    compression gzip
    store_dir  /fluentbit/s3
    upload_timeout 1m
    region  us-west2
    total_file_size  1M
    s3_key_format  /99com25-test-k8s-s3/$UUID.gz
    s3_key_format_tag_delimiters .-

Fluent Bit Log Output

NA

Fluent Bit Version Info

  • Version used: 1.8.3 on bare metal using systemd unit (Debian stretch).

Steps to reproduce issue

curl the metrics endpoint:

curl  http://localhost:3281/api/v1/metrics/prometheus
curl: (7) Failed to connect to localhost port 3281: Connection refused
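To put a number on the delay instead of re-running curl by hand, the endpoint can be polled until it answers. The following is a minimal sketch, not part of the original report: the function name `wait_for_endpoint` and its `timeout_s` and `interval_s` parameters are hypothetical, and it assumes that an HTTP 200 response means the metrics interface is up.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_endpoint(url, timeout_s=900.0, interval_s=5.0):
    """Poll `url` until it returns HTTP 200.

    Returns the number of seconds waited, or None if `timeout_s`
    elapses first. Name and parameters are illustrative only.
    """
    start = time.monotonic()
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return time.monotonic() - start
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Connection refused, reset, timed out, or a non-2xx reply:
            # treat all of these as "not available yet" and retry.
            pass
        if time.monotonic() - start >= timeout_s:
            return None
        time.sleep(interval_s)
```

Calling `wait_for_endpoint("http://localhost:3281/api/v1/metrics/prometheus")` right after restarting the service returns roughly how long the connection refused phase lasted, which makes it easier to compare runs and to check whether the delay tracks the first S3 upload.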

Expected behavior

I should be able to curl the endpoint immediately, without a connection refused error. Instead, curl starts working only after a long time. The delay varies between test attempts, generally 5-10 minutes, and it happens to coincide with the first successful upload after buffering 1M of data (per the settings above).

Related Issues

https://github.com/fluent/fluent-bit/issues/4165

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 16 (8 by maintainers)

Most upvoted comments

What memory leak are you talking about?

My bad. I am using a fluent-bit version with a memory leak. The leak has been fixed in a newer version, but fluent-bit crashes at startup. Thank you for your support, I will stay tuned.