fluent-bit: 404 on prometheus metrics endpoint
Bug Report
Describe the bug One pod from a DaemonSet is marked as not ready because it fails its readiness probe
To Reproduce Use the readiness probe in a DaemonSet as described in https://github.com/fluent/fluent-bit-kubernetes-logging/blob/1cdfc96be5c265364095d5b3525c5d992a320aa9/output/kafka/fluent-bit-ds.yaml#L30
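For reference, a minimal sketch of such a probe, assuming it targets the Prometheus metrics path on the HTTP server port 2020 (consistent with the 404 shown below); the delay/period values are illustrative, not copied from the linked file:

# Hedged sketch of the DaemonSet container readiness probe (assumed path/port)
readinessProbe:
  httpGet:
    path: /api/v1/metrics/prometheus   # endpoint that returns 404 on the failing pod
    port: 2020                         # Fluent Bit built-in HTTP server port
  initialDelaySeconds: 10              # illustrative values
  periodSeconds: 10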
Expected behavior The readiness probe should not fail
Your Environment OpenShift 3.11, Fluent Bit 1.3.5
Additional context In my case I have 4 pods running in the DaemonSet and only one is failing. If I curl the failing pod, the metrics endpoint returns a 404 error.
fluent-bit-p6trq 0/1 Running 0 22m 192.168.241.112
The API is up:
curl http://192.168.241.112:2020/api/v1 -v
* About to connect() to 192.168.241.112 port 2020 (#0)
* Trying 192.168.241.112...
* Connected to 192.168.241.112 (192.168.241.112) port 2020 (#0)
> GET /api/v1 HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 192.168.241.112:2020
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: Monkey/1.7.0
< Date: Thu, 16 Jan 2020 11:57:50 GMT
< Transfer-Encoding: chunked
<
* Connection #0 to host 192.168.241.112 left intact
{"fluent-bit":{"version":"1.3.5","edition":"Community","flags":["FLB_HAVE_PARSER","FLB_HAVE_RECORD_ACCESSOR","FLB_HAVE_STREAM_PROCESSOR","FLB_HAVE_TLS","FLB_HAVE_SQLDB","FLB_HAVE_METRICS","FLB_HAVE_HTTP_SERVER","FLB_HAVE_SYSTEMD","FLB_HAVE_FORK","FLB_HAVE_TIMESPEC_GET","FLB_HAVE_GMTOFF","FLB_HAVE_UNIX_SOCKET","FLB_HAVE_PROXY_GO","FLB_HAVE_SYSTEM_STRPTIME","FLB_HAVE_JEMALLOC","FLB_HAVE_LIBBACKTRACE","FLB_HAVE_REGEX","FLB_HAVE_LUAJIT","FLB_HAVE_C_TLS","FLB_HAVE_ACCEPT4","FLB_HAVE_INOTIFY"]}}
But the Prometheus metrics endpoint returns a 404:
curl http://192.168.241.112:2020/api/v1/metrics/prometheus -v
* About to connect() to 192.168.241.112 port 2020 (#0)
* Trying 192.168.241.112...
* Connected to 192.168.241.112 (192.168.241.112) port 2020 (#0)
> GET /api/v1/metrics/prometheus HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 192.168.241.112:2020
> Accept: */*
>
< HTTP/1.1 404 Not Found
< Server: Monkey/1.7.0
< Date: Thu, 16 Jan 2020 11:59:07 GMT
< Transfer-Encoding: chunked
<
* Connection #0 to host 192.168.241.112 left intact
I set the log level to debug (the relevant [SERVICE] settings are sketched after the log output), but there are no errors:
Fluent Bit v1.3.5
Copyright (C) Treasure Data
[2020/01/16 12:26:23] [debug] [storage] [cio stream] new stream registered: tail.0
[2020/01/16 12:26:23] [debug] [storage] [cio stream] new stream registered: tail.1
[2020/01/16 12:26:23] [debug] [storage] [cio stream] new stream registered: tail.2
[2020/01/16 12:26:23] [debug] [storage] [cio stream] new stream registered: tail.3
[2020/01/16 12:26:23] [ info] [storage] initializing...
[2020/01/16 12:26:23] [ info] [storage] in-memory
[2020/01/16 12:26:23] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2020/01/16 12:26:23] [ info] [engine] started (pid=1)
[2020/01/16 12:26:23] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2020/01/16 12:26:23] [debug] [in_tail] inotify watch fd=20
[2020/01/16 12:26:23] [debug] [in_tail] scanning path /var/log/containers/*_ns1_*.log
[2020/01/16 12:26:23] [debug] [in_tail] Cannot read info from: /var/log/containers/*_ns1_*.log
[2020/01/16 12:26:23] [debug] [in_tail] inotify watch fd=26
[2020/01/16 12:26:23] [debug] [in_tail] scanning path /var/log/containers/*_ns2_*.log
[2020/01/16 12:26:23] [debug] [in_tail] Cannot read info from: /var/log/containers/*_ns2_*.log
[2020/01/16 12:26:23] [debug] [in_tail] inotify watch fd=32
[2020/01/16 12:26:23] [debug] [in_tail] scanning path /var/log/containers/*_ns3_*.log
[2020/01/16 12:26:23] [debug] [in_tail] add to scan queue /var/log/containers/rpp-7-r72kw_ns3_rap-c68718b206170c12c1bf76a6ecd336534a39ec90192de00a339f876f7cddd354.log, offset=11141072
[2020/01/16 12:26:23] [debug] [in_tail] inotify watch fd=39
[2020/01/16 12:26:23] [debug] [in_tail] scanning path /var/log/containers/*_orchestrator_*.log
[2020/01/16 12:26:23] [debug] [in_tail] add to scan queue /var/log/containers/api-gw-12-lfk5k_orchestrator_api-gw-54384fc2175d084179bc15684aae6c2ad06ced314969c911cd4978fb2dbdbe2b.log, offset=32675
[2020/01/16 12:26:23] [debug] [upstream_ha] opening file /fluent-bit/etc/..2020_01_16_12_26_16.289066575/upstream.conf
[2020/01/16 12:26:23] [ info] [filter_kube] https=1 host=kubernetes.default.svc port=443
[2020/01/16 12:26:23] [ info] [filter_kube] local POD info OK
[2020/01/16 12:26:23] [ info] [filter_kube] testing connectivity with API server...
[2020/01/16 12:26:23] [debug] [filter_kube] API Server (ns=logging, pod=fluent-bit-srhn7) http_do=0, HTTP Status: 200
[2020/01/16 12:26:23] [ info] [filter_kube] API server connectivity OK
[2020/01/16 12:26:23] [debug] [router] match rule tail.0:forward.0
[2020/01/16 12:26:23] [debug] [router] match rule tail.0:forward.1
[2020/01/16 12:26:23] [debug] [router] match rule tail.1:forward.0
[2020/01/16 12:26:23] [debug] [router] match rule tail.1:forward.1
[2020/01/16 12:26:23] [debug] [router] match rule tail.2:forward.0
[2020/01/16 12:26:23] [debug] [router] match rule tail.2:forward.1
[2020/01/16 12:26:23] [debug] [router] match rule tail.3:forward.0
[2020/01/16 12:26:23] [debug] [router] match rule tail.3:forward.1
[2020/01/16 12:26:23] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2020/01/16 12:26:23] [ info] [sp] stream processor started
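For completeness, a sketch of the [SERVICE] settings involved, assuming the ConfigMap layout used by fluent-bit-kubernetes-logging; HTTP_Server, HTTP_Listen, HTTP_Port and Log_Level are standard Fluent Bit options, while the ConfigMap name and namespace here are illustrative:

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config        # illustrative name
  namespace: logging
data:
  fluent-bit.conf: |
    [SERVICE]
        Log_Level    debug       # raised from the default (info) to capture the output above
        HTTP_Server  On          # enables the embedded monitoring/metrics server
        HTTP_Listen  0.0.0.0
        HTTP_Port    2020        # port probed by the readiness check and the curl calls above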
Also interested in knowing the fix, @edsiper / @rmacian